Diagnostics (for technical support)
Usage
As command line options for genozip (Z), genounzip (U), genocat (C), genols (L)
Note: When used with genocat, most options show only the requested metadata and not the file data itself.
Memory consumption
--show-memory[=PEAK]
ZUCL. Show which Buffers are consuming the most memory. Normally, memory is sampled at the end of compression or decompression. With =PEAK, each Buffer retains its maximum allocation throughout execution.
​
kill -USR1 pid
ZUCL. Executes --show-memory on a running process. Not available on Windows.
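As a minimal sketch of the signal mechanism described above (not Genozip's own code), the same request can be sent from a Python script on POSIX systems. The helper name request_memory_report is hypothetical; the PID is that of whatever genozip process is already running.

```python
import os
import signal


def request_memory_report(pid: int) -> None:
    """Send SIGUSR1 to a running process, equivalent to `kill -USR1 pid`.

    Hypothetical helper for illustration only. SIGUSR1 does not exist on
    Windows, which matches the note that this is unavailable there.
    """
    os.kill(pid, signal.SIGUSR1)
```

Note that a process which does not handle SIGUSR1 is terminated by it, so the signal should only be sent to a process known to be genozip, genounzip, genocat or genols.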
​
--debug-memory[=bytes]
ZUCL. Show Buffer allocations and destructions. If <bytes> is specified then show only allocations of at least <bytes>.
​
--show-hash
Z. See raw numbers that feed into determining the size of the global hash tables.
​
genozip file contents
-W --STATS
ZUC. Show more detailed statistics.
​
Note: specifying -W or -w twice results in the header line of the statistics being printed to stderr, thereby surviving piping stdout to grep.
--show-data-type
C. Show the data type of a genozip file.
​
--show-alleles
ZUC. VCF: Output allele values to stdout. Each row corresponds to a row in the VCF file. Mixed-ploidy regions are padded and 2-digit allele values are replaced by an ascii character.
​
--show-dict[=field]
ZUC. Show dictionaries read/written for each vblock. With optional field (use --STATS to see the field names in the file) shows only that one field.
​
--show-singletons=field
ZUC. Show singletons in local.
​
--show-counts=field
ZUC. Show (per snip in dictionary) the number of words in the file using this snip. genozip - works for any context (use --STATS to see context names). genounzip/genocat - works only for contexts that have a SEC_COUNTS section (which includes any contexts in a file generated with genozip --show-counts of that context).
​
--show-b250[=field]
ZUC. Show b250 sections content - each value shows the line (counting from 1) and the index into its dictionary (note: REF and ALT are compressed together as they are correlated). With optional field (eg CHROM ; RNAME ; POS ; AN etc) shows only that one field. This also works with genounzip and genocat but without the line numbers.
​
--dump-b250=field
ZUC. Dump the binary content of the b250 data of this field exactly as they appear in the genozip format to a file named
"field.b250" - specify the field name as it appears in the Name column in --STATS for fields that have "comp b250" data.
​
--dump-local=field
ZUC. Same as --dump-b250 just for the local buffer.
​
--contigs
ZUC. List the names of the chromosomes (or contigs) included in the file. Alternative names: --chroms --list-chroms
​
--dump-section section-type or section_i
ZUC. Dump the uncompressed unencrypted contents of all sections of this type as it appears in
--show-gheaders (eg SEC_REFERENCE) or a single section by its number as it appears in the first column of the --show-gheaders output - to a pair of files named "section-type.vb.dict_id.[header|body]".
--show-headers[=section-type| field | section_i]
ZUC. Show all the section headers, or those of a specific section type or field name if the optional argument is provided, or a single section by its number as it appears in the first column of the --show-gheaders output. The argument is a case-insensitive substring of a section name or a case-sensitive field name. genozip and genounzip show the headers encountered in their normal operation, while genocat shows all the headers in the file, in the order they are in the file.
In combination with --force, magic-based scanning for headers is conducted in the file without relying on the section list - useful for truncated or corrupted genozip files, and also for private files (compressed with --sendto).
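The matching rule for the --show-headers argument can be illustrated with a small sketch. This is not Genozip's implementation: the function name header_matches is hypothetical, and treating a field name as an exact match is an assumption, since the text above only says it is case-sensitive.

```python
def header_matches(arg: str, section_name: str, field_name: str) -> bool:
    """Return True if `arg` would select a section header.

    Follows only the documented rule: `arg` is a case-insensitive
    substring of the section name, or a case-sensitive field name.
    Exact matching of the field name is an assumption made here.
    """
    is_section_match = arg.lower() in section_name.lower()
    is_field_match = arg == field_name
    return is_section_match or is_field_match
```

For example, under these assumptions "reference" selects SEC_REFERENCE headers, while "chrom" does not select a field named CHROM.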
​
--recover
UC. If the GENOZIP_HEADER section does not appear at the offset specified in the footer due to corruption, scan for the header. Used in combination with --show-headers --force or --show-gheaders --force.
--show-index
ZUC. Show the content of the random access index (SEC_RANDOM_ACCESS section).
​
--show-lines
Z. Show the byte offset of each line
​
--show-reference
ZUC. Show the ranges included in the SEC_REFERENCE sections.
​
--show-ranges
UC. Show the ranges as in RefStruct.ranges
​
--show-ref-seq
ZUC. Show the reference sequences as stored internally in a SAM, BAM or FASTQ file (also works for a reference file, but --reference --regions is faster). Combine with --regions to see specific regions (genocat only). Combine with --sequential to omit newlines. '-' appears in unset loci.
​
--show-ref-index
ZUC. Show the content of the random access index of the reference data (SEC_REF_RAND_ACC section).
​
--show-ref-hash
ZUC. Show the details of the reference hash table (SEC_REF_HASH) sections.
​
--show-chrom2ref
ZUC. Show the details of the file contigs that are mapped to a different contig name in the reference (eg '22' ➔ 'chr22').
​
--show-ref-contigs
ZUC. Show the details of the reference contigs.
​
--show-ref-iupacs
ZC. Show the IUPACs in the reference data.
​​
--show-txt-contigs
ZUC. SAM/BAM/CRAM: Show the details of the contigs appearing in the file header (SQ lines).
​​
--show-gheader
ZUC. Show the content of the genozip header (which also includes the list of all sections in the file).
​​
--show-vblocks[=task]
ZUC. Show vblock information as they are read / written. Optional task limits output to a specific dispatcher task, e.g. piz.
​​
--show-aliases
ZUC. See contents of SEC_DICT_ID_ALIASES section.
​​
--show-is-set contig
UC. Shows the contents of SEC_REF_IS_SET section for contig.
​
--show-gz
Z. Show details about the GZ / BGZF compression of a file.
​
--show-bgzf
ZUC. Show BGZF blocks as they are being compressed or decompressed.
​
--show-sag[=grp_i]
ZUC. SAM/BAM/CRAM: Show SA groups (supplementary / secondary alignments + their primary alignment).
​​
--show-depn
Z. SAM/BAM/CRAM: Show supplementary / secondary alignments that are successfully mapped against a primary alignment.
​
--show-sec-gencomp
UC. SAM/BAM/CRAM: Show contents of SEC_GENCOMP : num_prim_lines and num_depn_lines for each MAIN VB. Note that VBs are shown in order of absorption (i.e. aligned to the order of the prim/depn lines)
​
Text file contents
--show-bam
C. Show alignments of a BAM file.
--analyze-insertions
C. SAM/BAM: show statistics regarding inserted bases.
​
Subsetting a file for debugging
--biopsy=vb[,vb...] or [MAIN]|[PRIM]|[DEPN]
Z. Dump a subset of the VBs of the source file being compressed, including the txt header. The argument is a comma-separated list of VB numbers or VB ranges. An argument of 0 means txt header only.
For SAM/BAM only: a comma-separated combination of MAIN, PRIM and/or DEPN may be specified. This is useful because with gencomp, VB numbers might change between runs due to the insertion of PRIM VBs.
Example: genozip mybam.bam --biopsy 5-7,11 will emit the txt header and VBs 5,6,7,11.
​
Note: The biopsy is taken after reading (and possibly modifying) the VBlocks without segging. Modification options such as --optimize, --add-line-numbers, --add-seq, --head apply.
​​
Note: --no-gencomp is implicit unless --force-gencomp is specified.
​​
Note: The txt header is always included, unless --no-header is specified.
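To make the VB-list form of the --biopsy argument concrete, here is a minimal sketch that expands it into individual VB numbers. The function name expand_biopsy_vbs is hypothetical and this is not Genozip's parser; it handles only numbers and ranges, not the MAIN, PRIM and DEPN keywords.

```python
from typing import List


def expand_biopsy_vbs(arg: str) -> List[int]:
    """Expand a --biopsy VB list such as "5-7,11" into VB numbers.

    Illustration only: keywords (MAIN, PRIM, DEPN) are not handled.
    """
    vbs: List[int] = []
    for item in arg.split(","):
        if "-" in item:
            first, last = item.split("-", 1)
            # Ranges are inclusive, as in the documented example 5-7 -> 5,6,7
            vbs.extend(range(int(first), int(last) + 1))
        else:
            vbs.append(int(item))
    return vbs


print(expand_biopsy_vbs("5-7,11"))  # [5, 6, 7, 11]
```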
​​
--biopsy-line=vb/line
Z. Dump a single line. vb is 1-based VBlock number and line is 0-based line within the VBlock.
​
Note: Modification options such as --optimize, --add-line-numbers, --add-seq, --head apply.
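Because vb is 1-based while line is 0-based, it is easy to be off by one when building the --biopsy-line argument. The sketch below converts a 1-based data-line number into the vb/line form. It assumes the number of lines in each VBlock is already known (the list lines_per_vb is a hypothetical input) and that the line number counts data lines only, excluding the txt header.

```python
from typing import List


def biopsy_line_arg(file_line: int, lines_per_vb: List[int]) -> str:
    """Convert a 1-based data-line number into a "vb/line" argument.

    Hypothetical helper: `lines_per_vb` holds the line count of each
    VBlock in order, which the caller must obtain separately.
    """
    remaining = file_line - 1  # 0-based offset from the first data line
    for vb, n_lines in enumerate(lines_per_vb, start=1):  # vb is 1-based
        if remaining < n_lines:
            return f"{vb}/{remaining}"  # line is 0-based within the VBlock
        remaining -= n_lines
    raise ValueError("line number is beyond the last VBlock")
```

With two VBlocks of 3 and 2 lines, the fourth data line is the first line of the second VBlock, so the argument is 2/0.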
​
--skip-segconf
Z. Intended to be used in combination with --biopsy to skip segconf (useful for taking a biopsy of defective files).
​
-B, --vblock
C. Use with a 'B' suffix to specify a low number of bytes eg -B100000B. Useful for then subsetting with --biopsy.
​
--head N
C. Compress only the first N lines. When using this option, Genozip compresses only VB=1, so the vblock needs to be large enough to contain the specified number of lines. Also, since only VB=1 is compressed, no gencomp is possible.
​
Tracking execution
​
--show-reading-list
UC. Show list of TXT_HEADER and VB_HEADER sections that will be read by piz main loop.
​
--show-stack
UC. Show pushing and popping of containers on the container stack.
​
--show-containers[=field] or [=vblock_i]
ZUC. Show flow of containers. Possibly with the values of a specific vblock_i or specific field (use
--STATS to see the field names in the file).
​
--show-snips
UC. Show snips as they are being reconstructed.
​
--show-plan
ZUC. Shows reconstruction plan.
​
--show-threads
ZUC. Show thread dispatcher activity.
​
--debug-threads
ZUCL. Alternative to --show-threads - store thread log in a buffer and display it in case of an error.
​
--debug-lines
ZUC. ZIP: adds an Adler32 signature to each line which will be verified in PIZ.
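The idea of a per-line Adler32 signature can be sketched with Python's standard zlib module. This only illustrates the general technique of checksumming each line and verifying it later; how Genozip stores and checks the signatures internally is not described here, and the function names are hypothetical.

```python
import zlib
from typing import List, Sequence


def line_signature(line: bytes) -> int:
    """Adler32 checksum of a single line, as an unsigned 32-bit value."""
    return zlib.adler32(line) & 0xFFFFFFFF


def find_mismatches(lines: Sequence[bytes], signatures: Sequence[int]) -> List[int]:
    """Return the 0-based indices of lines whose checksum differs."""
    return [
        i
        for i, (line, expected) in enumerate(zip(lines, signatures))
        if line_signature(line) != expected
    ]
```

Computing the signatures at compression time and re-checking them at reconstruction time pinpoints the first line that differs, rather than only reporting that the whole file does not match.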
​
--add-line-numbers
C. SAM only: adds a field VB:Z describing the comp_i, vblock_i and line_i of the line.
​
--add-seq
Z. SAM only: adds back a SEQ column that was previously removed. The bases will be all 'A', and the length will be identical to the length of the QUAL field.
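The effect described for --add-seq can be sketched on a single SAM line. This is an illustration, not Genozip's code: it assumes the removed SEQ is represented by '*' in the tenth column (the SAM convention for an unstored sequence), and the function name add_placeholder_seq is hypothetical.

```python
def add_placeholder_seq(sam_line: str) -> str:
    """Fill a missing SEQ with 'A' repeated to the length of QUAL.

    Assumes a tab-separated SAM alignment line where SEQ is column 10
    and QUAL is column 11, and that a removed SEQ appears as '*'.
    """
    fields = sam_line.rstrip("\n").split("\t")
    seq, qual = fields[9], fields[10]
    if seq == "*":
        fields[9] = "A" * len(qual)  # all-'A' bases, same length as QUAL
    return "\t".join(fields)
```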
​
--debug-seg[=field]
Z. Shows snips being segmented into contexts - possibly limiting to a specific field (use
--STATS to see the field names in the file).
​
--debug-tar
Z. Shows the details of creating or decompressing from a tar file with --tar (ZIP) or --t_offset --t_size (PIZ).
​
--count=VB
Show number of lines written for each VBlock (note: --count without an argument shows lines written in the entire file).
​​
--debug-progress
ZUC. See raw numbers that feed into the progress indicator.
​
--debug-stats
Z. See details in the creation process of the --stats report.
​
--debug-generate
Z. See contexts that are marked as "all the same" and are removed or shrunk.
​
--debug-recon-size
Z. See vb->context[]->txt_len and vb->recon_size.
​
--debug-gencomp
Z. SAM/BAM/CRAM: View the queues of generated component buffers.
​​
--debug-sag
Z. SAM/BAM/CRAM: For each failing candidate line for SA Groups - show the reason for its failure to get included.
​​
--show-scan
Z. SAM/BAM/CRAM: Show statistics from the pre-processing scan in case of sag BY_FLAG.
​
--show-time[=res] or [=comp_i]
ZUCL. Show what functions are consuming the most time. Optional res is one of the members of ProfilerRec defined in profiler.h, such as 'compressor_lzma', or a substring such as 'compressor_'. Alternatively, optional comp_i (0-based) shows the time of just one component.
​
--show-aligner
ZUC. SAM/BAM/CRAM/FASTQ: Show alignments of reads as generated by the Genozip aligner.
​
--show-digest
ZUC. Show digest (MD5 or Adler32) updates.
​
--log-digest
ZUC. Output the data hashed for digest_ctx_bound to digest.zip.log and digest.piz.log
​
--show-mutex[=mutex-name]
ZUC. Shows locks and unlocks of all mutexes or a particular mutex.
​
--debug-read-ctxs
ZUC. Show all B250, LOCAL and DICT sections as they are read/skipped and decompressed during genocat/genounzip.
Note: For genozip this is only relevant for reading sections of the first FASTQ file when compressing the second FASTQ
file with --pair.
​​
--show-gz-uncomp
Z. Shows decompressions and truncations of GZ/BGZF/GZIL data.
​
--show-compress
Z. Shows compressing of B250, LOCAL and DICT section data.
​
--show-uncompress
ZUC. Shows uncompressing of section data.
​
--debug-peek
UC. Shows reconstructor peek stack.
​
--show-plan
UC. Shows the reconstruction plan.
​
--show-wrong-MD
C. SAM/BAM/CRAM: with MD:Z field - shows cases where the special MD algorithm is not applied to the MD:Z in the data.
​​
--show-wrong-XG
C. SAM/BAM/CRAM: with BS-Seeker2 XG:Z field - shows cases where the special XG algorithm is not applied to the XG:Z in the data.
​​
--show-wrong-XM
C. SAM/BAM/CRAM: with Bismark or BS-Seeker2 XM:Z field - shows cases where the special XM algorithm is not applied to the XM:Z in the data.
​
--show-wrong-XB
C. SAM/BAM/CRAM: with BSBolt XB:Z field - shows cases in which the predicted methylation string differs from the actual one.
​​
--debug-LONG
Z. SAM/BAM/CRAM/FASTQ: treat data as long reads regardless of the actual read length.
​​
--show-qual
ZUC. SAM/BAM/CRAM/FASTQ: see internal data of the QUAL compression codecs.
​​
--debug-qname
C. SAM/BAM/CRAM/FASTQ: show QNAME flavor unit test.
​
--show-buddy
ZUC. SAM/BAM/CRAM: Show buddy (which can be a mate or saggy or both) for each line that has one.
​
--show-huffman
ZUC. Show bit sequences encoding each character for Huffman in-memory compression used in gencomp / deep.
​
--debug-huffman
Z. Show parameters of the Huffman in-memory compression used in gencomp / deep.
​
--debug-split container
ZUCL. Show why str_split_by_container() fails. Useful for debugging new qname flavors.
​
--show-segconf-has
Z. Show fields encountered during segconf.
​
--show-deep[=qname_hash,seq_hash,qual_hash | =all]
ZUC. Deep: Show deep parameters. Usually used in combination with --deep, but can also be used without --deep. Optionally, providing specific hashes (in hex) outputs more information regarding lines with those hashes, and all provides very detailed information.
​​
--debug-valgrind
ZUCL. Normally Genozip refrains from releasing resources if the process is about to terminate - as process termination would release the resources faster. However, if valgrind is running, or if --debug-valgrind is specified (even without valgrind running), Genozip does release all resources, to allow detection of true resource leaks.
​
--debug-upgrade
Z. Show progress of thread checking for a new version. See also --debug-latest.
​
--show-cache
ZUC. Shows the execution steps in the complex process of loading a cached reference file.
​
Tracking compression performance
-w, --stats
Show the internal structure of a genozip file and the associated compression statistics.
​
-W, --STATS
Show more detailed statistics.
Note: specifying -W or -w twice results in the header line of the statistics being printed to stderr, thereby surviving piping stdout to grep.
​
--show-filename
Show the file name for each file.
--stats=comp_i, --STATS=comp_i
Z. Similar to --stats or --STATS but shows stats for a single component (comp_i is 0-based).
​
--show-codec
Z. Genozip tests for the best codec when it first encounters a new type of data. See the results.
​
--verify-codec
ZUC. Verifies each section's decompression correctness against an Adler32 that is stored in SectionHeader.magic.
Note: the Genozip file generated when using this option is not a valid Genozip file as it has the wrong magic -
this option is designed for detecting issues while developing new codecs.
Example: genozip -t --verify-codec myfile.sam
--submit-stats
Z. Submit aggregate stats of compression performance and metadata to the server
​
--debug-submit
Z. Submits stats for debugging
​
--debug-debug
ZUCL. Ad hoc option to assist debugging
​
Controlling execution
--one-vb vb
UC. Reconstruct data from a single VB. Can be used with (1) genocat or (2) genounzip --test.
​
--seg-only
Z. Run the segmenter but don't compress and don't write the output.
​
--xthreads
ZUC. Use only one thread for the main PIZ/ZIP dispatcher. This doesn't affect thread use of other dispatchers.
​
--no-zriter
Z. Don't use background threads for writing the .genozip output file.
​
--no-eval
ZUCL. Don't allow features on evaluation basis (used for testing permissions).
​
--no-faf
Z. FASTA: Don't use the "Fasta-As-Fastq" method.
​​
--no-interleaved
Z. FASTQ/FASTA: Don't use the "Interleaved" method for improving the compression of interleaved files.
​​
--no-domq, --no-pacb, --no-longr, --no-smux, --no-homp
Z. SAM/BAM/CRAM/FASTQ: disallow use of specific codecs when compressing QUAL.
--force-domq, --force-pacb, --force-longr, --force-smux, --force-normq, --force-homp
Z. SAM/BAM/CRAM/FASTQ: force a specific codec for compressing QUAL.
​​
--force-reread
Z. SAM/BAM/CRAM: When genozip uses the gencomp method, all DEPN lines are re-read from disk and none are cached in memory. Possible to combine with --force-gencomp.
​
--force-PLy
Z. VCF: force the PLy method for compressing FORMAT/PL.
​​
--debug-latest
ZU. Force genozip version upgrade notice. See also --debug-upgrade.
​​
--debug
ZUCL. Execute various debugging logic
​​
Miscellaneous
​​
--generate-il1m
Z. Compress to IL1M format: stdin to stdout. Set libdeflate compression level and XFL with --best and --fast that must be specified before --generate-il1m.