
Diagnostics (for technical support)



As command line options for genozip (Z), genounzip (U), genocat (C), genols (L)


Note: When used with genocat most options show only the requested metadata and not the file data itself.


Memory consumption



ZUCL. Show which Buffers are consuming the most memory. Normally, memory is sampled at the end of compression or decompression. With =PEAK, each Buffer retains its maximum allocation throughout execution.

kill -USR1 pid 

ZUCL. Executes --show-memory on a running process. Not available on Windows.


ZUCL. Show Buffer allocations and destructions. If <bytes> is specified then show only allocations of at least <bytes>.


Z. See raw numbers that feed into determining the size of the global hash tables.

genozip file contents



ZUC.  Show more detailed statistics.

Note: specifying -W or -w twice prints the header line of the statistics to stderr, so that it survives piping stdout to grep.


C. Show the data type of a genozip file.


ZUC. (VCF only) Output allele values to stdout. Each row corresponds to a row in the VCF file. Mixed-ploidy regions are padded and 2-digit allele values are replaced by an ascii character.


ZUC. Show dictionaries read/written for each vblock. With optional field (use --STATS to see the field names in the file) shows only that one field.


ZUC. Show singletons in local. 


ZUC. Show (per snip in dictionary) the number of words in the file using this snip. genozip - works for any context (use --STATS to see context names). genounzip/genocat - works only for contexts that have a SEC_COUNTS section (which include any contexts in a file generated with genozip --show-counts of that context).


ZUC. Show b250 sections content - each value shows the line (counting from 1) and the index into its dictionary (note: REF and ALT are compressed together as they are correlated). With optional field (eg CHROM ; RNAME ; POS ; AN etc) shows only that one field. This also works with genounzip and genocat but without the line numbers.


ZUC.  Dump the binary content of the b250 data of this field exactly as it appears in the genozip format to a file named "field.b250" - specify the field name as it appears in the Name column in --STATS for fields that have "comp b250" data.


ZUC.  Same as --dump-b250 just for the local buffer.


ZUC.  List the names of the chromosomes (or contigs) included in the file. Alternative names: --chroms --list-chroms

--dump-section section-type

ZUC. Dump the uncompressed unencrypted contents of all sections of this type (as it appears in --show-gheaders eg SEC_REFERENCE) to files named "section-type.vb.dict_id.[header|body]".



ZUC. Show all the section headers, or those of a specific section type or field name if the optional argument is provided. The argument is a case-insensitive substring of a section name or a case-sensitive field name. genozip and genounzip show the headers encountered in their normal operation, while genocat shows all the headers in the file, in the order they appear in the file.


In combination with --force, magic-based scanning for headers is conducted in the file without relying on the section list - useful for truncated or corrupted genozip files, and also for private files (compressed with --sendto). 
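
The argument matching rule stated above (case-insensitive substring of a section name, or case-sensitive field name) can be sketched in Python. This is only an illustration of the rule, not Genozip's code: the function name `header_matches` is hypothetical, and treating the field name as an exact match is an assumption.

```python
from typing import Optional

def header_matches(arg: Optional[str], section_name: str, field_name: Optional[str]) -> bool:
    """Hypothetical filter illustrating the matching rule of the optional argument."""
    if arg is None:
        # No argument provided: every header is shown
        return True
    if arg.lower() in section_name.lower():
        # Case-insensitive substring of the section name
        return True
    # Case-sensitive field name (exact match is an assumption)
    return field_name is not None and arg == field_name
```

Under these assumptions, `header_matches("reference", "SEC_REFERENCE", None)` is true, while `header_matches("pos", "SEC_B250", "POS")` is false because field names are compared case-sensitively.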


UC. If the GENOZIP_HEADER section does not appear at the offset specified in the footer due to corruption, scan for the header. Used in combination with --show-headers --force or --show-gheaders --force.



ZUC. Show the content of the random access index (SEC_RANDOM_ACCESS section).


Z. Show the byte offset of each line


ZUC. Show the ranges included in the SEC_REFERENCE sections.


UC. Show the ranges as in RefStruct.ranges


ZUC. Show the reference sequences as stored internally in a SAM, BAM or FASTQ file (also works for a reference file but --reference --regions is faster). Combine with --regions to see specific regions (genocat only). Combine with --sequential to omit newlines. '-' appears in unset loci.


ZUC. Show the content of the random access index of the reference data (SEC_REF_RAND_ACC section).


ZUC. Show the details of the reference hash table (SEC_REF_HASH) sections.


ZUC. Show the details of the file contigs that are mapped to a different contig name in the reference (eg '22' ➔ 'chr22').


ZUC. Show the details of the reference contigs.


ZC. Show the IUPACs in the reference. In combination with genozip --chain - also shows the VCF variants that have an IUPAC in the Luft reference and how they are handled.


ZUC. (SAM and BAM) Show the details of the contigs appearing in the file header (SQ lines).


ZUC.  Show the content of the genozip header (which also includes the list of all sections in the file). 

--show-gheader=2 shows the section list after modification (if any) by writer_create_plan.


ZUC.  Show information about vblocks as they are read / written. Optional task limits output to a specific dispatcher task, e.g. piz.


ZUC. See contents of SEC_DICT_ID_ALIASES section.


ZUC. Show the ranges included in the SEC_REFERENCE sections.

--show-is-set contig

UC. Shows the contents of SEC_REF_IS_SET section for contig.


Z. Show details about the GZ / BGZF compression of a file.


ZUC. Show BGZF blocks as they are being compressed or decompressed.


ZUC. SAM/BAM: Show SA groups (supplementary / secondary alignments + their primary alignment).


Z. SAM/BAM: Show supplementary / secondary alignments that are successfully mapped against a primary alignment.

Text file contents



C. Show alignments of a BAM file.



C. SAM/BAM: show statistics regarding inserted bases.

Subsetting a file for debugging


--biopsy=vb[,vb...] or [MAIN]|[PRIM]|[DEPN]

Z. Dump a subset of the VBs of the source file being compressed, including the txt header. The argument is a comma-separated list of VB numbers or VB ranges. An argument of 0 means txt header only.

For SAM/BAM only: a comma-separated combination of MAIN, PRIM and/or DEPN may be specified. This is useful because, with gencomp, VB numbers might change between runs due to insertion of PRIM VBs.


Example: genozip mybam.bam --biopsy 5-7,11 will emit the txt header and VBs 5,6,7,11.

Note: The biopsy is taken after reading (and possibly modifying) the VBlocks without segging. Modification options such as --optimize, --add-line-numbers, --head apply.

Note: --no-gencomp is implicit unless --force-gencomp is specified.

Note: The txt header is always included, unless --no-header is specified.
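
How a --biopsy argument such as 5-7,11 expands into individual VB numbers can be sketched in Python. This is a minimal illustration, not Genozip's parser: the function name `expand_vb_list` is hypothetical, ranges are assumed to be inclusive (consistent with the example 5-7,11 yielding 5,6,7,11), and the MAIN / PRIM / DEPN form is not handled.

```python
from typing import List

def expand_vb_list(arg: str) -> List[int]:
    """Expand a comma-separated list of VB numbers and inclusive ranges, e.g. "5-7,11"."""
    vbs: List[int] = []
    for item in arg.split(","):
        if "-" in item:
            # A range such as "5-7" covers both endpoints
            first, last = (int(x) for x in item.split("-", 1))
            vbs.extend(range(first, last + 1))
        else:
            # A single VB number; 0 stands for "txt header only"
            vbs.append(int(item))
    return vbs

print(expand_vb_list("5-7,11"))
```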


Z. Dump a single line. vb is 1-based VBlock number and line is 0-based line within the VBlock.


Note: Modification options such as --optimize, --add-line-numbers, --head apply.


Z. Intended to be used in combination with --biopsy to skip segconf (useful for taking a biopsy of defective files).


C. Use with a 'B' suffix to specify a low number of bytes eg -B100000B. Useful for then subsetting with --biopsy.


C. Compress only the first N lines (default: 10). When using this option Genozip compresses only VB=1 so vblock needs to be large enough to contain the specified number of lines. Also, since it is only VB=1, no gencomp is possible.


Z. Allow compression of truncated files. Genozip will compress only full lines (e.g. reads, alignments, variants...) and will stop at the first partial line or partial BGZF block. The digest is computed only on the data actually compressed.
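
The idea of keeping only full lines and computing the digest on what was kept can be sketched in Python. This is a simplified illustration under stated assumptions, not Genozip's implementation: it treats a "line" as a newline-terminated text line (a real record such as a FASTQ read spans several text lines, and BGZF blocks are not considered), the function name `keep_full_lines` is hypothetical, and MD5 is chosen here only as an example digest.

```python
import hashlib

def keep_full_lines(data: bytes) -> bytes:
    """Return the prefix of data ending at the last newline (empty if there is none)."""
    return data[: data.rfind(b"\n") + 1]

truncated = b"read1\nread2\nrea"          # the last line is cut off mid-way
kept = keep_full_lines(truncated)         # the partial line "rea" is dropped
digest = hashlib.md5(kept).hexdigest()    # digest covers only the data actually kept
```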

Tracking execution



UC. Show pushing and popping of containers on the container stack. 

--show-containers[=field] or [=vblock_i]  

ZUC. Show flow of containers. Possibly with the values of a specific vblock_i or specific field (use --STATS to see the field names in the file).


UC. Show snips as they are being reconstructed.


ZUC. Shows reconstruction plan. Combine with --luft to see Luft reconstruction plan.


ZUC.  Show thread dispatcher activity.


ZUCL.  Alternative to --show-threads - store thread log in a buffer and display it in case of an error.


ZUC.  ZIP: adds an Adler32 signature to each line which will be verified in PIZ.
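
The general idea of a per-line Adler32 signature that is added at compression time and checked at reconstruction time can be sketched in Python with the standard `zlib.adler32` function. This is only a conceptual sketch: the function names and the layout (a 4-byte big-endian checksum appended to the line) are assumptions for illustration, not Genozip's actual format.

```python
import zlib

def sign_line(line: bytes) -> bytes:
    """Append a 4-byte big-endian Adler32 of the line (layout is an assumption)."""
    return line + zlib.adler32(line).to_bytes(4, "big")

def verify_line(signed: bytes) -> bytes:
    """Check the trailing Adler32 and return the original line, or raise ValueError."""
    line, stored = signed[:-4], int.from_bytes(signed[-4:], "big")
    if zlib.adler32(line) != stored:
        raise ValueError("Adler32 mismatch")
    return line

signed = sign_line(b"chr1\t100\tA\tG")
original = verify_line(signed)            # returns the line unchanged if the checksum matches
```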


C.  SAM only: adds a field VB:Z describing the comp_i vblock_i and line_i of the line


Z.  Shows snips being segmented into contexts - possibly limiting to a specific field (use --STATS to see the field names in the file).


Z. Shows the details of creating or decompressing from a tar file with --tar (ZIP) or --t_offset --t_size (PIZ)

--count=VB

CU. Show number of lines written for each VBlock (note: --count without an argument shows lines written in the entire file).


ZUC. See raw numbers that feed into the progress indicator.


Z. See details in the creation process of the --stats report.


Z. See contexts that are marked as "all the same" and are removed or shrunk.


Z. See vb->context[]->txt_len and vb->recon_size.


Z. SAM/BAM/VCF: View the queues of generated component buffers.


Z. SAM/BAM: For each failing candidate line for SA Groups - show the reason for its failure to get included.

--show-time[=res] or [=comp_i]

ZUCL. Show what functions are consuming the most time. Optional res is one of the members of ProfilerRec defined in profiler.h such as 'compressor_lzma' or a substring such as 'compressor_'. Alternatively, optional comp_i (0-based) to show time of just one component.


ZUC. SAM/BAM/FASTQ: Show alignments of reads as generated by the Genozip aligner.


ZUC. Show digest (MD5 or Adler32) updates.


ZUC. Output the data hashed for digest_ctx_bound to and digest.piz.log


ZUC. Shows locks and unlocks of all mutexes or a particular mutex.


ZUC. Show all B250, LOCAL and DICT sections as they are read/skipped and decompressed during genocat/genounzip. 

Note: For genozip this is only relevant for reading sections of the first FASTQ file when compressing the second FASTQ file with --pair.


Z. Shows compressing of B250, LOCAL and DICT section data.


ZUC. Shows uncompressing of section data.


UC. Shows reconstructor peek stack.


UC. Shows the reconstruction plan.


C. SAM/BAM with MD:Z field - shows cases where the special MD algorithm is not applied to the MD:Z in the data.


C. SAM/BAM with BS-Seeker2 XG:Z field - shows cases where the special XG algorithm is not applied to the XG:Z in the data.


C. SAM/BAM with Bismark or BSSeeker2 XM:Z field - shows cases where the special XM algorithm is not applied to the XM:Z in the data.


C. SAM with BSBolt XB:Z field - shows cases in which the predicted methylation string differs from the actual.


Z. SAM/BAM and FASTQ: treat data as long reads regardless of the actual read length.


ZUC. SAM/BAM and FASTQ: see internal data of the QUAL compression codecs.


C. SAM/BAM and FASTQ: show QNAME flavor unit test.


ZUC. SAM/BAM: show buddy (which can be a mate or saggy or both) for each line that has one.


UC. Deep: show parameters of the Huffman compression of QNAMEs, used during decompression of Deep files.

--debug-split container

ZUCL. Show why str_split_by_container() fails. Useful for debugging new qname flavors.


Z. Show fields encountered during segconf.


ZUC. Deep: show deep parameters (optionally: of a single alignment). Usually used in combination with --deep, but can also be used without --deep.


ZUCL. Normally Genozip refrains from releasing resources if the process is about to terminate - as process termination would release the resources faster. However, if valgrind is running, or if --debug-valgrind is specified (even without valgrind running), Genozip does release all resources, to allow detection of true resource leaks.


Z. Show progress of thread checking for a new version. See also --debug-latest.


ZUC. Shows the execution steps in the complex process of loading a cached reference file.

Tracking compression performance



Show the internal structure of a genozip file and the associated compression statistics.


Show more detailed statistics.


Note: specifying -W or -w twice prints the header line of the statistics to stderr, so that it survives piping stdout to grep.


Show the file name for each file.


--stats=comp_i, --STATS=comp_i 

Z. Similar to --stats or --STATS but shows stats for a single component (comp_i is 0-based).


Z. Genozip tests for the best codec when it first encounters a new type of data. See the results.


ZUC. Verifies each section's decompression correctness against an Adler32 that is stored in SectionHeader.magic. 

Note: the Genozip file generated when using this option is not a valid Genozip file as it has the wrong magic - this option is designed for detecting issues while developing new codecs.


Example: genozip -t --verify-codec myfile.sam



Z. Submit aggregate stats of compression performance and metadata to the server


Z. Submits stats for debugging


ZUCL. Ad hoc option to assist debugging

Controlling execution


--one-vb vb  

UC. Reconstruct data from a single VB. Can be used with (1) genocat or (2) genounzip --test.


Z. Run the segmenter but don't compress and don't write the output.


ZUC. Use only one thread for the main PIZ/ZIP dispatcher. This doesn't affect thread use of other dispatchers.


Z. Don't use background threads for writing the .genozip output file.


ZUCL. Don't allow features on evaluation basis (used for testing permissions).


Z. Don't use the "Fasta-As-Fastq" method for compressing FASTA.


Z. Don't use the "Interleaved" method for improving the compression of interleaved FASTQ and FASTA files.

--no-domq, --no-pacb, --no-longr, --no-smux, --no-homp

Z. SAM/BAM/FASTQ: disallow use of specific codecs when compressing QUAL.

--force-domq, --force-pacb, --force-longr, --force-smux, --force-normq, --force-homp

Z. SAM/BAM/FASTQ: force a specific codec for compressing QUAL.


Z. VCF: force the PLy method for compressing FORMAT/PL.


ZU. Force genozip version upgrade notice. See also --debug-upgrade.


ZUCL. Execute various debugging logic
