Diagnostics (for technical support)
Available as command-line options for genozip (Z), genounzip (U), genocat (C) and genols (L). The letters at the start of a description indicate which commands accept the option.
Note: When used with genocat most options show only the requested metadata and not the file data itself.
ZUCL. Show which Buffers are consuming the most memory. Normally, memory is sampled at the end of compression or decompression. With =PEAK, each Buffer retains its maximum allocation throughout execution.
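To illustrate the difference between the default sampling and =PEAK, here is a minimal Python sketch. It is not Genozip's code: the Buffer class, its fields and the buffer names txt_data and z_data are hypothetical stand-ins. Each buffer records both its current size and the largest size it has reached, and the report sorts by one or the other.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    """Hypothetical stand-in for a named memory buffer."""
    name: str
    size: int = 0  # current allocation in bytes
    peak: int = 0  # largest allocation seen so far

    def alloc(self, new_size: int) -> None:
        self.size = new_size
        self.peak = max(self.peak, new_size)

    def free(self) -> None:
        self.size = 0  # the peak is deliberately retained


def memory_report(buffers, peak=False):
    """Return (name, bytes) pairs, largest consumer first.

    peak=False: use each buffer's size at the moment of sampling.
    peak=True:  use the maximum size each buffer ever reached.
    """
    pairs = [(b.name, b.peak if peak else b.size) for b in buffers]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


txt = Buffer("txt_data")
txt.alloc(8_000_000)
txt.free()  # released before the end of the run
z = Buffer("z_data")
z.alloc(1_000_000)

print(memory_report([txt, z]))             # sampled at the end
print(memory_report([txt, z], peak=True))  # peak allocations
```

A buffer that was large mid-run but freed before the end appears only in the peak report. That is why =PEAK is the mode to use when looking for the true memory high-water mark.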
kill -USR1 pid
ZUCL. Executes --show-memory on a running process. Not available on Windows.
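Sending SIGUSR1 asks a running process to produce a report without stopping it. The Python sketch below shows a common pattern for such a trigger: the handler only sets a flag, and the main loop acts on it at a safe point. It assumes nothing about how Genozip's own handler is written. SIGUSR1 is a POSIX signal that Windows does not provide, which is consistent with the option being unavailable there.

```python
import signal

dump_requested = False  # set asynchronously by the signal handler
reports = 0             # how many reports the main loop has produced


def on_usr1(signum, frame):
    # Keep the handler minimal: just record the request.
    global dump_requested
    dump_requested = True


if hasattr(signal, "SIGUSR1"):  # not defined on Windows
    signal.signal(signal.SIGUSR1, on_usr1)
    # Stand-in for running `kill -USR1 <pid>` from another shell.
    signal.raise_signal(signal.SIGUSR1)

# ... later, at a safe point in the main loop:
if dump_requested:
    dump_requested = False
    reports += 1
    print("memory report would be printed here")
```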
ZUCL. Show Buffer allocations and destructions. If <bytes> is specified then show only allocations of at least <bytes>.
Z. See raw numbers that feed into determining the size of the global hash tables.
genozip file contents
ZUC. (VCF only) Output allele values to stdout. Each row corresponds to a row in the VCF file. Mixed-ploidy regions are padded and 2-digit allele values are replaced by an ASCII character.
ZUC. Show the dictionaries read/written for each vblock. With an optional field argument (use --STATS to see the field names in the file), shows only that field.
ZUC. For each snip in the dictionary, show the number of words in the file that use this snip. With genozip, this works for any context (use --STATS to see the context names). With genounzip and genocat, it works only for contexts that have a SEC_COUNTS section, which includes any context that was specified to genozip --show-counts when the file was created.
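The count reported for each snip is the number of words of the context that carry that value. A minimal Python sketch, using a hypothetical CHROM column; the function name is illustrative only:

```python
from collections import Counter


def snip_counts(words):
    """Map each distinct snip to the number of words that use it."""
    return Counter(words)


chrom = ["chr1", "chr1", "chr2", "chr1", "chrX"]  # hypothetical context values
for snip, count in snip_counts(chrom).most_common():
    print(f"{snip}\t{count}")
```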
ZUC. Show the contents of the b250 sections: each value shows the line (counting from 1) and its index into the dictionary (note: REF and ALT are compressed together as they are correlated). With an optional field (e.g. CHROM, RNAME, POS, AN), shows only that one field. This also works with genounzip and genocat, but without the line numbers.
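Conceptually, each b250 value is an index into the context's dictionary of distinct snips. The Python sketch below shows that dictionary encoding for a hypothetical CHROM column, printing the line number (from 1) and the index for each value. It deliberately leaves out the compact byte encoding that the real b250 format uses, and it is not Genozip's segmenter.

```python
def dict_encode(values):
    """Return (dictionary, indices): each value is replaced by the index
    of its first occurrence in a dictionary of distinct snips."""
    dictionary, index_of, indices = [], {}, []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indices.append(index_of[value])
    return dictionary, indices


chrom_field = ["chr1", "chr1", "chr2", "chr1"]  # hypothetical field values
dictionary, indices = dict_encode(chrom_field)
for line_i, idx in enumerate(indices, start=1):  # lines counted from 1
    print(f"line {line_i}: index {idx} ({dictionary[idx]!r})")
```

Repeated values cost only a small index each, which is why fields with few distinct values compress well in this scheme.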
ZUC. Dump the binary content of this field's b250 data, exactly as it appears in the genozip format, to a file named "field.b250". Specify the field name as it appears in the Name column of --STATS, for fields that have "comp b250" data.
ZUC. Same as --dump-b250, but for the local buffer.
ZUC. List the names of the chromosomes (or contigs) included in the file. Alternative names: --chroms --list-chroms
ZUC. Dump the uncompressed, unencrypted contents of all sections of this type (as named in --show-gheaders, e.g. SEC_REFERENCE) to files named "section-type.vb.dict_id.[header|body]".
Show all section headers, or only those of a specific section type if the optional argument is provided. The argument is a case-insensitive substring of a section name. genozip and genounzip show the headers encountered during their normal operation, while genocat shows all the headers in the file, in the order in which they appear.
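The argument matching can be sketched in Python as a case-insensitive substring test over section type names. The names in the list are taken from this page. The function is a hypothetical illustration, not Genozip's parser.

```python
def matching_sections(section_types, arg=None):
    """Return the section types whose name contains arg, ignoring case.
    With no argument, every section type matches."""
    if arg is None:
        return list(section_types)
    needle = arg.lower()
    return [s for s in section_types if needle in s.lower()]


sections = ["SEC_RANDOM_ACCESS", "SEC_REFERENCE", "SEC_REF_HASH",
            "SEC_REF_RAND_ACC", "SEC_COUNTS", "SEC_DICT_ID_ALIASES"]
print(matching_sections(sections, "ref"))
print(matching_sections(sections, "RAND"))
```

Because matching is by substring, a short argument such as "ref" selects a whole family of related sections.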
Z. SAM/BAM: produce PRIM/DEPN components even for non-sorted files.
Z. SAM/BAM: don't produce PRIM/DEPN components.
Z. SAM/BAM/FASTQ: don't use the DOMQUAL codec when compressing QUAL.
ZUC. Show the content of the random access index (SEC_RANDOM_ACCESS section).
Z. Show the byte offset of each line
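A line's byte offset is the position in the source file at which that line starts. This Python sketch computes the offsets for a small hypothetical SAM fragment, assuming lines are terminated by a newline byte. The function name is illustrative only.

```python
def line_offsets(data: bytes):
    """Return the byte offset at which each line of data starts."""
    offsets = [0] if data else []
    i = data.find(b"\n")
    while i != -1 and i + 1 < len(data):  # no entry after a final newline
        offsets.append(i + 1)
        i = data.find(b"\n", i + 1)
    return offsets


sam = b"@SQ\tSN:chr1\nread1\t0\nread2\t16\n"
print(line_offsets(sam))
```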
ZUC. Show the ranges included in the SEC_REFERENCE sections.
UC. Show the ranges as in RefStruct.ranges
ZUC. Show the reference sequences as stored internally in a SAM, BAM or FASTQ file (this also works for a reference file, but --reference --regions is faster). Combine with --regions to see specific regions (genocat only). Combine with --sequential to omit newlines. '-' appears in unset loci.
C. Show the difference between two reference files. Use in combination with two --reference arguments.
ZUC. Show the content of the random access index of the reference data (SEC_REF_RAND_ACC section).
ZUC. Show the details of the reference hash table (SEC_REF_HASH) sections.
ZUC. Show the details of the file contigs that are mapped to a different contig name in the reference (eg '22' ➔ 'chr22').
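As an illustration of the kind of mapping this option reports, the Python sketch below matches a file contig to a reference contig by trying an exact match first and then adding or removing a 'chr' prefix. The real mapping rules may be more extensive, for example for mitochondrial naming variants, which this sketch deliberately ignores. The function name and the reference set are hypothetical.

```python
def map_contig(name, ref_contigs):
    """Find the reference contig corresponding to a file contig name.

    Tries an exact match first, then adds or removes a 'chr' prefix.
    Returns None if nothing matches.
    """
    if name in ref_contigs:
        return name
    alt = name[3:] if name.startswith("chr") else "chr" + name
    return alt if alt in ref_contigs else None


ref = {"chr1", "chr22", "chrX", "chrM"}  # hypothetical reference contigs
print(map_contig("22", ref))    # mapped to 'chr22'
print(map_contig("chrX", ref))  # exact match
print(map_contig("MT", ref))    # not handled by this simple rule
```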
ZUC. Show the details of the reference contigs.
ZC. Show the IUPACs in the reference. In combination with genozip --chain, also shows the VCF variants that have an IUPAC in the Luft reference and how they are handled.
C. Show inclusion or exclusion of lines. Used in combination with --taxid.
ZUC. (SAM and BAM) Show the details of the contigs appearing in the file header (SQ lines).
ZUC. Show the content of the genozip header (which also includes the list of all sections in the file).
--show-gheader=2 shows the section list after modification (if any) by writer_create_plan.
ZUC. Show vblock headers as they are read / written.
ZUC. See contents of SEC_DICT_ID_ALIASES section.
ZUC. Show the ranges included in the SEC_REFERENCE sections.
UC. Shows the contents of SEC_REF_IS_SET section for contig.
ZUC. Show BGZF blocks as they are being compressed or decompressed.
ZUC. SAM/BAM: Show SA groups (supplementary / secondary alignments + their primary alignment).
Z. SAM/BAM: Show supplementary / secondary alignments that are successfully mapped against a primary alignment.
C. Show, line by line, the outcome of the liftover of each line. Used with dual-coordinate files and may be combined with --luft.
See: Dual-coordinate VCF files
Text file contents
C. Show alignments of a BAM file.
Subsetting a file for debugging
Z. Dump a subset of the VBs of the source file being compressed, together with the txt header. The argument is a comma-separated list of VB numbers or VB ranges. An argument of 0 means txt header only.
Example: genozip mybam.bam --biopsy 5-7,11 will emit the txt header and VBs 5,6,7,11.
Note: the biopsy is taken after the VBlocks are read, without segging them.
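The biopsy argument format can be sketched as a small parser in Python. This is a hypothetical illustration, not Genozip's parser. One assumption goes beyond the text: the sketch treats a 0 inside a longer list the same as 0 on its own, meaning "txt header", which is always emitted anyway.

```python
def parse_biopsy(arg: str):
    """Parse a biopsy argument such as "5-7,11" into a sorted list of VB numbers.

    The txt header is always emitted; "0" on its own yields an empty list,
    i.e. txt header only.
    """
    vbs = set()
    for item in arg.split(","):
        item = item.strip()
        if "-" in item:
            first, last = (int(x) for x in item.split("-", 1))
            if first > last:
                raise ValueError(f"bad range: {item}")
            vbs.update(range(first, last + 1))
        else:
            vbs.add(int(item))
    vbs.discard(0)  # 0 stands for the txt header (assumption, see above)
    return sorted(vbs)


print(parse_biopsy("5-7,11"))  # VBs 5, 6, 7 and 11
print(parse_biopsy("0"))       # empty: txt header only
```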
Z. Dump a single line. vb is 1-based VBlock number and line is 0-based line within the VBlock.
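Because the VBlock number is 1-based and the line number is 0-based, mapping the pair back to a position in the file is an easy place for off-by-one mistakes. The Python sketch below assumes that the number of lines in each VBlock is known, since VBlocks need not contain the same number of lines. The function, its parameters and the line counts are hypothetical.

```python
def global_line(vb_i, line_i, lines_per_vb, header_lines=0):
    """Convert (1-based VBlock number, 0-based line within that VBlock) to a
    1-based line number in the whole file.

    lines_per_vb[k] is the number of lines in VBlock k+1; header_lines is the
    number of txt-header lines that precede the first VBlock.
    """
    if not 1 <= vb_i <= len(lines_per_vb):
        raise ValueError(f"vb_i must be between 1 and {len(lines_per_vb)}")
    if not 0 <= line_i < lines_per_vb[vb_i - 1]:
        raise ValueError(f"VBlock {vb_i} has only {lines_per_vb[vb_i - 1]} lines")
    return header_lines + sum(lines_per_vb[: vb_i - 1]) + line_i + 1


counts = [1000, 980, 1010]  # hypothetical per-VBlock line counts
print(global_line(1, 0, counts))  # first line after the header
print(global_line(2, 0, counts))  # first line of VBlock 2
print(global_line(3, 5, counts, header_lines=25))
```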
C. Use with a 'B' suffix to specify a small number of bytes, e.g. -B100000B. Useful for subsequently subsetting with --biopsy.
C. Compress only the first N lines (default: 10). When using this option, Genozip compresses only VB=1, so the vblock needs to be large enough to contain the specified number of lines.
--show-containers[=field] or [=vblock_i]
ZUC. Show the flow of containers, optionally with the values of a specific vblock_i or a specific field (use --STATS to see the field names in the file).
ZUC. Shows reconstruction plan. Combine with --luft to see Luft reconstruction plan.
ZUC. Show thread dispatcher activity.
ZUCL. Alternative to --show-threads - store thread log in a buffer and display it in case of an error.
Z. ZIP: adds an Adler32 signature to each line which will be verified in PIZ.
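A per-line signature works like this: ZIP records a checksum of each original line, and PIZ recomputes it on the reconstructed line, so a mismatch pinpoints which lines were reconstructed incorrectly. Below is a minimal Python sketch using zlib.adler32. The function names and the SAM-like sample lines are hypothetical, and the sketch does not show where Genozip actually stores the signature.

```python
import zlib


def sign_lines(lines):
    """ZIP side: pair each line with its Adler-32 checksum."""
    return [(line, zlib.adler32(line)) for line in lines]


def verify_lines(signed, reconstructed):
    """PIZ side: return the 0-based indices of reconstructed lines whose
    checksum differs from the one recorded at ZIP time."""
    return [i for i, ((_, sig), line) in enumerate(zip(signed, reconstructed))
            if zlib.adler32(line) != sig]


original = [b"read1\t0\tchr1\t100", b"read2\t16\tchr1\t250"]
signed = sign_lines(original)
print(verify_lines(signed, original))                                # no mismatches
print(verify_lines(signed, [original[0], b"read2\t16\tchr1\t251"]))  # line 1 differs
```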
C. SAM only: adds a field VB:Z describing the comp_i, vblock_i and line_i of the line.
Z. Show snips being segmented into contexts, optionally limited to a specific field (use --STATS to see the field names in the file).
Show number of lines written for each VBlock (note: --count without an argument shows lines written in the entire file).
ZUC. See raw numbers that feed into the progress indicator.
Z. See details in the creation process of the --stats report.
Z. See contexts that are marked as "all the same" and are removed or shrunk.
Z. See vb->context->txt_len and vb->recon_size.
Z. SAM/BAM/VCF: View the queues of generated component buffers.
Z. SAM/BAM: for each candidate line that fails to be included in an SA Group, show the reason for the failure.
ZUCL. Show which functions are consuming the most time. The optional res is one of the members of ProfilerRec defined in profiler.h, such as 'compressor_lzma', or a substring such as 'compressor_'.
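Selecting profiler entries by exact name or by substring can be sketched as a single substring filter, because an exact name is a substring of itself. In this Python sketch only 'compressor_lzma' comes from this page. The other counter names and all timings are illustrative inputs, not real measurements, and the function is hypothetical.

```python
def select_profile(records, res=None):
    """Return (name, ms) pairs to display, most expensive first.

    With res=None every record is shown; otherwise only records whose
    name contains res.
    """
    if res is None:
        chosen = records.items()
    else:
        chosen = [(name, ms) for name, ms in records.items() if res in name]
    return sorted(chosen, key=lambda kv: kv[1], reverse=True)


records = {"compressor_lzma": 1200, "compressor_bz2": 300, "zip_seg": 900}
print(select_profile(records, "compressor_"))
print(select_profile(records, "compressor_lzma"))
```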
ZUC. SAM/BAM/FASTQ: Show alignments of reads as generated by the Genozip aligner.
ZUC. Show digest (MD5 or Adler32) updates.
ZUC. Output the data hashed for digest_ctx_bound to digest.zip.log and digest.piz.log
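A digest "update" feeds the next piece of data into a running checksum. The Python sketch below uses hashlib.md5 and zlib.adler32 on hypothetical chunks. It shows that updating chunk by chunk gives the same final value as hashing all the data at once, which is what makes it meaningful to log the state after each update. The chunk boundaries and the printed format are assumptions, not Genozip's.

```python
import hashlib
import zlib

chunks = [b"@SQ\tSN:chr1\n", b"read1\t0\tchr1\n", b"read2\t16\tchr1\n"]

md5 = hashlib.md5()
adler = 1  # Adler-32 starting value
for chunk in chunks:
    md5.update(chunk)                # running MD5
    adler = zlib.adler32(chunk, adler)  # running Adler-32
    print(f"update: md5={md5.hexdigest()} adler32={adler:08x}")

whole = b"".join(chunks)
assert md5.hexdigest() == hashlib.md5(whole).hexdigest()
assert adler == zlib.adler32(whole)
```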
ZUC. Shows locks and unlocks of all mutexes or a particular mutex.
ZUC. Show all B250, LOCAL and DICT sections as they are read/skipped and decompressed during genocat/genounzip.
Note: for genozip this is only relevant for reading sections of the first FASTQ file when compressing the second FASTQ file with --pair.
ZUC. Shows uncompressing of section data.
UC. Shows reconstructor peek stack.
ZUCL. Shows internal flags after initialization.
UC. Shows the reconstruction plan.
C. When used with a dual-coordinate VCF file, shows for each variant its Coordinates (Primary, Luft or Both) and its oStatus.
C. SAM/BAM with MD:Z field - shows cases where the special MD algorithm is not applied to the MD:Z in the data.
C. SAM/BAM with BS-Seeker2 XG:Z field - shows cases where the special XG algorithm is not applied to the XG:Z in the data.
C. SAM/BAM with Bismark or BSSeeker2 XM:Z field - shows cases where the special XM algorithm is not applied to the XM:Z in the data.
C. SAM with BSBolt XB:Z field: shows cases in which the predicted methylation string differs from the actual one.
Z. SAM/BAM and FASTQ: treat data as long reads regardless of the actual read length.
ZUC. SAM/BAM and FASTQ: see internal data of the QUAL compression codecs.
C. SAM/BAM and FASTQ: show QNAME flavor unit test.
ZUC. SAM/BAM: show buddy (which can be a mate or saggy or both) for each line that has one.
Tracking compression performance
Show the internal structure of a genozip file and the associated compression statistics.
Show more detailed statistics.
Note: specifying -W or -w twice causes the header line of the statistics to be printed to stderr, so that it survives when stdout is piped to grep.
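This works because a pipe to grep carries only stdout, so a header written to stderr bypasses the filter and still reaches the terminal. The minimal Python sketch below shows that split. The print_stats function and the sample rows are hypothetical, and the streams are passed in so that the demo can capture them.

```python
import io
import sys


def print_stats(rows, header, header_to_stderr=False, out=None, err=None):
    """Print a stats table; optionally send the header line to stderr."""
    if out is None:
        out = sys.stdout
    if err is None:
        err = sys.stderr
    print(header, file=err if header_to_stderr else out)
    for row in rows:
        print(row, file=out)


# Demo: capture both streams to see where each line goes
captured_out, captured_err = io.StringIO(), io.StringIO()
print_stats(["row 1", "row 2"], "HEADER", header_to_stderr=True,
            out=captured_out, err=captured_err)
print(repr(captured_out.getvalue()), repr(captured_err.getvalue()))
```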
Show the file name for each file.
Z. Similar to --stats or --STATS, but shows stats for a single component.
Z. Genozip tests for the best codec when it first encounters a new type of data. See the results.
ZUC. Verifies each section's decompression correctness against an Adler32 that is stored in SectionHeader.magic.
Note: the Genozip file generated with this option is not a valid Genozip file, as it has the wrong magic. This option is designed for detecting issues while developing new codecs.
Example: genozip -t --verify-codec myfile.sam
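The verification can be sketched as follows. At compression time, the Adler-32 of the uncompressed section is stored in the header field that normally holds the magic number. At decompression time, the checksum is recomputed and compared. Reusing the magic field is also why the resulting file is not a valid Genozip file. In this Python sketch, zlib stands in for the codec under development, and the dict-based section layout is hypothetical.

```python
import zlib


def zip_section(data: bytes, compress=zlib.compress):
    """Compress a section, storing the Adler-32 of the uncompressed data
    in the slot that would normally hold the magic number."""
    return {"magic": zlib.adler32(data), "body": compress(data)}


def piz_section(section, decompress=zlib.decompress) -> bytes:
    """Decompress a section and verify it against the stored checksum."""
    data = decompress(section["body"])
    if zlib.adler32(data) != section["magic"]:
        raise ValueError("codec verification failed: checksum mismatch")
    return data


payload = b"ACGT" * 1000
section = zip_section(payload)
assert piz_section(section) == payload

# Simulate a buggy codec whose output decompresses to the wrong bytes
broken = {"magic": section["magic"], "body": zlib.compress(b"ACGA" * 1000)}
try:
    piz_section(broken)
except ValueError as err:
    print(err)
```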
Z. Submit aggregate stats of compression performance and metadata to the server
Z. Submits stats for debugging
ZUCL. Ad hoc option to assist debugging
UC. Reconstruct data from a single VB. Can be used with (1) genocat or (2) genounzip --test.
Z. Run the segmenter but don't compress and don't write the output.
ZUC. Use only one thread for the main PIZ/ZIP dispatcher. This doesn't affect thread use of other dispatchers.
ZU. Force genozip version upgrade notice
ZUCL. Execute various debugging logic