
Diagnostics (for technical support)
Usage
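As command-line options for genozip (Z), genounzip (U), genocat (C) and genols (L). The letters at the start of each option's description indicate which of these tools accept it.
Note: When used with genocat, most options show only the requested metadata and not the file data itself.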
Memory consumption
--show-memory[=PEAK]
ZUCL. Show which Buffers are consuming the most memory. Normally, memory is sampled at the end of compression or decompression. With =PEAK, each Buffer retains its maximum allocation throughout execution.
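The difference between the default end-of-run sample and =PEAK can be pictured with a toy model. This is a minimal sketch, assuming a hypothetical ToyBuffer class that records both its current and its largest allocation; the buffer names used below are made up, and none of this is Genozip's actual Buffer implementation.

```python
from typing import Iterable, List, Tuple


class ToyBuffer:
    """Hypothetical stand-in for a Genozip Buffer (illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.size = 0  # current allocation in bytes
        self.peak = 0  # largest allocation seen so far

    def alloc(self, size: int) -> None:
        self.size = size
        self.peak = max(self.peak, size)

    def free(self) -> None:
        self.size = 0  # the peak is kept even after freeing


def memory_report(buffers: Iterable[ToyBuffer], peak: bool = False) -> List[Tuple[str, int]]:
    """Return (name, bytes) pairs, largest consumer first.

    peak=False mimics sampling at the end of (de)compression;
    peak=True mimics --show-memory=PEAK, reporting each buffer's maximum.
    """
    rows = [(b.name, b.peak if peak else b.size) for b in buffers]
    return sorted(rows, key=lambda r: r[1], reverse=True)
```

For example, a buffer that held 64 MB mid-run but was freed before the end shows 0 bytes in the default report and 64 MB in the peak report, which is why =PEAK is the better choice when hunting transient memory spikes.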
kill -USR1 pid
ZUCL. Triggers --show-memory on a running process, identified by its pid. Not available on Windows.
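From a script, the same signal can be sent with Python's standard library. This is a minimal sketch, assuming you already know the pid of the running genozip process (for example from ps); the memory report itself is produced by the genozip process, not by this helper, and request_memory_report is a hypothetical name.

```python
import os
import signal


def request_memory_report(pid: int) -> None:
    """Send SIGUSR1 to `pid`, the equivalent of `kill -USR1 <pid>`.

    SIGUSR1 does not exist on Windows, matching the limitation of
    this feature there.
    """
    if not hasattr(signal, "SIGUSR1"):
        raise OSError("SIGUSR1 is not available on this platform")
    os.kill(pid, signal.SIGUSR1)
```

Usage: request_memory_report(12345), where 12345 stands for the pid of the genozip process.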
--debug-memory[=bytes]
ZUCL. Show Buffer allocations and destructions. If bytes is specified, show only allocations of at least that many bytes.
--show-hash
Z. See raw numbers that feed into determining the size of the global hash tables.
genozip file contents
--show-alleles
ZUC. (VCF only) Output allele values to stdout. Each row corresponds to a row in the VCF file. Mixed-ploidy regions are padded, and 2-digit allele values are replaced by an ASCII character.
--show-dict[=field]
ZUC. Show dictionaries read/written for each vblock. If field is specified (use --STATS to see the field names in the file), show only that field.
--show-counts=field
ZUC. For each snip in the dictionary, show the number of words in the file that use it. With genozip, this works for any context (use --STATS to see context names). With genounzip and genocat, it works only for contexts that have a SEC_COUNTS section (this includes any context for which the file was compressed with genozip --show-counts).
--show-b250[=field]
ZUC. Show the content of b250 sections: each value shows the line (counting from 1) and the index into its dictionary (note: REF and ALT are compressed together as they are correlated). If field is specified (eg CHROM, RNAME, POS, AN), show only that field. This also works with genounzip and genocat, but without the line numbers.
--dump-b250=field
ZUC. Dump the binary content of this field's b250 data, exactly as it appears in the genozip format, to a file named "field.b250". Specify the field name as it appears in the Name column of --STATS, for fields that have "comp b250" data.
--dump-local=field
ZUC. Same as --dump-b250 just for the local buffer.
--contigs
ZUC. List the names of the chromosomes (or contigs) included in the file. Alternative names: --chroms --list-chroms
--dump-section section-type
ZUC. Dump the uncompressed, unencrypted contents of all sections of this type (as it appears in --show-gheader, eg SEC_REFERENCE) to files named "section-type.vb.dict_id.[header|body]".
--show-headers section-type
ZUC. Show all section headers, or only those of a specific section type if the optional argument is provided. The argument is a case-insensitive substring of a section name. genozip and genounzip show the headers encountered during their normal operation, while genocat shows all the headers in the file, in the order in which they appear.
--force-gencomp
Z. SAM/BAM: produce PRIM/DEPN components even for non-sorted files.
--no-gencomp
Z. SAM/BAM: don't produce PRIM/DEPN components.
--no-domqual
Z. SAM/BAM/FASTQ: don't use the DOMQUAL codec when compressing QUAL.
--show-index
ZUC. Show the content of the random access index (SEC_RANDOM_ACCESS section).
--show-lines
Z. Show the byte offset of each line.
--show-reference
ZUC. Show the ranges included in the SEC_REFERENCE sections.
--show-ranges
UC. Show the ranges as in RefStruct.ranges.
--show-ref-seq
ZUC. Show the reference sequences as stored internally in a SAM, BAM or FASTQ file (this also works for a reference file, but --reference --regions is faster). Combine with --regions to see specific regions (genocat only). Combine with --sequential to omit newlines. '-' appears in unset loci.
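To illustrate the '-' convention, here is a minimal sketch assuming the stored reference is modeled as a string of bases plus a parallel per-locus "is set" flag (a hypothetical layout, loosely inspired by the SEC_REF_IS_SET section shown by --show-is-set). The line width is arbitrary and not taken from Genozip.

```python
from typing import List


def render_ref_seq(bases: str, is_set: List[bool], sequential: bool = False, width: int = 60) -> str:
    """Render stored reference bases, printing '-' for loci that are not set.

    Without `sequential`, the output is wrapped into lines of `width`
    bases (hypothetical width); with sequential=True, newlines are omitted.
    """
    if len(bases) != len(is_set):
        raise ValueError("bases and is_set must have the same length")
    seq = "".join(b if s else "-" for b, s in zip(bases, is_set))
    if sequential:
        return seq
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
```

Loci that the file never needed are typically not stored in a SAM, BAM or FASTQ file's internal reference, which is why runs of '-' are expected in this output.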
--show-ref-diff
C. Show the difference between two reference files. Use in combination with two --reference arguments.
--show-ref-index
ZUC. Show the content of the random access index of the reference data (SEC_REF_RAND_ACC section).
--show-ref-hash
ZUC. Show the details of the reference hash table (SEC_REF_HASH) sections.
--show-chrom2ref
ZUC. Show the details of the file contigs that are mapped to a different contig name in the reference (eg '22' ➔ 'chr22').
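The kind of mapping this option reports can be illustrated with a simple 'chr'-prefix heuristic. This is a minimal sketch under that assumption only: Genozip's actual matching rules are not described here and may handle many more naming conventions, and map_contig_to_ref is a hypothetical name.

```python
from typing import Optional, Set


def map_contig_to_ref(contig: str, ref_contigs: Set[str]) -> Optional[str]:
    """Hypothetical heuristic: match a file contig to a reference contig,
    adding or removing a 'chr' prefix if the exact name is absent."""
    if contig in ref_contigs:
        return contig  # same name, no mapping needed
    if contig.startswith("chr") and contig[3:] in ref_contigs:
        return contig[3:]  # 'chr22' -> '22'
    if "chr" + contig in ref_contigs:
        return "chr" + contig  # '22' -> 'chr22'
    return None  # no match found
```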
--show-ref-contigs
ZUC. Show the details of the reference contigs.
--show-ref-iupacs
ZC. Show the IUPACs in the reference. In combination with genozip --chain, also shows the VCF variants that have an IUPAC in the Luft reference and how they are handled.
--show-kraken
C. Show inclusion or exclusion of lines. Used in combination with --taxid.
--show-txt-contigs
ZUC. (SAM and BAM) Show the details of the contigs appearing in the file header (SQ lines).
--show-gheader
ZUC. Show the content of the genozip header (which also includes the list of all sections in the file).
--show-gheader=2 shows the section list after modification (if any) by writer_create_plan.
--show-vblocks
ZUC. Show vblock headers as they are read / written.
--show-aliases
ZUC. See the contents of the SEC_DICT_ID_ALIASES section.
--show-is-set contig
UC. Shows the contents of the SEC_REF_IS_SET section for contig.
--show-bgzf
ZUC. Show BGZF blocks as they are being compressed or decompressed.
--show-sag[=grp_i]
ZUC. SAM/BAM: Show SA groups (supplementary / secondary alignments + their primary alignment).
--show-depn
Z. SAM/BAM: Show supplementary / secondary alignments that are successfully mapped against a primary alignment.
--show-dvcf
C. Show the line-by-line outcome of the liftover. Used with dual-coordinate files and may be combined with --luft.
See: Dual-coordinate VCF files
Text file contents
--show-bam
C. Show alignments of a BAM file.
Subsetting a file for debugging
--biopsy=vb[,vb...]
Z. Dump a subset of the VBs of the source file being compressed, including the txt header. The argument is a comma-separated list of VB numbers or VB ranges. An argument of 0 means txt header only.
Example: genozip mybam.bam --biopsy 5-7,11 will emit the txt header and VBs 5,6,7,11.
Note: The biopsy is taken after reading the VBlocks, without segging.
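The argument format can be made concrete by expanding it. This is a minimal sketch, assuming ranges are inclusive (as the 5-7 example implies) and that duplicate VB numbers are merged; parse_biopsy_spec is a hypothetical helper, not Genozip's own parser.

```python
from typing import List


def parse_biopsy_spec(spec: str) -> List[int]:
    """Expand a --biopsy argument such as "5-7,11" into VB numbers.

    Returns an empty list for "0" (txt header only). The txt header
    itself is always emitted, so it is not represented in the result.
    """
    vbs = set()
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            first, last = (int(x) for x in item.split("-", 1))
            if first > last:
                raise ValueError(f"invalid range: {item!r}")
            vbs.update(range(first, last + 1))  # inclusive range
        else:
            vbs.add(int(item))
    vbs.discard(0)  # 0 selects the txt header only
    return sorted(vbs)
```

With the example above, parse_biopsy_spec("5-7,11") yields [5, 6, 7, 11], the same VBs the genozip command emits alongside the txt header.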
--biopsy-line=vb/line
Z. Dump a single line. vb is 1-based VBlock number and line is 0-based line within the VBlock.
-B, --vblock
C. Use with a 'B' suffix to specify a small number of bytes, eg -B100000B. Useful for subsequently subsetting with --biopsy.
--head[=N]
C. Compress only the first N lines (default: 10). When using this option, Genozip compresses only VB=1, so the vblock needs to be large enough to contain the specified number of lines.
Tracking execution
--show-containers[=field] or [=vblock_i]
ZUC. Show the flow of containers, optionally showing only the values of a specific vblock_i or a specific field (use --STATS to see the field names in the file).
--show-plan
ZUC. Shows reconstruction plan. Combine with --luft to see Luft reconstruction plan.
--show-threads
ZUC. Show thread dispatcher activity.
--debug-threads
ZUCL. Alternative to --show-threads - store thread log in a buffer and display it in case of an error.
--debug-lines
Z. ZIP: adds an Adler32 signature to each line, which is verified in PIZ.
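The per-line check can be pictured as computing one Adler32 per line at compression time and recomputing it after reconstruction. This is a minimal sketch using Python's zlib.adler32; which bytes Genozip actually covers (for example whether the newline is included) and where it stores the signature are assumptions not specified here, and both helper names are hypothetical.

```python
import zlib
from typing import List


def line_signatures(data: bytes) -> List[int]:
    """Compute an Adler32 checksum for each line (newline included).

    Hypothetical scheme for illustration only.
    """
    return [zlib.adler32(line) for line in data.splitlines(keepends=True)]


def verify_lines(data: bytes, expected: List[int]) -> List[int]:
    """Return the 0-based indices of reconstructed lines whose checksum differs."""
    actual = line_signatures(data)
    if len(actual) != len(expected):
        raise ValueError("line count differs")
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```

Checking each line separately pinpoints the first line that reconstructs incorrectly, rather than only reporting that the file as a whole differs.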
--add-line-numbers
C. SAM only: adds a field VB:Z describing the comp_i, vblock_i and line_i of the line.
--debug-seg[=field]
Z. Shows snips being segmented into contexts - possibly limiting to a specific field (use
--STATS to see the field names in the file).
--count=VB
UC. Show the number of lines written for each VBlock (note: --count without an argument shows the number of lines written in the entire file).
--debug-progress
ZUC. See raw numbers that feed into the progress indicator.
--debug-stats
Z. See details of the creation process of the --stats report.
--debug-generate
Z. See contexts that are marked as "all the same" and are removed or shrunk.
--debug-recon-size
Z. See vb->context[]->txt_len and vb->recon_size.
--debug-gencomp
Z. SAM/BAM/VCF: View the queues of generated component buffers.
--debug-sag
Z. SAM/BAM: For each failing candidate line for SA Groups - show the reason for its failure to get included.
--show-time[=res]
ZUCL. Show what functions are consuming the most time. Optional res is one of the members of ProfilerRec defined in profiler.h, such as 'compressor_lzma', or a substring such as 'compressor_'.
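The substring rule means that an exact member name and a shared prefix both work as filters. This is a minimal sketch, assuming profiler results are modeled as a name-to-milliseconds mapping; apart from 'compressor_lzma', the names in the usage note are hypothetical, and filter_profile is not part of Genozip.

```python
from typing import Dict, List, Optional, Tuple


def filter_profile(records: Dict[str, float], res: Optional[str] = None) -> List[Tuple[str, float]]:
    """Return (name, ms) pairs, slowest first, keeping only names that
    contain `res` as a substring (an exact name matches itself)."""
    rows = [(n, t) for n, t in records.items() if res is None or res in n]
    return sorted(rows, key=lambda r: r[1], reverse=True)
```

For example, with records for 'compressor_lzma', 'compressor_bz2' and 'zfile_read', a res of 'compressor_' keeps the first two, while 'compressor_lzma' keeps only that one.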
--show-aligner
ZUC. SAM/BAM/FASTQ: Show alignments of reads as generated by the Genozip aligner.
--show-digest
ZUC. Show digest (MD5 or Adler32) updates.
--log-digest
ZUC. Output the data hashed for digest_ctx_bound to digest.zip.log and digest.piz.log.
--show-mutex[=mutex-name]
ZUC. Shows locks and unlocks of all mutexes or a particular mutex.
--debug-read-ctxs
ZUC. Show all B250, LOCAL and DICT sections as they are read/skipped and decompressed during genocat/genounzip.
Note: For genozip, this is only relevant for reading sections of the first FASTQ file when compressing the second FASTQ file with --pair.
--show-uncompress
ZUC. Shows uncompressing of section data.
--debug-peek
UC. Shows reconstructor peek stack.
--show-flags
ZUCL. Shows internal flags after initialization.
--show-recon
UC. Shows the reconstruction plan.
--show-dvcf
C. When used with a dual-coordinate VCF file, shows for each variant its Coordinates (Primary, Luft or Both) and its oStatus.
--show-wrong-MD
C. SAM/BAM with MD:Z field - shows cases where the special MD algorithm is not applied to the MD:Z in the data.
--show-wrong-XG
C. SAM/BAM with BS-Seeker2 XG:Z field - shows cases where the special XG algorithm is not applied to the XG:Z in the data.
--show-wrong-XM
C. SAM/BAM with Bismark or BS-Seeker2 XM:Z field - shows cases where the special XM algorithm is not applied to the XM:Z in the data.
--show-wrong-XB
C. SAM with BSBolt XB:Z field - shows cases in which the predicted methylation string differs from the actual one.
--debug-LONG
Z. SAM/BAM and FASTQ: treat data as long reads regardless of the actual read length.
--show-qual
ZUC. SAM/BAM and FASTQ: see internal data of the QUAL compression codecs.
--debug-qname
C. SAM/BAM and FASTQ: show QNAME flavor unit test.
--show-buddy
ZUC. SAM/BAM: show buddy (which can be a mate or saggy or both) for each line that has one.
Tracking compression performance
-w, --stats
Show the internal structure of a genozip file and the associated compression statistics.
-W, --STATS
Show more detailed statistics.
Note: specifying -W or -w twice causes the header line of the statistics to be printed to stderr, so it survives when stdout is piped to grep.
--show-filename
Show the file name for each file.
--stats=comp_i, --STATS=comp_i
Z. Similar to --stats or --STATS, but shows stats for a single component.
--show-codec
Z. Genozip tests for the best codec when it first encounters a new type of data. See the results.
--verify-codec
ZUC. Verifies each section's decompression correctness against an Adler32 that is stored in SectionHeader.magic.
Note: the file generated when using this option is not a valid Genozip file, as it has the wrong magic. This option is designed for detecting issues while developing new codecs.
Example: genozip -t --verify-codec myfile.sam
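The idea behind --verify-codec is a round-trip check: checksum the data before compressing, then verify the decompressed result against that checksum. This is a minimal sketch, with zlib standing in for a Genozip codec and a returned tuple standing in for the section; in Genozip itself the Adler32 is stored in SectionHeader.magic, which is why the resulting file is not valid. Both function names are hypothetical.

```python
import zlib
from typing import Tuple


def compress_section(data: bytes) -> Tuple[bytes, int]:
    """Compress with a stand-in codec and remember an Adler32 of the input."""
    return zlib.compress(data), zlib.adler32(data)


def verify_section(compressed: bytes, expected_adler: int) -> bytes:
    """Decompress and check the result against the stored Adler32."""
    data = zlib.decompress(compressed)
    if zlib.adler32(data) != expected_adler:
        raise ValueError("codec produced wrong output")
    return data
```

Because the check is done per section, a mismatch points at the specific section (and thus codec) that failed, which is what makes it useful while developing a new codec.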
--submit-stats
Z. Submit aggregate stats of compression performance and metadata to the server.
--debug-submit
Z. Submits stats for debugging
--debug-debug
ZUCL. Ad hoc option to assist debugging
Controlling execution
--one-vb vb
UC. Reconstruct data from a single VB. Can be used with (1) genocat or (2) genounzip --test.
--seg-only
Z. Run the segmenter but don't compress and don't write the output.
--xthreads
ZUC. Use only one thread for the main PIZ/ZIP dispatcher. This doesn't affect thread use of other dispatchers.
--debug-latest
ZU. Force the genozip version upgrade notice.
--debug
ZUCL. Execute various debugging logic