
Diagnostics (for technical support)

Usage

 

As command-line options. The letters at the start of each option's description indicate which commands accept it: genozip (Z), genounzip (U), genocat (C), genols (L).

 

Note: When used with genocat most options show only the requested metadata and not the file data itself.

 

Memory consumption

 

--show-memory[=PEAK]  

ZUCL. Show which Buffers are consuming the most memory. Normally, memory is sampled at the end of compression or decompression. With =PEAK, each Buffer retains its maximum allocation throughout execution.
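
The difference between the default end-of-run sample and =PEAK can be illustrated with a toy model. This is a hypothetical sketch: the Buffer class, its fields, the report() helper and the buffer names are illustrative and are not Genozip's internal data structures.

```python
# Hypothetical model of the two sampling modes of --show-memory.
# Buffer, report() and the buffer names are illustrative only.
class Buffer:
    def __init__(self, name):
        self.name = name
        self.size = 0  # current allocation in bytes
        self.peak = 0  # largest allocation ever held

    def alloc(self, nbytes):
        self.size = nbytes
        self.peak = max(self.peak, nbytes)

    def free(self):
        self.size = 0  # the peak is retained


def report(buffers, peak=False):
    """List buffers by memory use, largest first."""
    key = (lambda b: b.peak) if peak else (lambda b: b.size)
    return [(b.name, key(b)) for b in sorted(buffers, key=key, reverse=True)]


txt = Buffer("txt_data")
txt.alloc(64_000_000)
txt.free()  # freed before the end of the run
z = Buffer("z_data")
z.alloc(8_000_000)

print(report([txt, z]))             # end-of-run sample: txt_data shows 0
print(report([txt, z], peak=True))  # peak sample: txt_data shows 64000000
```

A buffer that is freed before the end of the run is invisible in the default sample but dominates the peak report.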

kill -USR1 pid 

ZUCL. Executes --show-memory on a running process. Not available on Windows.

--debug-memory[=bytes] 

ZUCL. Show Buffer allocations and destructions. If <bytes> is specified then show only allocations of at least <bytes>.

--show-hash  

Z. See raw numbers that feed into determining the size of the global hash tables.

genozip file contents

 

--show-data-type

C. Show the data type of a genozip file.

--show-alleles  

ZUC. (VCF only) Output allele values to stdout. Each row corresponds to a row in the VCF file. Mixed-ploidy regions are padded, and 2-digit allele values are replaced by an ASCII character.

--show-dict[=field]

ZUC. Show the dictionaries read/written for each vblock. With the optional field argument (use --STATS to see the field names in the file), only that field is shown.

--show-singletons=field

ZUC. Show the singletons stored in the local buffer of the given field.

--show-counts=field

ZUC. Show, for each snip in the dictionary, the number of words in the file that use this snip. With genozip, this works for any context (use --STATS to see context names). With genounzip/genocat, it works only for contexts that have a SEC_COUNTS section (which includes any context in a file generated with genozip --show-counts of that context).
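
The quantity reported per snip is a frequency count over the words of a field. The following is a minimal sketch of that idea only, assuming a field's words are available as a list. snip_counts() is a hypothetical helper, and the example data is a toy column, not output from Genozip.

```python
from collections import Counter


def snip_counts(words):
    """Pair each distinct snip (in first-seen order) with the number of words using it."""
    dictionary = list(dict.fromkeys(words))  # distinct snips, first-seen order
    counts = Counter(words)
    return [(snip, counts[snip]) for snip in dictionary]


# Toy CHROM column: 5 words, 3 distinct snips.
chrom_column = ["chr1", "chr1", "chr2", "chr1", "chrX"]
for snip, n in snip_counts(chrom_column):
    print(f"{snip}\t{n}")
```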

--show-b250[=field]

ZUC. Show the content of b250 sections: each value shows the line (counting from 1) and the index into its dictionary (note: REF and ALT are compressed together, as they are correlated). With the optional field (e.g. CHROM, RNAME, POS, AN), only that field is shown. This also works with genounzip and genocat, but without the line numbers.

--dump-b250=field

ZUC. Dump the binary content of the b250 data of this field, exactly as it appears in the genozip format, to a file named "field.b250". Specify the field name as it appears in the Name column of --STATS, for fields that have "comp b250" data.

--dump-local=field  

ZUC. Same as --dump-b250, but for the local buffer.

--contigs

ZUC. List the names of the chromosomes (or contigs) included in the file. Alternative names: --chroms, --list-chroms.

--dump-section section-type

ZUC. Dump the uncompressed, unencrypted contents of all sections of this type (as the type appears in --show-gheaders, e.g. SEC_REFERENCE) to files named "section-type.vb.dict_id.[header|body]".

 

--show-headers[=section-type]

ZUC. Show all section headers, or only those of a specific section type if the optional argument is provided. The argument is a case-insensitive substring of a section name. genozip and genounzip show the headers encountered in their normal operation, while genocat shows all the headers in the file, in the order they appear in the file. In combination with --force, the file is scanned for headers by their magic, without relying on the section list; this is useful for truncated or corrupted genozip files.

--recover

UC. If, due to corruption, the GENOZIP_HEADER section does not appear at the offset specified in the footer, scan the file for it. Used in combination with --show-headers --force or --show-gheaders --force.

 

--show-index  

ZUC. Show the content of the random access index (SEC_RANDOM_ACCESS section).

--show-lines  

Z. Show the byte offset of each line.

--show-reference  

ZUC. Show the ranges included in the SEC_REFERENCE sections.

--show-ranges  

UC. Show the ranges as stored in RefStruct.ranges.

--show-ref-seq  

ZUC. Show the reference sequences as stored internally in a SAM, BAM or FASTQ file (this also works for a reference file, but --reference --regions is faster). Combine with --regions to see specific regions (genocat only). Combine with --sequential to omit newlines. '-' appears in unset loci.

--show-ref-diff  

C. Show the difference between two reference files. Use in combination with two --reference arguments.

--show-ref-index  

ZUC. Show the content of the random access index of the reference data (SEC_REF_RAND_ACC section).

--show-ref-hash  

ZUC. Show the details of the reference hash table (SEC_REF_HASH) sections.

--show-chrom2ref  

ZUC. Show the details of the file contigs that are mapped to a different contig name in the reference (eg '22' ➔ 'chr22').
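
To illustrate what "mapped to a different contig name" means, here is a hypothetical sketch of resolving a file contig name against the reference's contig names. The matching rule is an assumption made for illustration and is not Genozip's actual algorithm: first try the name as-is, then try adding or stripping a "chr" prefix.

```python
def chrom2ref(name, ref_contigs):
    """Hypothetical contig-name matching (assumed rule, not Genozip's logic)."""
    if name in ref_contigs:
        return name  # same name in file and reference: no mapping needed
    if name.startswith("chr") and name[3:] in ref_contigs:
        return name[3:]  # 'chr22' in the file -> '22' in the reference
    if "chr" + name in ref_contigs:
        return "chr" + name  # '22' in the file -> 'chr22' in the reference
    return None  # no match found


ref = {"chr1", "chr22", "chrM"}
print(chrom2ref("22", ref))  # chr22
```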

--show-ref-contigs  

ZUC. Show the details of the reference contigs.

--show-ref-iupacs  

ZC. Show the IUPACs in the reference. In combination with genozip --chain, also shows the VCF variants that have an IUPAC in the Luft reference and how they are handled.

--show-kraken  

C. Show inclusion or exclusion of lines. Used in combination with --taxid.

--show-txt-contigs  

ZUC. (SAM and BAM) Show the details of the contigs appearing in the file header (@SQ lines).

--show-gheader  

ZUC.  Show the content of the genozip header (which also includes the list of all sections in the file). 

--show-gheader=2 shows the section list after modification (if any) by writer_create_plan.

--show-vblocks[=task]

ZUC. Show vblock information as vblocks are read/written. The optional task limits output to a specific dispatcher task, e.g. piz.

--show-aliases  

ZUC. See contents of SEC_DICT_ID_ALIASES section.

--show-is-set contig

UC. Show the contents of the SEC_REF_IS_SET section for the given contig.

--show-bgzf  

ZUC. Show BGZF blocks as they are being compressed or decompressed.

--show-sag[=grp_i]  

ZUC. SAM/BAM: Show SA groups (supplementary / secondary alignments + their primary alignment).

--show-depn  

Z. SAM/BAM: Show supplementary / secondary alignments that are successfully mapped against a primary alignment.

--show-dvcf  

C. Show the liftover outcome of each line. Used with dual-coordinate files; may be combined with --luft.

See: Dual-coordinate VCF files

 

Text file contents

 

--show-bam  

C. Show alignments of a BAM file.

Subsetting a file for debugging

 

--biopsy=vb[,vb...] or [MAIN]|[PRIM]|[DEPN]

Z. Dump a subset of the VBs of the source file being compressed, including the txt header. The argument is a comma-separated list of VB numbers or VB ranges. An argument of 0 means txt header only.

For SAM/BAM only: a comma-separated combination of MAIN, PRIM and/or DEPN may be specified. This is useful because with gencomp, VB numbers may change between runs due to the insertion of PRIM VBs.

 

Example: genozip mybam.bam --biopsy 5-7,11 will emit the txt header and VBs 5, 6, 7 and 11.

Note: The biopsy is taken after reading the VBlocks without segging.

Note: The txt header is always included, unless --no-header is specified.
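
The numeric form of the --biopsy argument expands as in the example above: ranges are inclusive, and 0 selects the txt header only. The following sketch of that expansion is hypothetical. parse_biopsy() is not Genozip's parser, and it does not handle the MAIN/PRIM/DEPN form.

```python
def parse_biopsy(arg):
    """Expand a numeric --biopsy argument such as '5-7,11' into VB numbers."""
    vbs = []
    for item in arg.split(","):
        if "-" in item:
            first, last = (int(x) for x in item.split("-"))
            vbs.extend(range(first, last + 1))  # ranges are inclusive
        else:
            vbs.append(int(item))
    return vbs


print(parse_biopsy("5-7,11"))  # [5, 6, 7, 11]
print(parse_biopsy("0"))       # [0], i.e. txt header only
```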

--biopsy-line=vb/line  

Z. Dump a single line. vb is 1-based VBlock number and line is 0-based line within the VBlock.
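
The two arguments of --biopsy-line use different bases, which is easy to get wrong. This hypothetical sketch shows how the convention translates into indexing. parse_biopsy_line() and the toy vblocks list are illustrative only.

```python
def parse_biopsy_line(arg):
    """Split 'vb/line' and check the conventions: vb is 1-based, line is 0-based."""
    vb_str, line_str = arg.split("/")
    vb, line = int(vb_str), int(line_str)
    if vb < 1 or line < 0:
        raise ValueError("vb is 1-based (>= 1); line is 0-based (>= 0)")
    return vb, line


vblocks = [["l0", "l1"], ["l2", "l3", "l4"]]  # toy file split into 2 VBlocks
vb, line = parse_biopsy_line("2/0")
print(vblocks[vb - 1][line])  # first line of the second VBlock: l2
```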

--skip-segconf

Z. Intended to be used in combination with --biopsy to skip segconf (useful for taking a biopsy of defective files).

-B, --vblock

C. Use with a 'B' suffix to specify a small size in bytes, e.g. -B100000B. Useful for subsequently subsetting with --biopsy.

--head[=N]  

C. Compress only the first N lines (default: 10). When using this option, Genozip compresses only VB=1, so the vblock needs to be large enough to contain the specified number of lines.

--truncate-partial-last-line  

Z. If the last line of a file is partial (e.g., in a text file, it is missing its terminating newline), it is discarded. If the file is BGZF-compressed and the last BGZF block in the file (up to 64 KB of data) is incomplete, it is discarded. The digest is computed on the file without the discarded data.
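
For the text-file case, discarding a partial last line simply means keeping everything up to the last newline. This is a minimal sketch of that behavior, and it also shows that the digest is taken over the data that is kept. truncate_partial_last_line() is a hypothetical helper, and the BGZF-block case is not modeled.

```python
import hashlib


def truncate_partial_last_line(data: bytes) -> bytes:
    """Drop a trailing line that lacks its newline (text-file case only)."""
    if not data or data.endswith(b"\n"):
        return data  # nothing to discard
    last_nl = data.rfind(b"\n")  # -1 if there is no complete line at all
    return data[: last_nl + 1]


kept = truncate_partial_last_line(b"line1\nline2\nlin")
print(kept)  # b'line1\nline2\n'
digest = hashlib.md5(kept).hexdigest()  # digest covers only the kept data
```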

Tracking execution

 

--show-containers[=field] or [=vblock_i]  

ZUC. Show the flow of containers, optionally with the values of a specific vblock_i or a specific field (use --STATS to see the field names in the file).

--show-plan  

ZUC. Shows reconstruction plan. Combine with --luft to see Luft reconstruction plan.

--show-threads

ZUC.  Show thread dispatcher activity.

--debug-threads

ZUCL.  Alternative to --show-threads - store thread log in a buffer and display it in case of an error.

--debug-lines  

ZUC.  ZIP: adds an Adler32 signature to each line which will be verified in PIZ.
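
The per-line check works on a simple principle: compute an Adler32 of each line during ZIP, then recompute it on the reconstructed line during PIZ and compare. This is a hypothetical sketch of that principle only. sign_lines() and verify_lines() are illustrative, and the sketch does not model how Genozip stores the signatures.

```python
import zlib


def sign_lines(lines):
    """ZIP side: one Adler32 signature per line."""
    return [zlib.adler32(line) for line in lines]


def verify_lines(lines, signatures):
    """PIZ side: indices of reconstructed lines whose Adler32 does not match."""
    return [i for i, (line, sig) in enumerate(zip(lines, signatures))
            if zlib.adler32(line) != sig]


original = [b"read1\tACGT\n", b"read2\tTTGA\n"]
sigs = sign_lines(original)
reconstructed = [b"read1\tACGT\n", b"read2\tTTGC\n"]  # second line corrupted
print(verify_lines(reconstructed, sigs))  # [1]
```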

--add-line-numbers  

C. SAM only: adds a VB:Z field describing the comp_i, vblock_i and line_i of the line.

--debug-seg[=field]  

Z. Show snips as they are segmented into contexts, optionally limited to a specific field (use --STATS to see the field names in the file).

--count=VB

CU. Show the number of lines written for each VBlock (note: --count without an argument shows the number of lines written in the entire file).

--debug-progress

ZUC. See raw numbers that feed into the progress indicator.

--debug-stats  

Z. See details in the creation process of the --stats report.

--debug-generate  

Z. See contexts that are marked as "all the same" and are removed or shrunk.

--debug-recon-size  

Z. See vb->context[]->txt_len and vb->recon_size.

--debug-gencomp  

Z. SAM/BAM/VCF: View the queues of generated component buffers.

--debug-sag  

Z. SAM/BAM: For each failing candidate line for SA Groups - show the reason for its failure to get included.

--show-time[=res] or [=comp_i]

ZUCL. Show which functions are consuming the most time. The optional res is one of the members of ProfilerRec defined in profiler.h, such as 'compressor_lzma', or a substring such as 'compressor_'. Alternatively, the optional comp_i (0-based) shows the time of just one component.
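
Because res may be a substring, a single argument can select a whole family of profiler records. This is a hypothetical sketch of that filtering. Apart from 'compressor_lzma', which is named in the text above, the record names and timings are invented for illustration, and show_time() is not Genozip's code.

```python
# Invented profile data; only 'compressor_lzma' comes from the text above.
profile_ms = {
    "compressor_lzma": 120,
    "compressor_bz2": 45,
    "seg": 300,
    "piz_reconstruct": 80,
}


def show_time(profile, res=None):
    """Keep records whose name contains res (all if None), most time first."""
    rows = [(name, ms) for name, ms in profile.items()
            if res is None or res in name]
    return sorted(rows, key=lambda r: r[1], reverse=True)


print(show_time(profile_ms, "compressor_"))
```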

--show-aligner

ZUC. SAM/BAM/FASTQ: Show alignments of reads as generated by the Genozip aligner.

--show-digest  

ZUC. Show digest (MD5 or Adler32) updates.
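
A digest is updated incrementally as data flows through, and the final value equals the digest of the whole data. The following sketch illustrates this with Python's hashlib and zlib. The chunks are toy data, and nothing here reproduces Genozip's actual output format.

```python
import hashlib
import zlib

chunks = [b"@SQ\tSN:chr1\n", b"read1\t0\tchr1\n", b"read2\t16\tchr1\n"]

md5 = hashlib.md5()
adler = 1  # initial Adler32 value, as used by zlib.adler32
for chunk in chunks:
    md5.update(chunk)
    adler = zlib.adler32(chunk, adler)  # continue from the running value
    print(f"update len={len(chunk)} adler32={adler:08x}")

whole = b"".join(chunks)
assert md5.hexdigest() == hashlib.md5(whole).hexdigest()
assert adler == zlib.adler32(whole)
```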

--log-digest  

ZUC. Output the data hashed for digest_ctx_bound to digest.zip.log and digest.piz.log

--show-mutex[=mutex-name]

ZUC. Shows locks and unlocks of all mutexes or a particular mutex.

--debug-read-ctxs

ZUC. Show all B250, LOCAL and DICT sections as they are read/skipped and decompressed during genocat/genounzip.

Note: For genozip, this is only relevant for reading sections of the first FASTQ file when compressing the second FASTQ file with --pair.

--show-uncompress

ZUC. Shows uncompressing of section data.

--debug-peek

UC. Shows reconstructor peek stack.

 

--show-flags

ZUCL. Shows internal flags after initialization.

--show-recon

UC. Shows the reconstruction plan.

--show-dvcf

C. When used with a dual-coordinate VCF file, shows for each variant its Coordinates (Primary, Luft or Both) and its oStatus.

--show-wrong-MD

C. SAM/BAM with MD:Z field - shows cases where the special MD algorithm is not applied to the MD:Z in the data.

--show-wrong-XG

C. SAM/BAM with BS-Seeker2 XG:Z field - shows cases where the special XG algorithm is not applied to the XG:Z in the data.

--show-wrong-XM

C. SAM/BAM with Bismark or BS-Seeker2 XM:Z field - shows cases where the special XM algorithm is not applied to the XM:Z in the data.

--show-wrong-XB

C. SAM with BSBolt XB:Z field - shows cases in which the predicted methylation string differs from the actual one.

--debug-LONG

Z. SAM/BAM and FASTQ: treat data as long reads regardless of the actual read length.

--show-qual

ZUC. SAM/BAM and FASTQ: see internal data of the QUAL compression codecs.

--debug-qname

C. SAM/BAM and FASTQ: show QNAME flavor unit test.

--show-buddy

ZUC. SAM/BAM: show buddy (which can be a mate or saggy or both) for each line that has one.

--debug-huffman

UC. Deep: show parameters of the Huffman compression of QNAMEs, used during decompression of Deep files.

--debug-split container

ZUCL. Show why str_split_by_container() fails. Useful for debugging new qname flavors.

--show-segconf-has

Z. Show fields encountered during segconf.

--show-deep[=qname_hash,seq_hash,qual_hash]

ZUC. Deep: show deep parameters (optionally: of a single alignment). Usually used in combination with --deep, but can also be used without --deep.

--debug-valgrind

ZUCL. Normally Genozip refrains from releasing resources if the process is about to terminate - as process termination would release the resources faster. However, if valgrind is running, or if --debug-valgrind is specified (even without valgrind running), Genozip does release all resources, to allow detection of true resource leaks.

--show-cache

ZUC. Shows the execution steps in the complex process of loading a cached reference file.

Tracking compression performance

 

-w, --stats

Show the internal structure of a genozip file and the associated compression statistics.

-W, --STATS

Show more detailed statistics.

 

Note: specifying -w or -W twice causes the header line of the statistics to be printed to stderr, so that it survives piping stdout to grep.

--show-filename

Show the file name for each file.

 

--stats=comp_i, --STATS=comp_i 

Z. Similar to --stats or --STATS, but shows stats for a single component (comp_i is 0-based).

--show-codec  

Z. Genozip tests for the best codec when it first encounters a new type of data. Show the results of these tests.

--verify-codec 

ZUC. Verify the correctness of each section's decompression against an Adler32 that is stored in SectionHeader.magic.

Note: the Genozip file generated when using this option is not a valid Genozip file, as it has the wrong magic. This option is designed for detecting issues while developing new codecs.

 

Example: genozip -t --verify-codec myfile.sam
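
The check itself is a round trip: store an Adler32 of each section's uncompressed data at compression time, then compare it with the Adler32 of the decompressed output. This is a hypothetical sketch of the check only. zlib stands in for the codec under test, and a dict field stands in for SectionHeader.magic. The section layout is illustrative.

```python
import zlib


def compress_section(data: bytes):
    """Store the Adler32 of the uncompressed data next to the payload."""
    return {"adler32": zlib.adler32(data), "payload": zlib.compress(data)}


def verify_section(section) -> bool:
    """Decompress and compare against the stored Adler32."""
    decompressed = zlib.decompress(section["payload"])
    return zlib.adler32(decompressed) == section["adler32"]


sec = compress_section(b"ACGTACGTNNNN" * 100)
print(verify_section(sec))  # True
```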

 

--submit-stats  

Z. Submit aggregate stats of compression performance and metadata to the server

--debug-submit

Z. Submits stats for debugging.

--debug-debug

ZUCL. Ad hoc option to assist debugging.

Controlling execution

 

--one-vb vb  

UC. Reconstruct data from a single VB. Can be used with (1) genocat or (2) genounzip --test.

--seg-only  

Z. Run the segmenter but don't compress and don't write the output.

 

--xthreads  

ZUC. Use only one thread for the main PIZ/ZIP dispatcher. This doesn't affect thread use of other dispatchers.

--no-domqual

Z. SAM/BAM/FASTQ: don't use the DOMQUAL codec when compressing QUAL.

 

--no-pacb

Z. SAM/BAM/FASTQ: don't use the PACB codec when compressing QUAL.

--no-longr

Z. SAM/BAM/FASTQ: don't use the LONGR codec when compressing QUAL.


--force-longr

Z. SAM/BAM/FASTQ: force the LONGR codec for compressing QUAL.

--debug-latest  

ZU. Force genozip version upgrade notice

--debug  

ZUCL. Execute various debugging logic
