
Diagnostics (for technical support)

Usage

 

As command line options for genozip (Z), genounzip (U), genocat (C), genols (L)

 

Note: When used with genocat most options show only the requested metadata and not the file data itself.

 

 
Memory consumption

 

--show-memory[=PEAK]  

ZUCL. Show which Buffers are consuming the most memory. Normally, memory is sampled at the end of compression or decompression. With =PEAK, each Buffer retains its maximum allocation throughout execution.

kill -USR1 pid 

ZUCL. Executes --show-memory on a running process. Not available on Windows.

--debug-memory[=bytes] 

ZUCL. Show Buffer allocations and destructions. If <bytes> is specified then show only allocations of at least <bytes>.

--show-hash  

Z. See raw numbers that feed into determining the size of the global hash tables.

genozip file contents

 

-W --STATS

ZUC.  Show more detailed statistics.

Note: Specifying -W or -w twice results in the header line of the statistics being printed to stderr, so that it survives piping stdout to grep.
 

--show-data-type

C. Show the data type of a genozip file.

--show-alleles  

ZUC. VCF: Output allele values to stdout. Each row corresponds to a row in the VCF file. Mixed-ploidy regions are padded and 2-digit allele values are replaced by an ASCII character.

--show-dict[=field]

ZUC. Show dictionaries read/written for each vblock. With optional field (use --STATS to see the field names in the file) shows only that one field.

--show-singletons=field

ZUC. Show singletons in local. 

--show-counts=field

ZUC. Show (per snip in dictionary) the number of words in the file using this snip. genozip - works for any context (use --STATS to see context names). genounzip/genocat - works only for contexts that have a SEC_COUNTS section (these include any context in a file generated with genozip --show-counts of that context).

--show-b250[=field]

C. Show b250 sections content. Combine with --one-vb to see a specific VB.

--dump-b250=field

ZUC. Dump the binary content of the b250 data of this field, exactly as it appears in the genozip format, to a file named "field.b250". Specify the field name as it appears in the Name column in --STATS for fields that have "comp b250" data.

--dump-local=field  

ZUC.  Same as --dump-b250 just for the local buffer.

--contigs

ZUC.  List the names of the chromosomes (or contigs) included in the file. Alternative names: --chroms --list-chroms

--dump-section section-type or section_i

ZUC. Dump the uncompressed, unencrypted contents of all sections of this type as it appears in --show-gheaders (eg SEC_REFERENCE), or of a single section by its number as it appears in the first column of the --show-gheaders output, to a pair of files named "section-type.vb.dict_id.[header|body]".

 

--show-headers[=section-type,section-type... | field | section_i]

ZUC. Show all section headers or, if the optional argument is provided, only those of one or more specific section types, of a specific field, or of a single section by its number as it appears in the first column of the --show-gheaders output. The argument is a case-insensitive substring of a section name or a case-sensitive field name. genozip and genounzip show the headers encountered in their normal operation, while genocat shows only headers (not the file itself).

 

In combination with --force, magic-based scanning for headers is conducted in the file without relying on the section list - useful for truncated or corrupted genozip files, and also for private files (compressed with --sendto). 

Normally does not show headers of a reference file being loaded, unless section-type is specified, or if used in combination with --debug-read-ctxs.
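
The idea behind magic-based scanning can be sketched in a few lines of Python. This is only an illustration and not Genozip's implementation: the function name find_magic_offsets and the 4-byte EXAMPLE_MAGIC value are placeholders invented for this example, and the real magic value and header layout are defined by the genozip file format.

```python
from typing import List

# Placeholder magic for illustration only (an assumption, not Genozip's value).
EXAMPLE_MAGIC = bytes.fromhex("DEADBEEF")


def find_magic_offsets(data: bytes, magic: bytes = EXAMPLE_MAGIC) -> List[int]:
    """Return every offset at which `magic` occurs in `data`.

    The scan does not rely on any index or section list, so it still
    finds candidates in a truncated or partially corrupted buffer.
    """
    if not magic:
        raise ValueError("magic must be non-empty")
    offsets = []
    pos = data.find(magic)
    while pos != -1:
        offsets.append(pos)
        # Resume one byte later so overlapping matches are not missed.
        pos = data.find(magic, pos + 1)
    return offsets
```

Each offset returned is only a candidate: a real scanner would still need to validate the bytes that follow the magic before treating them as a section header.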

--recover

UC. If the GENOZIP_HEADER section does not appear at the offset specified in the footer due to corruption, scan for the header. Used in combination with --show-headers --force or --show-gheaders --force.

 

--show-index  

ZUC. Show the content of the random access index (SEC_RANDOM_ACCESS section).

--show-lines  

Z. Show the byte offset of each line.

--show-txt-offsets  

C. For each VBlock, show txt file offset, length and num_lines.

--show-reference  

ZUC. Show the ranges included in the SEC_REFERENCE sections.

--show-ranges  

UC. Show the ranges as in RefStruct.ranges.

--show-ref-seq  

ZUC. Show the reference sequences as stored internally in a SAM, BAM or FASTQ file (also works for a reference file but --reference --regions is faster). Combine with --regions to see specific regions (genocat only). Combine with --sequential to omit newlines. '-' appears in unset loci.

--show-ref-index  

ZUC. Show the content of the random access index of the reference data (SEC_REF_RAND_ACC section).

--show-ref-hash  

ZUC. Show the details of the reference hash table (SEC_REF_HASH) sections.

--show-chrom2ref  

ZUC. Show the details of the file contigs that are mapped to a different contig name in the reference (eg '22' ➔ 'chr22').

--show-ref-contigs  

ZUC. Show the details of the reference contigs.

--show-ref-iupacs  

ZC. Show the IUPACs in the reference data.

--show-txt-contigs  

ZUC. SAM/BAM/CRAM: Show the details of the contigs appearing in the file header (SQ lines).

--show-gheader  

ZUC.  Show the content of the genozip header (which also includes the list of all sections in the file). 

Note: (genocat) combine with --one-vb to show sections of a single VB.

--show-vblocks[=task]

ZUC.  Show vblock information as they are read / written. Optional task limits output to a specific dispatcher task, e.g. piz.

--show-aliases  

ZUC. See contents of SEC_DICT_ID_ALIASES section.

--show-is-set contig

UC. Shows the contents of SEC_REF_IS_SET section for contig.

 

--show-sag[=grp_i]  

ZUC. SAM/BAM/CRAM: Show SA groups (supplementary / secondary alignments + their primary alignment).

--show-depn  

Z. SAM/BAM/CRAM: Show supplementary / secondary alignments that are successfully mapped against a primary alignment.

--show-sec-gencomp

UC. SAM/BAM/CRAM: Show contents of SEC_GENCOMP : num_prim_lines and num_depn_lines for each MAIN VB. Note that VBs are shown in order of absorption (i.e. aligned to the order of the prim/depn lines)

 
File contents (for source files,  not genozip-compressed)

--show-bam  

Z. Show alignments of a BAM file.

--show-bai[=unsorted|sort|chunks|raw|linear]

--show-tbi[=unsorted|sort|chunks|raw|linear]

Z. Show contents of a BAI / TBI file: genozip --show-bai file.bam.bai

U. Show contents of a BAI / TBI file while it is being created: genounzip --show-bai file.bam.genozip

Optional argument:

=unsorted (genozip only): show the bins and chunks in the order they are in the BAI  / TBI file (default for genozip)

=sort (genozip only): sort bins in ascending order (chunk order within bins is unchanged). 

=chunks: display all the chunks ordered by offset (default for genounzip).

=raw (genounzip only): show chunks before splicing.

=linear (genounzip only): show the linear index.
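
For readers unfamiliar with the BAI layout, the following Python sketch lists all chunks of a BAI file ordered by offset, loosely mirroring what the =chunks view describes. It follows the BAI layout in the SAM specification and is not Genozip code: bai_chunks_by_offset and split_virtual_offset are names invented for this example, the output format of --show-bai is not reproduced, and the metadata pseudo-bin (37450) is not treated specially.

```python
import struct
from typing import List, Tuple


def bai_chunks_by_offset(bai: bytes) -> List[Tuple[int, int, int, int]]:
    """Return (ref_index, bin, chunk_beg, chunk_end) tuples sorted by chunk_beg.

    Layout per the SAM specification: magic "BAI\\1", n_ref, then for each
    reference a list of bins (each holding chunks) followed by a linear
    index. All integers are little-endian.
    """
    if bai[:4] != b"BAI\x01":
        raise ValueError("not a BAI file")
    pos = 4
    (n_ref,) = struct.unpack_from("<i", bai, pos)
    pos += 4
    chunks = []
    for ref_i in range(n_ref):
        (n_bin,) = struct.unpack_from("<i", bai, pos)
        pos += 4
        for _ in range(n_bin):
            bin_id, n_chunk = struct.unpack_from("<Ii", bai, pos)
            pos += 8
            for _ in range(n_chunk):
                beg, end = struct.unpack_from("<QQ", bai, pos)
                pos += 16
                chunks.append((ref_i, bin_id, beg, end))
        (n_intv,) = struct.unpack_from("<i", bai, pos)
        # Skip the linear index: one uint64 virtual offset per interval.
        pos += 4 + 8 * n_intv
    return sorted(chunks, key=lambda c: c[2])


def split_virtual_offset(voffset: int) -> Tuple[int, int]:
    """Split a BGZF virtual offset into (compressed block offset, offset within block)."""
    return voffset >> 16, voffset & 0xFFFF
```

Chunk begin and end values are BGZF virtual offsets, so split_virtual_offset can be applied to them to obtain the position of the compressed block in the file and the position within the uncompressed block.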

--analyze-insertions  

Z. SAM/BAM: show statistics regarding inserted bases.
 

--show-flavor=qname

Z. show flavor of qname, or if a flavor cannot be discovered, show why.

Subsetting a file for debugging

 

--biopsy=vb[,vb...] or [MAIN]|[PRIM]|[DEPN]

Z. Dump a subset of the VBs of the source file being compressed, including the txt header. The argument is a comma-separated list of VB numbers or VB ranges. An argument of 0 means txt header only.

For SAM/BAM only: a comma-separated combination of MAIN, PRIM and/or DEPN may be specified. This is useful because, with gencomp, VB numbers might change between runs due to the insertion of PRIM VBs.

 

Example: genozip mybam.bam --biopsy 5-7,11 will emit the txt header and VBs 5,6,7,11.

Note: The biopsy is taken after reading (and possibly modifying) the VBlocks without segging. Modification options such as --optimize, --add-line-numbers, --add-seq, --head apply.

Note: --no-gencomp is implicit unless --force-gencomp is specified.

Note: The txt header is always included, unless --no-header is specified.
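
To make the numeric form of the argument concrete, here is a small Python sketch that expands a list such as 5-7,11 into individual VB numbers. It is not Genozip's parser: parse_vb_list is a hypothetical helper written for this illustration, it assumes ranges are inclusive (as the 5-7,11 example suggests), and it does not handle the MAIN, PRIM and DEPN keywords.

```python
from typing import List


def parse_vb_list(arg: str) -> List[int]:
    """Expand a list such as "5-7,11" into [5, 6, 7, 11].

    Each comma-separated item is either a single VB number or an
    inclusive range written as "first-last".
    """
    vbs = []
    for item in arg.split(","):
        first, sep, last = item.strip().partition("-")
        start = int(first)
        end = int(last) if sep else start
        if end < start:
            raise ValueError(f"descending range: {item!r}")
        vbs.extend(range(start, end + 1))
    return vbs
```

With this helper, parse_vb_list("5-7,11") yields the VBs 5, 6, 7 and 11, and parse_vb_list("0") yields only 0, which the option treats as "txt header only".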

--biopsy-line=vb/line  

Z. Dump a single line. vb is 1-based VBlock number and line is 0-based line within the VBlock.

Note: Modification options such as --optimize, --add-line-numbers, --add-seq, --head apply.

--biopsy-bytes=start,length  

Z. Dump a range of bytes from the txt file (after uncompressing if needed). The start and length parameters are in terms of the uncompressed textual file.

Note: Useful for getting a biopsy of a FASTQ R2 VBlock. Use genocat --show-txt-offsets to get the offset of a VBlock.

Note: combine with --output to write to an output file, otherwise written to stdout.
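
The notion of "start and length in terms of the uncompressed textual file" can be illustrated with a short Python sketch. This is an assumption-laden example, not Genozip code: read_txt_range is a hypothetical helper, and it only recognizes gzip-compressed input (by its 1f 8b magic), whereas Genozip supports other source compressions as well.

```python
import gzip


def read_txt_range(path: str, start: int, length: int) -> bytes:
    """Return `length` bytes beginning at offset `start` of the uncompressed text.

    Files whose first two bytes are the gzip magic (1f 8b) are decompressed
    on the fly; anything else is read as-is.
    """
    if start < 0 or length < 0:
        raise ValueError("start and length must be non-negative")
    with open(path, "rb") as probe:
        is_gz = probe.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gz else open
    with opener(path, "rb") as f:
        # For gzip input, seek() decompresses and discards the first `start` bytes.
        f.seek(start)
        return f.read(length)
```

The same start and length give the same bytes whether the file is stored plain or gzip-compressed, which is the property the option description relies on.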

--skip-segconf

Z. Intended to be used in combination with --biopsy to skip segconf (useful for taking a biopsy of defective files).

-B --vblock

C. Use with a 'B' suffix to specify a low number of bytes eg -B100000B. Useful for then subsetting with --biopsy.

--head N

C. Compress only the first N lines. When using this option, Genozip compresses only VB=1, so vblock needs to be large enough to contain the specified number of lines. Also, since it is only VB=1, no gencomp is possible.

 
Tracking execution

--show-reading-list

UC. Show list of TXT_HEADER and VB_HEADER sections that will be read by piz main loop.

--show-stack

UC. Show pushing and popping of containers on the container stack. 

--show-containers[=field] or [=vblock_i]  

ZUC. Show flow of containers. Possibly with the values of a specific vblock_i or specific field (use --STATS to see the field names in the file). In PIZ - show reconstructed snips of container items.

--show-snips  

UC. Show snips as they are being reconstructed.

--show-plan  

ZUC. Shows reconstruction plan. 

--show-threads

ZUC.  Show thread dispatcher activity.

--debug-threads

ZUCL.  Alternative to --show-threads - store thread log in a buffer and display it in case of an error.

--debug-lines  

ZUC.  ZIP: adds an Adler32 signature to each line which will be verified in PIZ.
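
The per-line signature idea can be sketched in Python using the standard zlib.adler32 function. This is a minimal illustration under stated assumptions, not Genozip's implementation: the helper names sign_line and first_mismatch are invented here, and how Genozip actually stores and compares the signatures is not shown.

```python
import zlib


def sign_line(line: bytes) -> int:
    """Return the Adler32 checksum of one line as an unsigned 32-bit integer."""
    return zlib.adler32(line) & 0xFFFFFFFF


def first_mismatch(lines, signatures):
    """Return the 0-based index of the first line whose checksum differs
    from its stored signature, or None if every line verifies."""
    for i, (line, sig) in enumerate(zip(lines, signatures)):
        if sign_line(line) != sig:
            return i
    return None
```

Signing each line at compression time and re-checking it at reconstruction time pinpoints the first line that was reconstructed incorrectly, rather than only reporting that a whole-file digest differs.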

--add-line-numbers  

C.  SAM only: adds a field VB:Z describing the comp_i vblock_i and line_i of the line

--add-seq  

Z.  SAM only: adds back a SEQ column that was previously removed. The bases will be all 'A', and the length will be identical to the length of the QUAL field.

--debug-seg[=field]  

Z. Shows snips being segmented into contexts - possibly limiting to a specific field (use --STATS to see the field names in the file).

--debug-tar

Z. Shows the details of creating or decompressing from a tar file with --tar (ZIP) or --t_offset --t_size (PIZ).

--count=VB   

Show number of lines written for each VBlock (note: --count without an argument shows lines written in the entire file).

--show-tasks

ZUC. Show progress in the execution of tasks, and report the time taken by each.

--debug-stats  

Z. See details in the creation process of the --stats report.

--debug-generate  

Z. See contexts that are marked as "all the same" and are removed or shrunk.

--debug-dyn-int  

ZUC. See dyn-int resizes.

--debug-recon-size  

Z. See vb->context[]->txt_len and vb->recon_size.

--debug-gencomp  

Z. SAM/BAM/CRAM: View the queues of generated component buffers.

--debug-sag  

Z. SAM/BAM/CRAM: For each failing candidate line for SA Groups - show the reason for its failure to get included.

--show-scan  

Z. SAM/BAM/CRAM: Show statistics from the pre-processing scan in case of sag BY_FLAG.

--show-time[=res] or [=comp_i]

ZUCL. Show what functions are consuming the most time. Optional res is one of the members of ProfilerRec defined in profiler.h, such as 'compressor_lzma', or a substring such as 'compressor_'. Alternatively, optional comp_i (0-based) shows the time of just one component.

--show-aligner

ZUC. SAM/BAM/CRAM/FASTQ: Show alignments of reads as generated by the Genozip aligner.

 

--debug-aligner

ZUC. SAM/BAM/CRAM/FASTQ: Show Genozip aligner's handling of mismatches. Note: works only in genozip-debug.

--show-digest  

ZUC. Show digest (MD5 or Adler32) updates.

--log-digest  

ZUC. Output the data hashed for digest_ctx_bound to digest.zip.log and digest.piz.log

--show-mutex[=mutex-name]

ZUC. Shows locks and unlocks of all mutexes or a particular mutex.

--debug-read-ctxs

ZUC. Show all B250, LOCAL and DICT sections as they are read/skipped and decompressed during genocat/genounzip. 

Note: For genozip this is only relevant for reading sections of the first FASTQ file when compressing the second FASTQ file with --pair.

--show-compress

Z. Shows compressing of B250, LOCAL and DICT section data.

--show-uncompress

ZUC. Shows uncompressing of section data.

--debug-peek

UC. Shows reconstructor peek stack.

--show-regions

C. In combination with --regions, shows the region ranges and the chregs (chromosome x region).

--show-TLEN-pred

C. SAM/BAM/CRAM: shows TLEN prediction.

--show-wrong-MD

C. SAM/BAM/CRAM: with MD:Z field - shows cases where the special MD algorithm is not applied to the MD:Z in the data.

--show-wrong-XG

C. SAM/BAM/CRAM: with BS-Seeker2 XG:Z field - shows cases where the special XG algorithm is not applied to the XG:Z in the data.

--show-wrong-XM

C. SAM/BAM/CRAM: with Bismark or BSSeeker2 XM:Z field - shows cases where the special XM algorithm is not applied to the XM:Z in the data.

--show-wrong-XB

C. SAM/BAM/CRAM: with BSBolt XB:Z field - shows cases in which the predicted methylation string differs from the actual one.

--debug-LONG

Z. SAM/BAM/CRAM/FASTQ: treat data as long reads regardless of the actual read length.

--show-qual

ZUC. SAM/BAM/CRAM/FASTQ: see internal data of the QUAL compression codecs.

--debug-qname

C. SAM/BAM/CRAM/FASTQ: show QNAME flavor unit test.

--show-buddy

ZUC. SAM/BAM/CRAM: show buddy (which can be a mate or saggy or both) for each line that has one.

--show-huffman field

ZUC. show bit sequences encoding each character for Huffman in-memory compression used in gencomp / deep / bamass.

--debug-huffman

Z. show parameters of the Huffman in-memory compression used in gencomp / deep.

--debug-split container

ZUCL. show why str_split_by_container() fails. Useful for debugging new qname flavors.

--show-segconf-has

Z. show fields encountered during segconf. For FASTQ, also shows non-biological linkers detected in segconf.

--show-deep[=qname_hash,seq_hash,qual_hash | =all]

ZUC. Deep: show deep parameters. Usually used in combination with --deep, but can also be used without --deep. Optionally, providing specific hashes (in hex) outputs more information regarding lines with those hashes, and =all provides very detailed information.

--show-bamass[=qname_hash,seq_hash | =all]

Z. Bamass: show bamass parameters. Optionally, providing specific hashes (in hex) outputs more information regarding lines with those hashes, and =all provides very detailed information.

 

--debug-bai

ZU. Debug the creation or the showing of a BAI file.

--debug-valgrind

ZUCL. Normally Genozip refrains from releasing resources if the process is about to terminate - as process termination would release the resources faster. However, if valgrind is running, or if --debug-valgrind is specified (even without valgrind running), Genozip does release all resources, to allow detection of true resource leaks.

--debug-upgrade

Z. Show progress of thread checking for a new version. See also --debug-latest.

 

--debug-expiration

Z. Force probing for license expiration.

--show-cache

ZUC. Shows the execution steps in the complex process of loading a cached reference file.

 
Tracking compression performance

 

-w --stats

Show the internal structure of a genozip file and the associated compression statistics.

-W --STATS

Show more detailed statistics.

-v, --show-seg-summary

Seg summary statistics.

Note: Specifying -W or -w twice results in the header line of the statistics being printed to stderr, so that it survives piping stdout to grep.

--show-filename

Show the file name for each file.

 

--stats=comp_i, --STATS=comp_i 

Z. similar to --stats or --STATS but shows stats for a single component (comp_i is 0-based).

--show-codec[=field]

Z. Genozip tests for the best codec when it first encounters a new type of data. See the results.

--verify-codec 

ZUC. Verifies each section's decompression correctness against an Adler32 that is stored in SectionHeader.magic. 

Note: the Genozip file generated when using this option is not a valid Genozip file as it has the wrong magic - this option is designed for detecting issues while developing new codecs.

 

Example: genozip -t --verify-codec myfile.sam

 

--telemetry[=FILE]

Z. Send aggregate stats of compression performance and metadata to Genozip (the company). See Telemetry.

 

Note: Optional =FILE argument also writes the telemetry record to telemetry.json in the current directory.

--debug-debug

ZUCL. Ad hoc option to assist debugging.

 
Controlling execution

 

--one-vb vb  

UC. Reconstruct data from a single VB. Can be used with (1) genocat or (2) genounzip --test.

--seg-only  

Z. Run the segmenter but don't compress and don't write the output.

--xthreads  

ZUC. Use only one thread for the main PIZ/ZIP dispatcher. This doesn't affect thread use of other dispatchers.

--no-eval  

ZUCL. Don't allow features on evaluation basis (used for testing permissions).

--no-faf

Z. FASTA: Don't use the "Fasta-As-Fastq" method.

--no-splice

Z. FASTQ/FASTA/SAM/BAM/CRAM: Tell aligner not to consider spliced alignments.

--no-interleaved

Z. FASTQ/FASTA: Don't use the "Interleaved" method for improving the compression of interleaved files.

--no-domq, --no-pacb, --no-longr, --no-smux, --no-homp

Z. SAM/BAM/CRAM/FASTQ: disallow use of specific codecs when compressing QUAL.

--no-lzma

Z. Don't use the slow LZMA codec.

--force-domq, --force-pacb, --force-longr, --force-smux, --force-normq, --force-homp

Z. SAM/BAM/CRAM/FASTQ: force a specific codec for compressing QUAL.

--force-reread

Z. SAM/BAM/CRAM: When genozip uses the gencomp method, all DEPN lines are re-read from disk and none are cached in memory. Possible to combine with --force-gencomp.

--force-PLy

Z. VCF: force the PLy method for compressing FORMAT/PL.

--debug-latest  

ZU. Force genozip version upgrade notice. See also --debug-upgrade.

--debug  

ZUCL. Execute various debugging logic

GZ stuff

--show-gz  

Z. Show details about the GZ / MGZIP compression of a gz-compressed file, and all the blocks in the first 100 MB of the file. Combine with --quiet to see the block headers only.

--dump-gz-block=blk_i

Z. copies BGZF block number blk_i to a new file, without uncompressing it. Currently only works with BGZF files.  

--show-bgzf  

ZUC. Show BGZF blocks as they are being compressed or decompressed.

--show-gz-uncomp

Z. Show decompressions and truncations of gz data.

--show-isizes

ZUC. Show isizes and gz_digests stored in the SEC_GZ_ISIZES/DIGESTS sections, as they are written or read to/from disk.

Note: Add --quiet to cancel the header line.
Note: only works for exactable files (files for which --bgzf=exact works), non-exactable files don't have these sections.

--no-bgzf

Z. See --no-bgzf

--is-exactable

Z. See --is-exactable

--generate-il1m  

Z. Compress stdin to stdout in IL1M format. Set the libdeflate compression level and XFL with --best and --fast, which must be specified before --generate-il1m.

Combine with --no-bgzf (must be before --generate-il1m) to force the 3rd IL1M block of the file to be an invalid IL1M block.

Note: uses igzip library which is different than original IL1M files.


© 2024 Genozip Limited. All rights reserved. Genozip™ is a trademark. Our technology is patent-pending. Privacy Policy.
