Genozip telemetry service
(compression logs)
What is Telemetry?
When the Genozip Telemetry Service is enabled, compressing a file with genozip uploads a small record to the Genozip server, where it is logged.
The record contains only aggregate statistics on the performance of our compression methods and the associated metadata, as specified below.
We use these logs primarily to catch problems with the quality of the compression, as well as to identify specific combinations of sequencing technologies, base callers, aligners, variant callers, and study types in which Genozip is not doing as well as it should.
We also use these logs to provide you with support if you run into technical issues.
How to enable or disable Telemetry?
If you have a paid license, you will be asked during license activation whether or not you permit telemetry. Permitting it greatly helps us improve Genozip for your specific use case. You can switch telemetry on or off at any time by re-activating using genozip --activate.
If telemetry is disabled, you can still send telemetry for a single compression by using genozip --telemetry.
If you are using a Student or Evaluation license, telemetry is always enabled.
Data collected
The structure of a telemetry record is illustrated by the table below (one record per file compressed). This structure may continue to evolve over time as Genozip develops.
The precise record sent to Genozip can be seen by using genozip --telemetry=FILE. This causes the telemetry record to be dumped to telemetry.json in the current directory.
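As a rough illustration, the dumped file could be inspected with a short Python script. This is only a sketch: it assumes telemetry.json holds a single JSON object whose top-level keys correspond to the field names in the table, which this page does not state explicitly. The function names load_telemetry_record and describe_record are hypothetical helpers and are not part of Genozip.

```python
import json
from pathlib import Path


def load_telemetry_record(path="telemetry.json"):
    """Load the dumped telemetry record and return it as a Python object."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def describe_record(record):
    """Return one 'name: value' line per top-level field, sorted by name.

    Assumption: the record is a JSON object. Long values are truncated
    so that each line stays readable in a terminal.
    """
    if not isinstance(record, dict):
        return [repr(record)[:60]]
    lines = []
    for name in sorted(record):
        text = str(record[name])
        if len(text) > 60:
            text = text[:57] + "..."
        lines.append(f"{name}: {text}")
    return lines


if __name__ == "__main__":
    path = Path("telemetry.json")
    if path.exists():
        for line in describe_record(load_telemetry_record(path)):
            print(line)
    else:
        print("telemetry.json not found in the current directory")
```

Reviewing the record this way before permitting telemetry lets you confirm that it contains only the kinds of fields listed on this page.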
Data retention policy
Telemetry logs may be retained indefinitely, or may be deleted if no longer needed, if required to do so by law or regulations, or if requested to do so by the user. To request deletion or to receive a copy of the records submitted under your license, please email support@genozip.com.
Troubleshooting
If you received the following error when trying to compress:
LICENSE ERROR: Failed to upload a telemetry record to the Genozip server
It is because you are using Genozip Student, which requires telemetry, but sending the log record failed, most likely because you have no Internet connectivity or because telemetry is blocked by your organization's firewall. If the issue persists, consider switching to Genozip Research, which does not mandate telemetry.
Questions? support@genozip.com
Field name | Example | Notes |
|---|---|---|
features (reference file) | ref_contigs=298 (3235006512) | Features of the file that affect --make-reference. |
fields_gain | QUAL,53.1%,15.0X; QNAME,18.1%,11.4X; SEQ,16.0%,49.9X; PNEXT,3.2%,11.5X; CIGAR,2.4%,12.1X; AS:i,1.4%,14.9X; TLEN,1.2%,19.1X; POS,1.2%,30.6X; MAPQ,1.1%,10.4X; FLAG,0.6%,30.1X; XS:i,0.6%,35.3X; TXT_HEADER,0.4%,3.4X; SA:Z,0.4%,2.0X; Other,0.2%,587.9X; RNEXT,0.1%,129.0X; XQ:i,0.0%,2.6X; RNAME,0.0%,466.9X; MD:Z,0.0%,1712.4X; NM:i,0.0%,2365.2X; BAM_BIN,0.0%,0.0X; RG:Z,0.0%,4757.6X | For each field: its name, % of the genozip file which is this field, and compression ratio of the field |
flags | best; optimize; reference=EXTERNAL ; file_i=4/12 | Flags that affect compression |
genozip_gain | 17.5 | Compression ratio of genozip vs the uncompressed source file |
hash_issues | TaOKEN,QNAME,512.0 KB,73%,SRR34514354.57574038,SRR10260032.79514335,SRR10260015.78254568,SRR10260015.71887571,SRR10260013.69705869,SRR10260015.55836671 | In rare cases in which a certain field has statistical properties that cause Genozip to run slowly - 6 example values of the field are sent for diagnosis. |
hash_issues | QNAME,,,,SRR11234134.1 1/2,SRR11234134.2 2/2,SRR11234134.3 3/2,SRR11234134.4 4/2,SRR11234134.5 5/2,SRR11234134.6 6/2 | Read names and other similar fields: in extremely rare cases in which Genozip cannot efficiently parse the string due to unsupported formatting, 6 example values are sent for diagnosis. |
hash_issues | A00910:85:HYGWJDSXX:1:1101:3025:1000_1:N:0:CAACGAGAGC+GAATTGAGTG;A00910:85:HYGWJDSXX:1:1101:3025:1000 | When using --deep: the first FASTQ read name and the first BAM QNAME in the respective files. Sent for diagnosis in rare cases in which Genozip cannot make sense of the relationship between them. |
license_num | 442123256 | Genozip license of user |
programs (GFF) | Prodigal; | Programs that generated the data - deduced from the data format |
programs (SAM/BAM) | MarkDuplicates;bwa; | Programs that generated the data - taken from the ID and PN subfields of the @PG header lines |
programs (VCF) | VarScan2; | Programs that generated the data - extracted from the VCF header lines |
qual_acgt (SAM/BAM/FASTQ) | I@A?;:>9786≐<,52――I;:986>7≐5<430/1――I;:97865/3140≐>,――I@A?>;≐:<98HG756 | the most common base quality scores corresponding to each of A,C,G,T in the sequence, in descending order of frequency. |
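
The fields_gain value packs three items per field: the field name, the percentage of the genozip file taken up by that field, and the field's compression ratio. A minimal Python sketch of how such a string could be unpacked for your own analysis follows. It assumes the layout seen in the example row (entries separated by ";", items separated by ",", a trailing "%" on the percentage and a trailing "X" on the ratio). The names FieldGain and parse_fields_gain are hypothetical and are not part of Genozip.

```python
from typing import NamedTuple


class FieldGain(NamedTuple):
    name: str               # field name, e.g. "QUAL" or "AS:i"
    percent_of_file: float  # share of the genozip file taken by this field
    ratio: float            # compression ratio of the field


def parse_fields_gain(value):
    """Split a fields_gain string into a list of FieldGain entries."""
    entries = []
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ";"
        # Split from the right so that the last two items are always
        # the percentage and the ratio.
        name, percent, ratio = item.rsplit(",", 2)
        entries.append(
            FieldGain(
                name.strip(),
                float(percent.strip().rstrip("%")),
                float(ratio.strip().rstrip("X")),
            )
        )
    return entries


# Sample taken from the start of the example value in the table.
sample = "QUAL,53.1%,15.0X; QNAME,18.1%,11.4X; SEQ,16.0%,49.9X"
gains = parse_fields_gain(sample)
largest = max(gains, key=lambda g: g.percent_of_file)
print(largest.name, largest.percent_of_file, largest.ratio)  # QUAL 53.1 15.0
```

Sorting the parsed entries by percent_of_file shows at a glance which fields dominate the compressed file, which is the kind of information these logs are meant to surface.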
