
Genozip Premium

Objective

Genozip Premium is designed to address the business needs of organizations that send or receive genomic files to or from their clients, as well as organizations that are subject to stringent compliance requirements.

Receiving compressed files from your clients

Allowing your clients to compress their files with Genozip before sending them to you not only saves you and your clients storage space and networking costs, it also slashes upload times for your clients and hence increases their satisfaction with your service.

Here's how it works:

Step 1: Provide your client with your license number. The license number can be found with genozip --license.

Step 2: Your client compresses their files with the --sendto option along with your license number, for example:

  genozip --sendto 345312423 mysample-R1.fq.gz mysample-R2.fq.gz --pair --reference hs37d5.fa.gz

Step 3: When receiving a file from your client, you can process it directly in your pipeline using genocat, or decompress it using genounzip.
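As an illustration of Step 3, here is a minimal sketch of reading a received file straight into a pipeline, assuming the file is a compressed VCF. The function name count_variants and the GENOCAT variable are hypothetical names introduced for this example only; the sketch relies solely on genocat writing the decompressed data to standard output. Files that were compressed with --reference may also need the same reference file to be available when they are read.

```shell
# Count the variant records in a Genozip-compressed VCF without writing
# the decompressed file to disk.
# GENOCAT (hypothetical) selects the decompressor; it defaults to genocat.
count_variants() {
    # Header lines start with '#'; count every line that does not.
    "${GENOCAT:-genocat}" "$1" | grep -vc '^#'
}
```

For example:

  count_variants mysample.vcf.genozip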

Files compressed with --sendto are only accessible by the Genozip Premium installation with the specified license number.

Your clients will not require a paid Genozip license to use genozip --sendto, even for commercial use - effectively, you are extending your own Genozip license to your clients.
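A client with many paired samples could script Step 2. The sketch below only prints the command for each pair, in the same form as the example above, so it can be reviewed before anything is run. The helper name sendto_cmd is hypothetical, and the sketch assumes that each R2 file has the same name as its R1 file with "-R1." replaced by "-R2.".

```shell
# Print the genozip command for one read pair.
# Arguments: license number, reference file, R1 file name.
# Assumption: the R2 file name is the R1 name with "-R1." replaced by "-R2.".
sendto_cmd() {
    r2=$(printf '%s\n' "$3" | sed 's/-R1\./-R2./')
    printf 'genozip --sendto %s %s %s --pair --reference %s\n' \
        "$1" "$3" "$r2" "$2"
}
```

For example, to print one command per sample in the current directory (file names without spaces are assumed):

  for r1 in *-R1.fq.gz; do sendto_cmd 345312423 hs37d5.fa.gz "$r1"; done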

Sending compressed files to your clients

Replacing legacy .gz or .bam / .cram compression with modern Genozip compression could save you and your clients considerable storage and networking costs, as well as slash the download time your clients are experiencing, thereby increasing customer satisfaction.

Decompressing Genozip files is always free.

With Genozip Premium, you are permitted to distribute Genozip itself to your clients. In fact, if using the --tar option, we already take care of that - the tar file will also include a copy of the Genozip executables (adding a negligible 4 MB overhead):

   genozip --tar client.tar --reference hs37d5.fa.gz data.R1.fq.gz data.R2.fq.gz data.bam data.vcf.gz

   tar tvf client.tar 

   -rwxr-xr-x divon/divon      4226640 2024-01-22 06:28 genozip-linux-x86_64/genozip
   hrwxr-xr-x divon/divon            0 2024-01-22 06:28 genozip-linux-x86_64/genocat
   hrwxr-xr-x divon/divon            0 2024-01-22 06:28 genozip-linux-x86_64/genounzip
   hrwxr-xr-x divon/divon            0 2024-01-22 06:28 genozip-linux-x86_64/genols
   -rwxr-xr-x divon/divon  14193072850 2024-01-22 06:28 data.bam.genozip
   -rwxrwxr-x divon/divon   6266006155 2024-01-22 06:28 data.R1.fq.genozip
   -rwxrwxr-x divon/divon   9345680255 2024-01-22 06:28 data.R2.fq.genozip
   -rwxrwxr-x divon/divon    106624109 2024-01-22 06:28 data.vcf.genozip
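On the receiving side, a client can inspect such a delivery with standard tar before unpacking it. The following is a minimal sketch that uses only tar and grep; the helper name list_genozip_members is hypothetical.

```shell
# List only the Genozip-compressed data files inside a delivery tar,
# skipping the bundled executables.
list_genozip_members() {
    tar tf "$1" | grep '\.genozip$'
}
```

After extracting the archive with tar xf client.tar on a Linux x86_64 machine, the client could then run the bundled executable on each listed file, for example ./genozip-linux-x86_64/genounzip data.vcf.genozip (the path is taken from the listing above).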

With Genozip Premium, we provide the additional --user-message option, which allows you to specify a message of your choice that will be displayed to your clients when they decompress the file. Examples could be: additional information regarding this specific file; information regarding how to access technical support; marketing information regarding the services you offer.

  genozip --user-message msg.txt mysample-R1.fq.gz mysample-R2.fq.gz --pair --reference hs37d5.fa.gz

In this example, msg.txt is a file that contains the message that will be baked into the compressed file. It may be of any length, any number of lines, and using any alphabet and language (technically: ASCII or UTF-8 format).
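One way to prepare such a message file is with a shell here-document. This is only a sketch: the helper name write_user_message, the lab name and the email address are placeholders invented for the example.

```shell
# Write a two-line message to the file named in $1 (default: msg.txt).
# The wording is placeholder text; replace it with your own message.
write_user_message() {
    cat > "${1:-msg.txt}" <<'EOF'
Thank you for sequencing with Example Lab.
For technical support, contact support@example.com.
EOF
}
```

Running write_user_message msg.txt creates the file, which can then be passed to genozip --user-message as in the command above.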

 

Another use case for the --tar option: If your business is such that your clients normally receive from you a large number of files in each delivery, Genozip allows you to package these files into a single standard tar file. Genozip produces the tar file directly as the output of the compression, removing the extra step of creating the tar file separately. For example:

   genozip --tar client-X-sequencing-delivery.tar --subdirs --reference hs37d5.fa.gz client-X

This command will output a tar file, into which it will compress the directory client-X, including all its files and sub-directories, maintaining the directory structure. It will compress not only the FASTQ, BAM and VCF files, but all other files as well. The command line option --reference is, well, optional - and will result in significantly better compression if provided.

If running the Linux version of Genozip, the tar file will also include a copy of the Genozip executables (adding a negligible 4 MB overhead).

Security & encryption

Many organizations and research projects need to comply with security and privacy requirements with regard to their genomic data.

 

These days, news about system security breaches has become routine. It is simply not acceptable anymore to rely solely on the system defenses against intruders, and it is critical to protect the data itself too.

Genozip offers an easy way to encrypt the data while compressing it. The data is encrypted using the Advanced Encryption Standard (AES) established by the U.S. National Institute of Standards and Technology (NIST). This is the same encryption method routinely used to protect financial transactions and other sensitive data. Genozip uses the strongest version of AES - 256 bits.

 

Encryption and decryption are done by simply providing a password during compression and decompression, for example:

 

  genozip --password mysecret000@ myfile.bam

  genounzip --password mysecret000@ myfile.bam.genozip

Behind the scenes, Genozip uses the password to generate an AES encryption key that is then used to encrypt the data during compression or decrypt it during decompression. The password, and the key derived from it, are not stored anywhere, and even Genozip has no way to recover them if they are lost. This means that even if someone successfully breaks into your computer, they still cannot read your encrypted data without the password.
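Because a lost password cannot be recovered, it is worth generating a strong one and storing it safely, for example in a password manager. The sketch below shows one possible way to generate a random password with standard Linux tools; the helper name make_password is hypothetical and is not part of Genozip.

```shell
# Generate a random 32-character password.
# 24 random bytes encode to exactly 32 base64 characters, with no padding.
make_password() {
    head -c 24 /dev/urandom | base64
}
```

The result can then be passed to both commands:

  pw=$(make_password)
  genozip --password "$pw" myfile.bam
  genounzip --password "$pw" myfile.bam.genozip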

Providing your clients with the Genozip software

For your clients to compress files with genozip --sendto or alternatively decompress files you send them, they may install the standard Genozip software from our website. However, to improve their customer experience, you may include the Genozip software as genozip-linux-x86_64.tar with the files you send to your client. At 1.6 MB, its size is negligible so it can easily be included with every data delivery.

Extending your technical support from us to your clients

As a Premium license holder, you receive first-priority technical support from us.

Our technical support extends to your clients as well - your clients may contact us for any technical issues regarding sending you files using --sendto, or decompressing files received from you.

Compression consultation and optimization

 

In preparation for adding Genozip to your production processes, we offer, as part of Genozip Premium, to analyze your specific file formats and optimize Genozip for your specific data. It is not uncommon for us to squeeze out an additional 10% compression or more in this process. This is particularly recommended if you are using proprietary tools, or tools that we have not encountered before. Examples of data that might present an opportunity for optimization include non-standard BAM or VCF fields, base quality data that differs in its statistical properties from the base quality data produced by the software of the major sequencer manufacturers, and read names or FASTQ description lines that are unusual. Small pieces of your file formats that might seem insignificant to you might actually be contributing significant entropy to the data.

We typically ask for example files that are 100K lines/reads subsets of the relevant FASTQ, BAM or VCF files.
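For a gzip-compressed FASTQ file, such a subset can be produced with standard tools. This is a minimal sketch that assumes the common layout of exactly four lines per read; the helper name fastq_subset is hypothetical.

```shell
# Print the first N reads (default: 100000) of a gzip-compressed FASTQ file.
# Assumption: each read occupies exactly 4 lines.
fastq_subset() {
    gzip -dc "$1" | head -n "$(( ${2:-100000} * 4 ))"
}
```

For example:

  fastq_subset mysample-R1.fq.gz | gzip > mysample-R1.subset.fq.gz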

Any improvements we make in this process will be released in the subsequent Genozip version - we don't release custom versions.

Normally, after our analysis is complete, we delete the example files you provided. However, at your discretion, we are happy to retain these example files and include them in our internal Quality Assurance pipeline that is run nightly in our development process. This is an additional assurance step to reduce the risk of bugs being inadvertently introduced in the future, affecting Genozip's processing of your data. This is particularly advisable if your data contains non-standard fields that will not otherwise get tested, as they are not represented in our test file library.

Source code escrow

 

Genozip Limited is a small company, while most of our customers are large companies and institutions. To insulate you from Genozip Limited's business risks, we offer to deposit our source code with a third-party source code escrow, from which it will be released to you upon certain adverse events. This service is available at a small additional cost.

Monthly and annual plans

 

Genozip Premium is offered as an annual or monthly plan. The monthly plan is often preferred by customers who wish to pay-as-you-go using a credit card. See pricing.

Questions? support@genozip.com
