
Compressing GFF, GVF or GTF files

At a glance

​

Compressing

​

$ genozip myfile.gff3

genozip myfile.gff3 : Done (4 seconds, GFF3 compression ratio: 6.6)

​
$ ls -lh myfile.gff3*

-rwxrwxrwx 1 divon divon 26M  Aug 2 22:48 myfile.gff3

-rwxrwxrwx 1 divon divon 3.9M Aug 2 22:49 myfile.gff3.genozip

​

Uncompressing

​

$ genounzip myfile.gff3.genozip 

​

Viewing

​

$ genocat myfile.gff3.genozip 

​

Optimizing compression

​

Optimization options modify the file's contents in ways that improve compression, so the uncompressed file will differ from the original. --optimize is an umbrella option that activates all optimization options.

​

genozip --optimize-sort myfile.gff3.gz - Sorts ATTR subfields alphabetically.

​

genozip --optimize-Vf myfile.gff3.gz - Rounds the value of Variant_freq to 2 significant digits.
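To make the effect of these two options concrete, here is a small sketch that mimics them on sample values with standard Unix tools. It is an illustration only, not Genozip's implementation: the helper names sort_attrs and round_2sig are hypothetical, and the byte-order sort is an assumption about what "alphabetically" means.

```shell
#!/bin/sh
# Illustration only: mimics the *effect* of --optimize-sort and --optimize-Vf
# on sample values. This is not how Genozip implements them.

# Sort the semicolon-separated subfields of a GFF3 attributes column.
# Assumption: plain byte-order sorting (LC_ALL=C).
sort_attrs() {
    printf '%s\n' "$1" | tr ';' '\n' | LC_ALL=C sort | paste -s -d ';' -
}

# Round a number to 2 significant digits.
round_2sig() {
    awk -v x="$1" 'BEGIN { printf "%.2g\n", x }'
}

sort_attrs 'ID=1;Variant_freq=0.5;Dbxref=x'   # prints Dbxref=x;ID=1;Variant_freq=0.5
round_2sig 0.123456                           # prints 0.12
round_2sig 0.005678                           # prints 0.0057
```

Both changes alter the text of the file, which is why they are opt-in.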

​

Slicing & dicing your data with genocat
​

Here's a summary of the filtering and subsetting options available for GFF3 / GVF files. See genocat for more information.

​

Option                           Effect

--downsample        Show only one in every X lines

--regions       -r  Exclude or include certain genomic regions

--regions-file  -R  Like --regions, but list of regions is specified in a file

--grep              Show only lines containing the specified string

--grep-w        -g  Like --grep, but match whole words

--lines         -n  Show only lines from a given range of line numbers

--head              Show only a certain number of lines from the start of the file

--tail              Show only a certain number of lines from the end of the file

--no-header         Drop the GFF3 header lines

--header-only       Show only the GFF3 header lines

​

Example: display the lines containing “rs1357314184” as a whole word (exact match):

​

genocat --grep-w rs1357314184 myfile.gff3.genozip

​

Example: display the lines containing “Dbxref=dbSNP_152:rs” (possibly a substring of a longer string):

​

genocat --grep Dbxref=dbSNP_152:rs myfile.gff3.genozip

​

Example: display the lines at positions 1000 to 2000 on contig 22:

genocat myfile.gff3.genozip -r 22:1000-2000   
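Because genocat writes the uncompressed data to standard output, its output can be piped into ordinary text tools. The sketch below counts features by type (column 3 of a GFF line). The helper name count_feature_types is hypothetical, and the sketch assumes tab-separated GFF lines with comment and directive lines starting with "#".

```shell
#!/bin/sh
# Hypothetical helper: reads GFF lines on stdin and prints "type<TAB>count",
# sorted by type. Lines starting with "#" (header, directives) are skipped.
count_feature_types() {
    awk -F '\t' '!/^#/ && NF >= 3 { n[$3]++ }
                 END { for (t in n) print t "\t" n[t] }' | LC_ALL=C sort
}

# Usage (requires genocat and a compressed file):
#   genocat myfile.gff3.genozip | count_feature_types
#   genocat myfile.gff3.genozip -r 22:1000-2000 | count_feature_types
```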

​

Supported formats & limitations
​

Genozip can compress the closely related formats GFF2, GFF3, GTF and GVF. It does not support GFF3 files that include a ##FASTA section, and may not support some of the many other variants of the GFF format. If you have GFF data that Genozip fails to compress and you would like us to support it, please let us know!

​

Tip: if you need to compress a file whose format isn't currently supported by Genozip, you can always use --input generic.
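As a sketch of how the limitation and the tip above might be combined in a script: check for a ##FASTA line first, and fall back to --input generic when one is found. The function names has_fasta_section and compress_gff are hypothetical, and the check assumes an uncompressed, plain-text GFF3 file (it will not look inside a .gz file).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file contains a "##FASTA" directive line.
# Assumption: the input is plain text, not gzip-compressed.
has_fasta_section() {
    grep -q '^##FASTA' "$1"
}

# Hypothetical wrapper: use generic compression when a ##FASTA section is
# present, otherwise let Genozip compress the file as GFF.
compress_gff() {
    if has_fasta_section "$1"; then
        genozip --input generic "$1"
    else
        genozip "$1"
    fi
}

# Usage (requires genozip):
#   compress_gff myfile.gff3
```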

​

Questions? support@genozip.com
