Compressing GFF, GVF or GTF files
At a glance
Compressing
$ genozip myfile.gff3
genozip myfile.gff3 : Done (4 seconds, GFF3 compression ratio: 6.6)
$ ls -lh myfile.gff3*
-rwxrwxrwx 1 divon divon 26M Aug 2 22:48 myfile.gff3
-rwxrwxrwx 1 divon divon 3.9M Aug 2 22:49 myfile.gff3.genozip
Uncompressing
$ genounzip myfile.gff3.genozip
Viewing
$ genocat myfile.gff3.genozip
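The compression ratio that genozip reports appears to be the original size divided by the compressed size. In the listing above, 26M / 3.9M ≈ 6.7, which is consistent with the reported 6.6 once you allow for the rounding in ls -lh. The sketch below computes the ratio from exact byte counts. It assumes a POSIX shell with wc and awk, and the function name compression_ratio is hypothetical. It is not part of Genozip.

```shell
#!/bin/sh
# Hypothetical helper (not part of Genozip): prints the ratio of the original
# file size to the compressed file size, with one decimal place.
compression_ratio() {  # usage: compression_ratio ORIGINAL COMPRESSED
    orig_bytes=$(wc -c < "$1")
    comp_bytes=$(wc -c < "$2")
    awk -v o="$orig_bytes" -v c="$comp_bytes" 'BEGIN { printf "%.1f\n", o / c }'
}

# Usage (both files must exist):
# compression_ratio myfile.gff3 myfile.gff3.genozip
```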
Optimizing compression
Optimization options modify the file's content in ways that improve compression. --optimize is an umbrella option that activates all of them.
genozip --optimize-sort myfile.gff3.gz - Sorts ATTR subfields alphabetically.
genozip --optimize-Vf myfile.gff3.gz - Rounds the value of Variant_freq to 2 significant digits.
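To make the two transformations concrete, here is a rough sketch of what they do to a single GFF3/GVF attributes string (column 9). It uses standard shell tools, not Genozip. It is only an approximation and makes several assumptions. Attributes are taken to be `;`-separated `key=value` pairs. The function names sort_attrs and round_vf are hypothetical. Genozip's exact output formatting, for example for very small frequencies, may differ.

```shell
#!/bin/sh
# Hypothetical illustration of --optimize-sort and --optimize-Vf applied to
# one attributes field. This is not Genozip's actual implementation.

# Roughly --optimize-sort: split on ';', sort the subfields bytewise, rejoin.
sort_attrs() {
    printf '%s\n' "$1" | tr ';' '\n' | LC_ALL=C sort | paste -sd ';' -
}

# Roughly --optimize-Vf: round the Variant_freq value to 2 significant digits.
round_vf() {
    printf '%s\n' "$1" | awk -F';' -v OFS=';' '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^Variant_freq=/) {
                v = substr($i, 14)        # text after "Variant_freq=" (13 chars)
                $i = "Variant_freq=" sprintf("%.2g", v)
            }
        print
    }'
}

# Usage:
# sort_attrs 'ID=1;Dbxref=dbSNP_152:rs1;Variant_seq=A'
# round_vf 'ID=1;Variant_freq=0.123456'
```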
Slicing & dicing your data with genocat
Here's a summary of the filtering and subsetting options available for GFF3 / GVF files. See the genocat documentation for more information.
Option Effect
--downsample Show only one line out of every X lines
--regions, -r Include or exclude specific genomic regions
--regions-file, -R Like --regions, but the list of regions is read from a file
--grep Show only lines containing the specified string
--grep-w, -g Like --grep, but matches whole words only
--lines, -n Show only lines within a given range of line numbers
--head Show only a certain number of lines from the start of the file
--tail Show only a certain number of lines from the end of the file
--no-header Drop the GFF3 header lines
--header-only Show only the GFF3 header lines
Example: display the lines containing “rs1357314184” as a whole word (exact match):
genocat --grep-w rs1357314184 myfile.gff3.genozip
Example: display the lines containing “Dbxref=dbSNP_152:rs” anywhere, including as part of a longer string:
genocat --grep Dbxref=dbSNP_152:rs myfile.gff3.genozip
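The difference between the two examples above is analogous to the difference between plain grep and grep -w. The sketch below runs standard grep on two made-up lines to show it. It assumes that genocat's notion of a whole word is similar to grep's, where letters, digits and underscore count as word characters. That assumption may not hold exactly.

```shell
#!/bin/sh
# Two made-up attribute strings, used only for illustration.
sample='Dbxref=dbSNP_152:rs1357314184
Dbxref=dbSNP_152:rs13573141840'

# Substring match (like --grep): both lines contain "rs1357314184".
printf '%s\n' "$sample" | grep -c 'rs1357314184'     # prints 2

# Whole-word match (like --grep-w): only the first line matches. In the second
# line the string is followed by another word character ("0").
printf '%s\n' "$sample" | grep -cw 'rs1357314184'    # prints 1
```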
Example: display positions 1000 to 2000 on contig 22:
genocat myfile.gff3.genozip -r 22:1000-2000
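As a rough guide to what a region filter selects, here is an approximate awk equivalent that runs on an uncompressed GFF3 file. It assumes the filter keeps non-header lines where the seqid (column 1) equals the requested contig and the start coordinate (column 4) falls within the inclusive range. Genozip may handle some cases differently, such as features that only partly overlap the range, or contig naming like chr22 versus 22. The function name in_region is hypothetical.

```shell
#!/bin/sh
# Hypothetical approximation of `genocat -r CONTIG:START-END` on a plain GFF3
# file. It keeps data lines whose column 1 is CONTIG and whose column 4
# (start) lies in [START, END]. Header lines starting with '#' are skipped.
in_region() {  # usage: in_region CONTIG START END FILE
    awk -F'\t' -v c="$1" -v s="$2" -v e="$3" \
        '!/^#/ && $1 == c && $4 + 0 >= s && $4 + 0 <= e' "$4"
}

# Usage:
# in_region 22 1000 2000 myfile.gff3
```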
Supported formats & limitations
Genozip can compress the closely related formats GFF2, GFF3, GTF and GVF. It does not support GFF3 files that include a ##FASTA section, and it may not support some of the many other variants of the GFF format. If you have GFF data that Genozip fails to compress and you would like us to support it, please let us know!
Tip: if you need to compress a file whose format isn't currently supported by Genozip, you can always use --input generic.
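GFF3 files with a ##FASTA section are not supported, so one option is to check for that directive first and fall back to --input generic. Below is a minimal sketch. It assumes the directive appears at the start of a line, as in the GFF3 specification, and it only handles uncompressed input. The function name gff_input_mode and its output labels are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints "generic" if the file contains a GFF3 ##FASTA
# section, which Genozip's GFF support does not handle. Otherwise it prints
# "default".
gff_input_mode() {  # usage: gff_input_mode FILE
    if grep -q '^##FASTA' "$1"; then
        echo generic
    else
        echo default
    fi
}

# Usage (requires genozip):
# if [ "$(gff_input_mode myfile.gff3)" = generic ]; then
#     genozip --input generic myfile.gff3
# else
#     genozip myfile.gff3
# fi
```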
Questions? support@genozip.com