Converting a 23andMe Raw Genetic File to VCF

23andMe customers can download their raw genetic data by following these instructions.

 

However, this data comes in a proprietary 23andMe format.

 

The 23andMe file is called something like genome_John_Doe_v3_Full_20190101201010.zip (the exact file name format may vary).

 

Here, we explain how to convert the file to the standard VCF format.

 

Step 1: Download a reference file. Any version of hg19 or GRCh37 will do, for example this one: hs37d5.fa.gz. This file is quite large: approximately 900MB.

Step 2: Compress your 23andMe file with Genozip:

genozip genome_John_Doe_v3_Full_20190101201010.zip

Step 3: Convert the file to VCF:

genocat -e hs37d5.fa.gz --vcf genome_John_Doe_v3_Full_20190101201010.genozip --output mydata.vcf.gz
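
Steps 2 and 3 can be chained in a small shell wrapper. The following is a minimal sketch, not part of Genozip itself: the function names genozip_name and convert_23andme are hypothetical, and it assumes that genozip writes its output next to the input with the .zip suffix replaced by .genozip, as the file names in the steps above suggest. It uses only the options shown on this page.

```shell
# Derive the .genozip name that Step 2 is assumed to produce from the
# .zip input name (mirrors the file names used in the steps above).
genozip_name() {
    printf '%s\n' "${1%.zip}.genozip"
}

# Run Step 2, then Step 3; Step 3 runs only if Step 2 succeeds.
convert_23andme() {
    input=$1        # 23andMe download (.zip)
    reference=$2    # hg19/GRCh37 reference file from Step 1
    output=$3       # VCF file to write; a name ending in .gz is compressed
    genozip "$input" &&
    genocat -e "$reference" --vcf "$(genozip_name "$input")" --output "$output"
}
```

For example: convert_23andme genome_John_Doe_v3_Full_20190101201010.zip hs37d5.fa.gz mydata.vcf.gz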

Note: the output file (mydata.vcf.gz in the example above) is compressed into .gz format if the file name ends with .gz.
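
As a quick sanity check on the compressed output, you can count the variant records it contains. This is a sketch using standard Unix tools rather than Genozip; the function name count_vcf_records is hypothetical. It relies only on the VCF convention that header lines start with '#'.

```shell
# Print the number of variant records (non-header lines) in a gzipped VCF.
count_vcf_records() {
    # gzip -dc decompresses to stdout; grep -vc counts lines NOT starting with '#'
    gzip -dc "$1" | grep -vc '^#'
}
```

For example: count_vcf_records mydata.vcf.gz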

Limitations: Indel variants (‘DD’, ‘DI’, ‘II’) and uncalled sites (‘--’) are discarded.
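
To estimate how many sites this limitation affects, you can count the matching genotypes in the raw data before converting. The sketch below is not part of Genozip; the function name count_discarded is hypothetical. It assumes the text file extracted from the 23andMe zip has '#' comment lines followed by tab-separated columns: rsid, chromosome, position, genotype.

```shell
# Count raw-data lines whose genotype is an indel call or a no-call.
count_discarded() {
    awk -F'\t' '
        /^#/ || NF < 4 { next }            # skip header comments and short lines
        { sub(/\r$/, "", $4) }             # tolerate Windows line endings
        $4 ~ /^(DD|DI|II)$/ { indel++ }    # insertion/deletion calls
        $4 == "--"          { nocall++ }   # uncalled sites
        END { printf "indels=%d nocalls=%d\n", indel, nocall }
    ' "$1"
}
```

Run it on the extracted text file, for example: count_discarded genome_John_Doe_v3_Full_20190101201010.txt (the name of the file inside the zip is an assumption and may differ).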

 

Questions? support@genozip.com
