
Handling GATK’s “Unexpected base in allele bases” error

When GATK’s HaplotypeCaller (and possibly other GATK tools) is run on a BAM file that contains bases other than A, C, T, G (and possibly N), GATK throws an exception:

 

java.lang.IllegalArgumentException: Unexpected base in allele bases

 

This was observed in both GATK 3.5 and 4.1.

 

Genozip can directly filter the offending lines out of a BAM file:

 

Step 1: Compress the file with Genozip:

 

genozip myfile.bam

 

Step 2: Filter it:

 

genocat myfile.bam.genozip --bases ACGTN  # Keep only lines in which SEQ has only A,C,T,G,N

genocat myfile.bam.genozip --bases ^ACGTN # See the offending lines

genocat myfile.bam.genozip --bases ^ACGTN --count # Count the number of offending lines

 

The same genocat options also work for SAM and FASTQ files.
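As a rough cross-check that does not depend on Genozip, the same test can be sketched with awk on a plain-text SAM file. This is only an illustration, not part of Genozip: the function names are hypothetical, and it assumes a tab-separated SAM file in which SEQ is the 10th column and header lines start with @. Note that in this sketch a SEQ of * (no sequence stored) also counts as offending.

```shell
# Hypothetical helper: print SAM records whose SEQ (column 10) contains
# any character other than A, C, G, T or N. Header lines (@...) are skipped.
offending_sam_lines() {
    awk -F'\t' '!/^@/ && $10 ~ /[^ACGTN]/' "$1"
}

# Hypothetical helper: count those records (prints 0 if there are none).
count_offending_sam_lines() {
    awk -F'\t' '!/^@/ && $10 ~ /[^ACGTN]/ { n++ } END { print n + 0 }' "$1"
}
```

For example, offending_sam_lines myfile.sam lists the records to inspect, and count_offending_sam_lines myfile.sam should roughly correspond to the --count variant above. A BAM file would first need to be converted to SAM text, for instance with samtools view -h.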

The list of IUPAC characters can be found here: IUPAC codes
