Wednesday, January 28, 2015

GATK to detect strand bias


-A StrandBiasBySample: directly outputs, for each sample, the read depth per allele (both ref and alt) on each strand orientation.
Strand bias is a type of sequencing bias in which one DNA strand is favored over the other, which can result in incorrect evaluation of the amount of evidence observed for one allele vs. the other. The StrandBiasBySample annotation produces read counts per allele and per strand; these are the same kind of counts that other annotation modules (FisherStrand and StrandOddsRatio) use to estimate strand bias with statistical tests.
This annotation produces 4 values, corresponding to the number of reads that support the following (in this order):
  • the reference allele on the forward strand
  • the reference allele on the reverse strand
  • the alternate allele on the forward strand
  • the alternate allele on the reverse strand

Example

GT:AD:GQ:PL:SB  0/1:53,51:99:1758,0,1835:23,30,33,18
 
In this example, the SB field is the last one (23,30,33,18): the reference allele is supported by 23 forward reads and 30 reverse reads, and the alternate allele is supported by 33 forward reads and 18 reverse reads.
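
To make the link between these four counts and a strand bias statistic concrete, here is a minimal Python sketch that pulls the SB values out of the example above and runs a two-sided Fisher's exact test on the resulting 2x2 table, then Phred-scales the p-value. The function names (parse_sb, fisher_two_sided, phred) are made up for this illustration, and this is not GATK's own code: FisherStrand may treat the table differently in its details, so do not expect the number to match the FS value GATK reports.

```python
from math import comb, log10

def parse_sb(format_keys, sample_value):
    """Return (ref_fwd, ref_rev, alt_fwd, alt_rev) from a VCF FORMAT/sample pair."""
    fields = dict(zip(format_keys.split(":"), sample_value.split(":")))
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in fields["SB"].split(","))
    return ref_fwd, ref_rev, alt_fwd, alt_rev

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at most as likely as the observed one
    # (small tolerance guards against floating-point ties).
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)

def phred(p):
    """Phred-scale a p-value: larger means stronger evidence of bias."""
    return -10 * log10(p)

counts = parse_sb("GT:AD:GQ:PL:SB", "0/1:53,51:99:1758,0,1835:23,30,33,18")
p_value = fisher_two_sided(*counts)
print(counts, p_value, phred(p_value))
```

The table rows are the alleles (ref, alt) and the columns are the strands (forward, reverse), so a small p-value means the alternate allele's reads are distributed across strands differently from the reference allele's reads.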

Command line example

java \
    -Xmx${MEM}                                                         \
    -Djava.io.tmpdir=${JAVA_TMPDIR}                                    \
    -jar ${GATK}                                                       \
    -T GenotypeGVCFs                                                   \
    -R ${REF_SEQ}                                                      \
    -A Coverage                                                        \
    -A FisherStrand                                                    \
    -A StrandBiasBySample                                              \
    -D $SNP_DBSNP                                                      \
    -o ${SMPL_NAME}.vcf                                                \
    -nt $PROCS                                                         \
    -V samples.vcf.list
 
 
 
Annotations in the VCF INFO column:
MLEAC: maximum likelihood expectation (MLE) of the allele count, for each ALT allele
MLEAF: maximum likelihood expectation (MLE) of the allele frequency, for each ALT allele
##FORMAT=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed, for this pool">                                                                
##FORMAT=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed, for this pool">   
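
If you want to read such header lines programmatically, the following is a small Python sketch that splits one structured header line into its kind and its key/value attributes. The function name parse_meta_line is hypothetical, and the regular expressions assume the simple layout shown above (bare tokens or double-quoted strings without escaped quotes); a real VCF library would be more robust. In the VCF format, Number=A means the field carries one value per ALT allele.

```python
import re

def parse_meta_line(line):
    """Split a structured VCF header line (##KIND=<...>) into (kind, attributes)."""
    match = re.match(r'##(\w+)=<(.*)>\s*$', line)
    if match is None:
        raise ValueError("not a structured VCF header line")
    kind, body = match.groups()
    # Each value is either a double-quoted string (which may contain commas)
    # or a bare token that ends at the next comma.
    attrs = {key: quoted if quoted else bare
             for key, quoted, bare in re.findall(r'(\w+)=(?:"([^"]*)"|([^,]*))', body)}
    return kind, attrs

header = ('##FORMAT=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood '
          'expectation (MLE) for the allele counts (not necessarily the same as the AC), '
          'for each ALT allele, in the same order as listed, for this pool">')
kind, attrs = parse_meta_line(header)
print(kind, attrs["ID"], attrs["Number"], attrs["Type"])
```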
