Tuesday, October 6, 2015

Filter Bam File Using PySam


Question:
I will process a bam or sam file that it only contains reads with a specific type of Mutations (for example the mismatch is T on reference Genome and C on the Reads). Can someone help me please?

Answer:
 
The MD tag of an aligned read contains a string encoding all mismatches along with their position in the read. You could scan the MD tags for T's and then check if the read sequence has a C at that position.
To do this in Python, install pysam (http://code.google.com/p/pysam/) and start with:
 
 
 
import pysam

nuclRef = "T"
nuclRead = "C"

bam = pysam.Samfile("your.bam")

for read in bam.fetch():
    if read.is_unmapped:
        continue
    md = read.opt("MD")
    if nuclRef in md:
        print md, read.seq

No comments:

Post a Comment