Question:
I will process a bam or sam file that it only contains reads with a
specific type of Mutations (for example the mismatch is T on reference
Genome and C on the Reads).
Can someone help me please?
Answer:
The MD tag of an aligned read contains a string encoding all mismatches
along with their position in the read. You could scan the MD tags for
T's and then check if the read sequence has a C at that position.
To do this in Python, install pysam (http://code.google.com/p/pysam/) and start with:
import pysam
nuclRef = "T"
nuclRead = "C"
bam = pysam.Samfile("your.bam")
for read in bam.fetch():
if read.is_unmapped:
continue
md = read.opt("MD")
if nuclRef in md:
print md, read.seq
No comments:
Post a Comment