- Global config : Set up global configuration of the pipeline.
- Mapping : Align short reads to the human reference genome.
- Fixmate : Fix the mate-pair information so that it is in sync between each read and its mate.
- Filter : Keep only reads that are mapped, paired, and properly paired.
- Remove duplicate : Examines aligned records in the BAM file to locate duplicate reads and remove them.
- Filter low mapping quality : Remove reads with low mapping quality.
- Create intervals : Collect regions around potential indels and clusters of mismatches. Determine small suspicious intervals which are likely in need of realignment.
- Realignment : Run the realigner over the intervals to create a cleaned version of the BAM file.
- Analysis of covariates : Determine the covariates affecting base quality scores in the BAM file.
- Recalibration : Walk through the BAM file and rewrite the quality scores.
- Recalculate analysis of covariates : Determine the covariates affecting base quality scores in the realigned, recalibrated BAM file, for comparison with the analysis before recalibration.
- Depth of coverage : Determine coverage summarized by mean, median, quartiles, and/or percentage of bases covered.
- HsMetrics : Calculate a set of Hybrid Selection specific metrics from an aligned BAM file.
- Cleanup : Remove all intermediate alignment and BAM files. Keep only the first aligned and the final realigned, recalibrated BAM files.
- Calling variants
- Generate genotype
- Annotation snpEff
- Annotation Annovar
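
To make the two filter steps and the depth-of-coverage summary more concrete, here is a minimal Python sketch. It is not the code the pipeline itself runs: the function names `keep_read` and `coverage_summary`, the MAPQ cutoff of 20, and the 10x depth cutoff are all assumptions chosen for illustration. The flag bits are the standard ones from the SAM specification.

```python
from statistics import mean, median, quantiles

# Standard SAM flag bits (SAM specification).
FLAG_PAIRED = 0x1        # read is paired in sequencing
FLAG_PROPER_PAIR = 0x2   # read is mapped in a proper pair
FLAG_UNMAPPED = 0x4      # read itself is unmapped


def keep_read(flag, mapq, min_mapq=20):
    """Return True if a read passes the Filter and low-MAPQ steps.

    min_mapq=20 is an assumed cutoff, not a value taken from the pipeline.
    """
    if flag & FLAG_UNMAPPED:
        return False
    if not flag & FLAG_PAIRED:
        return False
    if not flag & FLAG_PROPER_PAIR:
        return False
    return mapq >= min_mapq


def coverage_summary(depths, min_depth=10):
    """Summarize per-base depths: mean, median, quartiles, percent covered.

    min_depth=10 is an assumed threshold for counting a base as covered.
    """
    q1, _, q3 = quantiles(depths, n=4)
    covered = sum(1 for d in depths if d >= min_depth)
    return {
        "mean": mean(depths),
        "median": median(depths),
        "q1": q1,
        "q3": q3,
        "pct_covered": 100.0 * covered / len(depths),
    }


if __name__ == "__main__":
    # Flag 99 = paired + proper pair + mate reverse strand + first in pair.
    print(keep_read(99, 60))
    print(coverage_summary([0, 5, 10, 20, 40]))
```

In a real run these decisions are made by the alignment tools directly on the BAM file; the sketch only shows what is being tested for each read and what the coverage summary contains.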
Friday, November 30, 2012
GATK pipeline