Friday, November 30, 2012

GATK pipeline











[Figure: NGS1.png]
  1. Global config : Set up the global configuration of the pipeline.
  2. Mapping : Align short sequences to the human reference genome sequence database.
  3. Fixmate : Fix the mate-pair information so that each read and its mate stay in sync.
  4. Filter : Keep only reads that are mapped, paired, and properly paired.
  5. Remove duplicate : Examines aligned records in the BAM file to locate duplicate reads and remove them.
  6. Filter low mapping quality : Remove reads with low mapping quality.
  7. Create intervals : Collect regions around potential indels and clusters of mismatches. Determine small suspicious intervals which are likely in need of realignment.
  8. Realignment : Run the realigner over the intervals to create a cleaned version of the BAM file.
  9. Analysis of covariates : Determine the covariates affecting base quality scores in the BAM file.
  10. Recalibration : Walk through the BAM file and rewrite the quality scores.
  11. Recalculate analysis of covariates : Determine the covariates affecting base quality scores in the realigned, recalibrated BAM file, for comparison with step 9.
  12. Depth of coverage : Determine coverage summarized by mean, median, quartiles, and/or percentage of bases covered.
  13. HsMetrics : Calculate a set of Hybrid Selection specific metrics from an aligned BAM file.
  14. Cleanup : Remove all intermediate alignment and BAM files. Keep only the first aligned and the final realigned, recalibrated BAM files.
  15. Calling variants
  16. Generate genotype
  17. Annotation snpEff
  18. Annotation Annovar
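
The read filters in steps 4 and 6 can be sketched in a few lines of Python. This is only an illustration of the logic, not the commands the pipeline actually runs: the flag bits come from the SAM format specification, while the function names and the `MIN_MAPQ` threshold of 20 are assumptions made for this example (the post does not state a cutoff).

```python
# SAM flag bits, as defined in the SAM format specification.
FLAG_PAIRED = 0x1        # read is part of a pair
FLAG_PROPER_PAIR = 0x2   # both mates aligned as a proper pair
FLAG_UNMAPPED = 0x4      # the read itself is unmapped

MIN_MAPQ = 20  # hypothetical threshold, not taken from the post


def keep_read(flag, mapq, min_mapq=MIN_MAPQ):
    """Return True if a read is mapped, paired, properly paired (step 4)
    and has a mapping quality of at least min_mapq (step 6)."""
    if flag & FLAG_UNMAPPED:
        return False
    if not flag & FLAG_PAIRED:
        return False
    if not flag & FLAG_PROPER_PAIR:
        return False
    return mapq >= min_mapq


def filter_sam_lines(lines, min_mapq=MIN_MAPQ):
    """Yield header lines unchanged and alignment lines that pass keep_read."""
    for line in lines:
        if line.startswith("@"):
            yield line  # SAM header lines start with '@'
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is FLAG
        mapq = int(fields[4])  # column 5 of a SAM record is MAPQ
        if keep_read(flag, mapq, min_mapq):
            yield line
```

For example, a read with flag 99 (paired, proper pair, mate on the reverse strand, first in pair) and MAPQ 60 is kept, while the same read with MAPQ 5 is dropped.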

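The summary produced in step 12 (mean, median, quartiles, and percentage of bases covered) can be illustrated the same way. The sketch below assumes the per-base depths are already available as a list of integers; `summarize_coverage` and its `min_depth` parameter are hypothetical names for this example and do not correspond to any GATK option.

```python
import statistics


def summarize_coverage(depths, min_depth=1):
    """Summarize per-base depths: mean, median, quartiles, and the
    percentage of bases covered at min_depth or more."""
    if len(depths) < 2:
        raise ValueError("need at least two depth values")
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(depths, n=4)
    covered = sum(1 for d in depths if d >= min_depth)
    return {
        "mean": statistics.mean(depths),
        "median": statistics.median(depths),
        "q1": q1,
        "q3": q3,
        "pct_covered": 100.0 * covered / len(depths),
    }
```

Calling `summarize_coverage([0, 10, 20, 30])` reports a mean of 15 and 75% of bases covered at a depth of 1 or more.
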