Reference | Andy Rimmer, Hang Phan, Iain Mathieson,
Zamin Iqbal, Stephen R. F. Twigg, WGS500 Consortium, Andrew O. M.
Wilkie, Gil McVean, Gerton Lunter. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nature Genetics (2014) doi:10.1038/ng.3036 |
Quick links | Download the latest stable version of Platypus; User forum and update notifications (Google group); Examples of how to run Platypus in different settings; Frequently Asked Questions; Full documentation |
Description |
Platypus is a tool designed for efficient and
accurate variant detection in high-throughput sequencing data. By using
local realignment of reads and local assembly, it achieves both high
sensitivity and high specificity. Platypus can detect SNPs, MNPs, short
indels, replacements and (using the assembly option) deletions up to
several kb. It has been extensively tested on whole-genome, exon-capture and targeted-capture data. It has been run on very large datasets as part of the Thousand Genomes and WGS500 projects, and is being used in clinical sequencing trials in the Mainstreaming Cancer Genetics programme. Platypus has been thoroughly tested on data mapped with Stampy and BWA.
It has not been tested with other mappers, but it should behave well.
Platypus has been used to detect variants in human, mouse, rat and
chimpanzee samples, amongst others, and it should perform well on data
from any diploid organism. It has also been used to find somatic mutations in cancer, and mosaic mutations in human exome data. |
Capabilities | Platypus reads data from BAM files and outputs a single VCF file containing the identified variants, together with genotype calls and likelihoods for all samples. It can identify SNPs, MNPs and short indels (less than one read length) and, using local assembly, larger variants (deletions up to several kb and insertions of perhaps 200bp). Platypus can process large amounts of BAM data very efficiently, and can handle samples spread across multiple BAM files. Duplicate-read marking, local realignment, and variant identification and filtering are performed on the fly by a single command. Platypus will run on any input data in BAM format, but has only been properly tested on Illumina data. |
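Since the output is a standard VCF file, the variant classes listed above can be tallied with a few lines of Python. The following is only a minimal sketch: the function names `classify_variant` and `count_variant_types` are made up for this illustration, and the classification uses a simplified allele-length rule based on the REF and ALT columns of the VCF format. It is not the logic Platypus uses internally.

```python
from collections import Counter


def classify_variant(ref, alt):
    """Classify one REF/ALT allele pair with a simplified length-based rule.

    Illustrative only: this is not how Platypus itself labels variants.
    """
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) == len(alt):
        return "MNP"
    if alt.startswith(ref):
        return "insertion"
    if ref.startswith(alt):
        return "deletion"
    # Unequal lengths with no shared prefix relationship.
    return "replacement"


def count_variant_types(vcf_lines):
    """Count variant classes over an iterable of VCF text lines."""
    counts = Counter()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header row
        fields = line.split("\t")
        ref, alts = fields[3], fields[4]  # REF and ALT columns of a VCF record
        for alt in alts.split(","):  # a site can list several ALT alleles
            counts[classify_variant(ref, alt)] += 1
    return counts
```

Passing an open file handle for the output VCF (for example `count_variant_types(open("VariantCalls.vcf"))`) would return a `Counter` keyed by the class names above.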
Dependencies |
Platypus is written in Python, Cython and C.
It requires only Python (>=2.6) and a C compiler to build. Both are
standard on most Linux and Mac OS distributions, so Platypus should
build and run without problems on most systems. |
Building Platypus |
To build Platypus, simply unpack the tarball and run the buildPlatypus.sh script provided:
    tar -xvzf Platypus_x.x.x.tgz
    cd Platypus_x.x.x
    ./buildPlatypus.sh
This will take a minute or so, and generate quite a lot of warnings. If the build is successful, you will see the message 'Finished building Platypus'. Platypus is then ready for variant-calling. |
Running Platypus |
Platypus can be run from the command-line, using Python. It needs one
or more BAM input files, and a FASTA reference file. The BAM file(s)
must be indexed using Samtools or an equivalent program, and the FASTA file must also be indexed using 'samtools faidx' or equivalent. The simplest way to run Platypus is as follows:
    python Platypus.py callVariants --bamFiles=input.bam --refFile=ref.fa --output=VariantCalls.vcf
The output will be a single VCF file containing all the variants that Platypus identified, and a 'log.txt' file containing log information. The last line in the log file, and of the command-line output, should be 'Finished variant calling'. This means that the calling has completed without major errors. It is a good idea to also check the log output for warnings or errors. |
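The checks described above (index files present before the run, success message at the end of the log) can be wrapped in a small helper script. The sketch below is not part of Platypus; the function names are hypothetical. It assumes the usual Samtools naming conventions for index files ('input.bam.bai' or 'input.bai' for BAM, 'ref.fa.fai' for FASTA), and it assumes the final log line may carry a prefix such as a timestamp, so it tests for the message as a substring.

```python
import os


def missing_index_files(bam_files, ref_file):
    """Return the expected index files that are not present on disk."""
    missing = []
    for bam in bam_files:
        # Assumed conventions: input.bam.bai, or input.bai next to input.bam
        candidates = [bam + ".bai", os.path.splitext(bam)[0] + ".bai"]
        if not any(os.path.exists(path) for path in candidates):
            missing.append(candidates[0])
    fai = ref_file + ".fai"  # index written by 'samtools faidx'
    if not os.path.exists(fai):
        missing.append(fai)
    return missing


def build_command(bam_file, ref_file, output_vcf):
    """Assemble the simplest Platypus invocation as an argument list."""
    return [
        "python", "Platypus.py", "callVariants",
        "--bamFiles=" + bam_file,
        "--refFile=" + ref_file,
        "--output=" + output_vcf,
    ]


def log_reports_success(log_path="log.txt"):
    """Check whether the last non-empty log line reports completion."""
    last_line = ""
    with open(log_path) as handle:
        for line in handle:
            if line.strip():
                last_line = line.strip()
    # Substring test, in case the log line has a prefix (an assumption).
    return "Finished variant calling" in last_line
```

One possible use is to call `missing_index_files(["input.bam"], "ref.fa")` first, pass the result of `build_command(...)` to `subprocess.call` if nothing is missing, and then confirm the run with `log_reports_success()`. Even when that returns True, the log is still worth reading for warnings.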
Contact: | Bug reports, comments, and feature requests (positive feedback also greatly appreciated) can be sent to platypus-users@googlegroups.com |
Tuesday, October 13, 2015
Platypus: A Haplotype-Based Variant Caller For Next Generation Sequence Data