Free open source tool: MendelScan
from The Genome Institute at Washington University School of Medicine
Documentation
MendelScan is a command-line program with multiple subcommands (e.g. score, rhro, and sibd). Each subcommand has a unique set of inputs and outputs. For the list of available subcommands, enter:
java -jar MendelScan.jar --help
Available Subcommands
These subcommands are currently supported:
java -jar MendelScan.jar score    # Prioritize a VCF
java -jar MendelScan.jar rhro     # Perform RHRO analysis
java -jar MendelScan.jar sibd     # Perform SIBD analysis
For detailed usage information, enter the subcommand followed by -h or --help, e.g.:
java -jar MendelScan.jar score -h
For those familiar with Java, the auto-generated Javadoc documentation may be useful as well.
score: Variant Scoring and Prioritization
The score command of MendelScan takes four inputs:
- A pedigree file in [PED format][PED] that indicates the name, gender, and affectation status of the samples in the VCF. Samples in the VCF but not in the PED file will be treated as affected females.
- A VCF file that has been annotated with dbSNP information (a task that can be completed with the current dbSNP VCF file and the [joinx][] utility).
- Variant annotation information in Variant Effect Predictor (VEP) format, ideally with canonical, hgnc, Polyphen, SIFT, and Condel options.
- Gene expression for the tissue(s) of interest. This should be a one-column text file with HUGO symbols ordered according to their expression level (highest to lowest). This is optional but highly recommended; many gene expression datasets are freely available.
The output file contains each variant along with its overall and individual scores, as well as the annotation, population, expression, and segregation data that were used to compute them. A VCF output option is also available; it places similar but abbreviated information in the INFO field.
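The gene expression input described above is a one-column list of HUGO symbols ordered from highest to lowest expression. Many public datasets instead provide a symbol together with an expression value, so the following is a minimal sketch of how such a table might be reduced to the expected ranking. The two-column input layout, the gene names, and the function name are assumptions made for illustration; they are not part of MendelScan.

```python
import csv
import io


def rank_genes_by_expression(rows):
    """Return gene symbols sorted from highest to lowest expression.

    `rows` is an iterable of (symbol, expression_value) pairs. If a symbol
    appears more than once, only its highest value is kept.
    """
    best = {}
    for symbol, value in rows:
        value = float(value)
        if symbol not in best or value > best[symbol]:
            best[symbol] = value
    # Sort by descending expression; ties are broken alphabetically.
    return sorted(best, key=lambda s: (-best[s], s))


# Hypothetical two-column input: symbol <tab> expression value.
example = "GENE_A\t12.5\nGENE_B\t98.1\nGENE_C\t0.3\nGENE_B\t40.0\n"
rows = csv.reader(io.StringIO(example), delimiter="\t")
ranked = rank_genes_by_expression(rows)

# One symbol per line, most highly expressed gene first.
one_column_text = "\n".join(ranked) + "\n"
print(one_column_text, end="")
```

Writing `one_column_text` to a file would give a ranking in the shape that the gene expression input is described as having.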
rhro: Rare Heterozygote Rule Out
The rhro subcommand of MendelScan takes three inputs:
- A pedigree file in [PED format][PED] that indicates the name, gender, and affectation status of the samples in the VCF. Samples in the VCF but not in the PED file will be treated as affected females.
- A VCF file that has been annotated with dbSNP information (a task that can be completed with the current dbSNP VCF file and the [joinx][] utility).
- A BED file of chromosome centromere coordinates (optional but recommended).
There are two output files from this command. One contains all informative variants (rare heterozygotes shared by affecteds, or variant positions with homozygous differences between affected pairs). The second output is a window of RHRO regions that are consistent with autosomal dominant inheritance given the inputs and assumptions described here.
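The rule that samples missing from the PED file are treated as affected females can be illustrated with a short sketch. It assumes the standard six-column PED layout (family ID, individual ID, father, mother, sex coded 1 = male and 2 = female, phenotype coded 1 = unaffected and 2 = affected). The sample names and helper functions are hypothetical, and this is not MendelScan's own parsing code.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional


class Sample(NamedTuple):
    name: str
    sex: str                   # "male", "female", or "unknown"
    affected: Optional[bool]   # None when the phenotype is missing


SEX_CODES = {"1": "male", "2": "female"}
PHENOTYPE_CODES = {"1": False, "2": True}


def parse_ped(lines: Iterable[str]) -> Dict[str, Sample]:
    """Parse six-column PED records into a dict keyed by individual ID."""
    samples = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 6:
            raise ValueError("expected at least 6 PED columns: " + line)
        _family, name, _father, _mother, sex, phenotype = fields[:6]
        samples[name] = Sample(
            name, SEX_CODES.get(sex, "unknown"), PHENOTYPE_CODES.get(phenotype)
        )
    return samples


def resolve_samples(vcf_names: Iterable[str],
                    ped: Dict[str, Sample]) -> List[Sample]:
    """Look up each VCF sample; default to an affected female if absent."""
    return [ped.get(n, Sample(n, "female", True)) for n in vcf_names]


# Hypothetical pedigree: unaffected father, affected mother and son.
ped_text = """\
FAM1 father 0 0 1 1
FAM1 mother 0 0 2 2
FAM1 son father mother 1 2
"""
ped = parse_ped(ped_text.splitlines())
resolved = resolve_samples(["father", "mother", "son", "cousin"], ped)
for sample in resolved:
    print(sample)
```

Here `cousin` appears in the VCF sample list but not in the pedigree, so it is resolved as an affected female, which may not be what the analyst intends. Listing every VCF sample in the PED file avoids relying on that default.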
sibd: Shared Identity-by-Descent
The sibd subcommand of MendelScan uses BEAGLE FastIBD results to identify regions of maximum identity-by-descent (IBD) among affected pairs. It requires the user to run BEAGLE FastIBD on the sequencing data (which requires conversion of the VCF to BEAGLE format and a "markers" file). This should be done on a per-chromosome basis. Then, the following files should be provided as inputs to MendelScan for each chromosome:
- A pedigree file in [PED format][PED] that indicates the name, gender, and affectation status of the samples in the VCF. Samples in the VCF but not in the PED file will be treated as affected females.
- The BEAGLE markers file for the chromosome at hand, which typically includes four columns: physical position (chrom:position), map position (morgans), allele1, and allele2.
- The BEAGLE FastIBD output file (*.fibd) for the chromosome in uncompressed format. It should have five columns: sample1, sample2, index1, index2, and score. The index fields correspond to the markers file; MendelScan will convert these to genomic coordinates and print them to the output file.
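The conversion of FastIBD marker indices to genomic coordinates can be sketched as follows. This is an illustration only, not MendelScan's implementation. It assumes whitespace-delimited files, indices that start at 0, and an end index that is exclusive; those conventions should be checked against the BEAGLE documentation for the version in use. The sample names, marker positions, and function names are made up.

```python
def load_marker_positions(lines):
    """Return a list of (chrom, position) tuples, one per markers-file row."""
    positions = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        # First column is the physical position written as chrom:position.
        chrom, pos = fields[0].rsplit(":", 1)
        positions.append((chrom, int(pos)))
    return positions


def fibd_to_regions(fibd_lines, positions):
    """Translate FastIBD index pairs into genomic start/end coordinates."""
    regions = []
    for line in fibd_lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        sample1, sample2, index1, index2, score = fields[:5]
        start_idx, end_idx = int(index1), int(index2)
        chrom, start = positions[start_idx]
        # Assumption: index2 is exclusive, so the last marker is index2 - 1.
        _, end = positions[end_idx - 1]
        regions.append((sample1, sample2, chrom, start, end, float(score)))
    return regions


# Hypothetical markers file: position, map position, allele1, allele2.
markers_text = """\
1:1000 0.00001 A G
1:2000 0.00002 C T
1:3000 0.00003 G A
1:4000 0.00004 T C
"""
# Hypothetical FastIBD output: sample1, sample2, index1, index2, score.
fibd_text = "sampleA sampleB 1 4 1.2e-10\n"

positions = load_marker_positions(markers_text.splitlines())
regions = fibd_to_regions(fibd_text.splitlines(), positions)
for region in regions:
    print(*region, sep="\t")
```

If the end index turned out to be inclusive for a given BEAGLE version, the lookup `positions[end_idx - 1]` would need to become `positions[end_idx]`.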
Example
Included in the repository is an example data set using 1000 Genomes data. You can extract that data and run the following example:
$ tar -zxvf example_data.tar.gz
$ cd example_data
$ java -jar MendelScan.jar score variants.vcf \
    --vep-file annotation.vep \
    --ped-file family.ped \
    --gene-file gene-expression.txt \
    --output-file mendelscan.tsv \
    --output-vcf mendelscan.vcf
Reading input from variants.vcf
Loading sample information from family.ped...
1 males, 2 cases, 1 controls
Loading gene expression information from gene-expression.txt...
Expression rank loaded for 38545 genes
Loading VEP from annotation.vep...
11181 variants had VEP annotation
Scoring variants under dominant disease model
3 samples in VCF (2 affected, 1 unaffected, 1 male)
11181 variants in VCF file
11181 matched with VEP annotation
12846 variants_common
337 variants_known
18 variants_mutation
97 variants_novel
1359 variants_rare
466 variants_uncommon