Wang Zheng Yuan: Tool for vcf annotations: VaRank from a French group

VaRank introduction

VaRank is a simple and powerful tool designed for variant ranking from next generation sequencing data. It provides a comprehensive workflow for annotating and ranking SNVs and indels.
If you are interested in structural variations, which also play a key role in human diseases, please go to the dbSTAR project homepage.

Four modules create the strength of this workflow:
- Variant call quality summary (total and variant depth of coverage, Phred-like information), to filter out false positive calls.
- Alamut Batch or SnpEff variant annotations, to integrate genetic and predictive information (functional impact, putative effects in the protein coding regions, population frequency...) from different sources, using HGVS nomenclature.
- Barcode representing the presence/absence of variants (with homozygote/heterozygote status), to search for recurrence between families or groups of individuals.
- Prioritization score, to rank variants according to their predicted pathogenic status.
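
The barcode idea from the third module can be illustrated with a small sketch. The encoding used here (one digit per sample: 0 = absent, 1 = heterozygous, 2 = homozygous) and the function names are assumptions made for illustration, not necessarily VaRank's exact output format.

```python
def genotype_to_digit(gt):
    """Map a VCF-style genotype string to one barcode digit (assumed encoding)."""
    alleles = gt.replace("|", "/").split("/")
    # Alleles that are neither reference ("0") nor missing (".")
    called = [a for a in alleles if a not in ("0", ".")]
    if not called:
        return "0"  # variant absent (or genotype missing)
    if len(alleles) > 1 and len(set(alleles)) == 1:
        return "2"  # same non-reference allele on every copy: homozygous
    return "1"      # anything else carrying a variant allele: heterozygous


def barcode(genotypes):
    """Concatenate one digit per sample, in sample order."""
    return "".join(genotype_to_digit(g) for g in genotypes)


def carriers(code):
    """Number of samples carrying the variant in a barcode string."""
    return sum(1 for d in code if d != "0")


# Four samples: heterozygous, homozygous, reference, missing
code = barcode(["0/1", "1/1", "0/0", "./."])
print(code, carriers(code))
```

Comparing such strings across families or groups of individuals is one simple way to spot recurrent variants.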

VaRank aims to reduce the daily workload of clinical geneticists and molecular biologists and to help accelerate the identification of disease-causing variants.

VaRank requirements

a- You will need the VaRank sources. The source code is available here under the GNU GPL licence.

b- VaRank can run on any architecture with a standard Tcl/Tk installation. You can freely download it here for any architecture (e.g. AIX, Linux, Mac OS X, Solaris and Windows).
c- VaRank relies on one of 2 possible annotation engines (Alamut Batch or SnpEff) to extract most of the data and to score each variant:

Alamut Batch (Interactive Biosoftware). You can request a free, 30-day trial of Alamut Batch here.

Optional:
d- PolyPhen-2 (PPH2) predicts the functional effects of human SNPs. Depending on the annotation engine, PPH2 either needs to be installed separately (Alamut Batch) or is already integrated (SnpEff). Nevertheless, SnpEff can still be used together with a local installation of PPH2.
You can freely download it here.

Input data

VaRank supports the commonly used VCF (Variant Call Format) input format for variants analysis that allows the program to be easily integrated into NGS bioinformatics analysis pipelines.
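
To make the input format concrete, here is a minimal sketch of reading VCF data lines in plain Python. It is not VaRank code. The helper names are hypothetical, and the example assumes a single-sample file whose variant caller writes the common `GT`, `AD` and `DP` FORMAT keys.

```python
def parse_vcf_line(line):
    """Parse one tab-separated VCF data line into a dict (single sample assumed)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt = fields[:5]
    record = {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),  # several ALT alleles are comma separated
    }
    if len(fields) >= 10:
        # Column 9 lists the FORMAT keys, column 10 the first sample's values
        record["sample"] = dict(zip(fields[8].split(":"), fields[9].split(":")))
    return record


def read_vcf(lines):
    """Yield parsed records, skipping header ('#') and empty lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "1\t12345\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:12,8:20",
]
for rec in read_vcf(example):
    print(rec["chrom"], rec["pos"], rec["ref"], rec["alt"], rec["sample"]["DP"])
```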

Output data

VaRank provides 4 tsv output files (TAB separated values files) divided into 2 categories:

Files named with “ByVar” contain variations sorted from the most to the least pathogenic (according to the VaRank score).
Files named with “ByGene” contain variations grouped by gene, and the genes are sorted using the gene as a proxy for the score.
Each gene is scored according to its most pathogenic variant (if homozygous) or its two most pathogenic variants.
To make sure that no variants are missed, all other variations of the gene are also reported below the variant(s) used to score the gene.
The “ByGene” file is more suitable when dealing with a recessive mode of inheritance.
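
The gene-level ranking can be sketched as follows. The text does not say how the scores of two variants are combined, so averaging them is purely an assumption here, as are the function names, the example scores and the handling of a gene with a single heterozygous variant.

```python
def gene_score(variants):
    """Score one gene from (score, is_homozygous) tuples.

    Assumed rule: use the top variant alone if it is homozygous (or if it is
    the only variant), otherwise average the two highest-scoring variants.
    """
    ranked = sorted(variants, key=lambda v: v[0], reverse=True)
    top_score, top_is_hom = ranked[0]
    if top_is_hom or len(ranked) == 1:
        return top_score
    return (top_score + ranked[1][0]) / 2


def rank_genes(by_gene):
    """Return gene names ordered from the highest to the lowest gene score."""
    scored = {gene: gene_score(v) for gene, v in by_gene.items() if v}
    return sorted(scored, key=lambda gene: scored[gene], reverse=True)


# Hypothetical scores, for illustration only
by_gene = {
    "GENE_A": [(110, True), (40, False)],   # scored by its homozygous variant
    "GENE_B": [(90, False), (70, False)],   # scored by its two best variants
    "GENE_C": [(100, False)],               # single heterozygous variant
}
print(rank_genes(by_gene))
```

Scoring a gene on one homozygous variant or on two variants mirrors what a recessive model needs: both copies of the gene affected.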

Apart from these 2 categories, each file is also available in 2 versions:

Raw file (“allVariants”) with no variants filtered out.
Prefiltered file (“filteredVariants”) with variants removed by the default filters (see VaRank initial filtering).

The description of the VaRank annotation columns is available in section 7 (“ANNOTATION COLUMNS”) of the README.VaRank_*.pdf.

VaRank initial filtering

The default filters remove variants:

with a total depth of coverage <= 10x
with a supporting read count <= 10
with a percent of supporting reads <= 15%
with a validated annotation in the dbSNP database (i.e. with at least 2 pieces of evidence) and not flagged as pathogenic (according to the ClinicalSignificance field in dbSNP)
with an allele frequency > 1% (extracted from the dbSNP database or the Exome Variant Server)
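
The default filters listed above can be expressed as a single predicate. This is only a sketch of the stated thresholds, not VaRank's implementation. The dictionary keys (`depth`, `alt_reads`, `dbsnp_validated`, `dbsnp_pathogenic`, `allele_freq`) are hypothetical names chosen for the example.

```python
def passes_default_filters(v):
    """Return True if a variant survives the default filters described above."""
    depth = v["depth"]          # total depth of coverage
    alt_reads = v["alt_reads"]  # reads supporting the variant
    if depth <= 10:
        return False
    if alt_reads <= 10:
        return False
    if 100.0 * alt_reads / depth <= 15:
        return False
    # Validated dbSNP entry that is not annotated as pathogenic
    if v.get("dbsnp_validated", False) and not v.get("dbsnp_pathogenic", False):
        return False
    # Allele frequency above 1% (when a frequency is known)
    freq = v.get("allele_freq")
    if freq is not None and freq > 0.01:
        return False
    return True


variants = [
    {"depth": 40, "alt_reads": 18},                       # kept
    {"depth": 10, "alt_reads": 10},                       # depth too low
    {"depth": 200, "alt_reads": 20},                      # only 10% of reads
    {"depth": 40, "alt_reads": 18, "allele_freq": 0.05},  # too frequent
]
kept = [v for v in variants if passes_default_filters(v)]
print(len(kept))
```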

SnpEff, the second annotation engine, is available at http://snpeff.sourceforge.net.

http://www.lbgi.fr/VaRank/