Friday, November 30, 2012

snpEff: Command line options


If you type the command without any arguments, it shows all available options:
Usage: snpEff [eff]    [options] genome_version [variants_file]
   or: snpEff download [options] genome_version
   or: snpEff build    [options] genome_version
   or: snpEff dump     [options] genome_version
   or: snpEff cds      [options] genome_version
    
There are five main 'commands': calculate effects (eff, which is the default), download a database (download), build database (build), dump database (dump), test cds in database (cds).
Calculate variant effects: snpEff [eff]
If you type the command without any arguments, it shows all available options ("java -jar snpEff.jar eff"):
Usage: snpEff [eff] genome_version [variants_file]

Input file: Default is STDIN

Options:
 -a , -around            : Show N codons and amino acids around change (only in coding regions). Default is 0 codons.
 -i format               : Input format [ vcf, txt, pileup, bed ]. Default: VCF.
 -o format               : Ouput format [ txt, vcf, gatk, bed, bedAnn ]. Default: VCF.
 -interval               : Use a custom interval file (you may use this option many times)
 -chr string             : Prepend 'string' to chromosome name (e.g. 'chr1' instead of '1'). Only on TXT output.
 -s,  -stats             : Name of stats file (summary). Default is 'snpEff_summary.html'
 -t                      : Use multiple threads (implies '-noStats'). Default 'off'

Sequence change filter options:
 -del                    : Analyze deletions only
 -ins                    : Analyze insertions only
 -hom                    : Analyze homozygous variants only
 -het                    : Analyze heterozygous variants only
 -minQ X, -minQuality X  : Filter out variants with quality lower than X
 -maxQ X, -maxQuality X  : Filter out variants with quality higher than X
 -minC X, -minCoverage X : Filter out variants with coverage lower than X
 -maxC X, -maxCoverage X : Filter out variants with coverage higher than X
 -nmp                    : Only MNPs (multiple nucleotide polymorphisms)
 -snp                    : Only SNPs (single nucleotide polymorphisms)

Results filter options:
 -fi bedFile                     : Only analyze changes that intersect with the intervals specified in this file (you may use this option many times)
 -no-downstream                  : Do not show DOWNSTREAM changes
 -no-intergenic                  : Do not show INTERGENIC changes
 -no-intron                      : Do not show INTRON changes
 -no-upstream                    : Do not show UPSTREAM changes
 -no-utr                         : Do not show 5_PRIME_UTR or 3_PRIME_UTR changes

Annotations filter options:
 -canon                          : Only use canonical transcripts.
 -onlyReg                        : Only use regulation tracks.
 -onlyTr file.txt                : Only use the transcripts in this file. Format: One transcript ID per line.
 -reg name                       : Regulation track to use (this option can be used add several times).
 -treatAllAsProteinCoding bool   : If true, all transcript are treated as if they were protein conding. Default: Auto
 -ud, -upDownStreamLen           : Set upstream downstream interval length (in bases)

Generic options:
 -0                      : File positions are zero-based (same as '-inOffset 0 -outOffset 0')
 -1                      : File positions are one-based (same as '-inOffset 1 -outOffset 1')
 -c , -config            : Specify config file
 -h , -help              : Show this help and exit
 -if, -inOffset          : Offset input by a number of bases. E.g. '-inOffset 1' for one-based input files
 -of, -outOffset         : Offset output by a number of bases. E.g. '-outOffset 1' for one-based output files
 -noLog                  : Do not report usage statistics to server
 -noStats                : Do not create stats (summary) file
 -q , -quiet             : Quiet mode (do not show any messages or errors)
 -v , -verbose           : Verbose mode
Options
Option : Note
-a, -around : Show N codons and amino acids around change (only in coding regions). Default is 0 codons (i.e. turned off by default).
-i : Input format: [ txt, vcf, pileup, bed ]
  • vcf: Input file is in VCF format. Implies '-inOffset 1'
  • txt: Input file is in TXT format. Implies '-inOffset 1'
  • pileup: Input file is in PILEUP format. Implies '-inOffset 1'. WARNING: This format is deprecated.
  • bed: Only intervals are provided (no variants). This is used when you want to know where an interval hits. Implies '-inOffset 0'
-interval : Add custom interval file. You may use this option many times to add many interval files.
-o : Output format: [ txt, vcf, bed, bedAnn ]
  • vcf: Output in VCF format. Implies '-outOffset 1'
  • txt: Output in TXT format. Implies '-outOffset 1'
  • bed: Only minimal information is added to the 'Name' column. Format: "Effect_1 | Gene_1 | Biotype_1 ; Effect_2 | Gene_2 | Biotype_2 ; ... ". This is used when you want to know where an interval hits. Implies '-outOffset 0'
  • bedAnn: Output the annotation's info in BED format (as opposed to the variant's info). This option outputs the annotations intersecting each variant; the information is added in the 'name' column. Implies '-outOffset 0'
-s, -stats : Name of stats file (summary). Default is 'snpEff_summary.html'.
-chr : Prepend 'chr' before printing a chromosome name (e.g. 'chr7' instead of '7').
-t : Use multiple threads (implies '-noStats'). If active, tries to use the available cores in the computer. Default 'off'
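
The 'Name' column layout described for '-o bed' can be split back into its parts with a few lines of Python. This is only a minimal sketch: it assumes the separators are exactly ';' between entries and '|' between fields, as in the format string above. The function name and the sample value are made up for illustration and are not real snpEff output.

```python
def parse_bed_name(name):
    """Split a '-o bed' Name column into (effect, gene, biotype) tuples."""
    entries = []
    for chunk in name.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing ';'
        fields = [field.strip() for field in chunk.split("|")]
        # Pad to three fields in case gene or biotype is missing
        fields += [""] * (3 - len(fields))
        entries.append(tuple(fields[:3]))
    return entries


# Hypothetical Name value, written by hand for this example
example = "INTRON | GENE_A | protein_coding ; UPSTREAM | GENE_B | lincRNA"
for effect, gene, biotype in parse_bed_name(example):
    print(effect, gene, biotype)
```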

Sequence change filter options
Option : Note
-del : Analyze deletions only (filter out insertions, SNPs and MNPs).
-hom : Analyze homozygous sequence changes only (filter out heterozygous changes).
-het : Analyze heterozygous sequence changes only (filter out homozygous changes). Note that this option may not be valid when using VCF4 files: since there might be more than two changes per line, the notion of a heterozygous change is lost.
-ins : Analyze insertions only (filter out deletions, SNPs and MNPs).
-minC, -minCoverage : Filter out sequence changes with coverage lower than X.
-maxC, -maxCoverage : Filter out sequence changes with coverage higher than X.
-minQ, -minQuality : Filter out sequence changes with quality lower than X.
-maxQ, -maxQuality : Filter out sequence changes with quality higher than X.
-mnp : Analyze MNPs only (filter out insertions, deletions and SNPs).
-snp : Analyze SNPs only (filter out insertions, deletions and MNPs).
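
The quality and coverage thresholds above amount to a simple keep/drop test. The sketch below shows one way to read them, assuming that a value exactly equal to X is kept (the text only says "lower than" and "higher than"). The function and parameter names are illustrative and are not taken from snpEff.

```python
def passes_filters(quality, coverage, min_q=None, max_q=None, min_c=None, max_c=None):
    """Return True if a change survives -minQ/-maxQ/-minC/-maxC style thresholds.

    A threshold left as None is treated as "option not given".
    """
    if min_q is not None and quality < min_q:
        return False  # quality lower than X is filtered out
    if max_q is not None and quality > max_q:
        return False  # quality higher than X is filtered out
    if min_c is not None and coverage < min_c:
        return False  # coverage lower than X is filtered out
    if max_c is not None and coverage > max_c:
        return False  # coverage higher than X is filtered out
    return True


# A change with quality 30 and coverage 12, checked against '-minQ 20 -maxC 100'
print(passes_filters(30, 12, min_q=20, max_c=100))
```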

Results filter options
Option : Note
-fi {bedFile} : Only analyze changes intersecting intervals in file (you may use this option many times)
-no-downstream : Do not show DOWNSTREAM changes
-no-intergenic : Do not show INTERGENIC changes
-no-intron : Do not show INTRON changes
-no-upstream : Do not show UPSTREAM changes
-no-utr : Do not show 5_PRIME_UTR or 3_PRIME_UTR changes

Annotations filter options
Option : Note
-canon : Only annotate using "canonical" transcripts. Canonical transcripts are defined as the transcript having the longest CDS.
-treatAllAsProteinCoding {val} : If value is 'true', report all transcripts as if they were coding. Default: Auto, i.e. if any transcripts are marked as 'protein_coding' then set to 'false'; if no transcripts are marked as 'protein_coding' then set to 'true'.
-ud, -upDownStreamLen : Set upstream downstream interval length (in bases). If set to zero or negative, then no UPSTREAM or DOWNSTREAM effects are reported.
-onlyReg : Only use regulation tracks
-reg {name} : Regulation track to use (this option can be used several times).
-onlyTr {file.txt} : Only use the transcripts in this file. Format: One transcript ID per line.
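
Two of the rules above are easy to state in code: the "longest CDS" definition behind '-canon' and the 'Auto' default of '-treatAllAsProteinCoding'. The following is a rough sketch, not snpEff's implementation. It assumes transcripts are plain dictionaries with hypothetical 'id', 'cds_length' and 'biotype' keys, and that the longest-CDS comparison is made among the transcripts of one gene, which the text does not spell out.

```python
def canonical_transcript(transcripts):
    """Pick the transcript with the longest CDS (the '-canon' definition)."""
    # On ties, max() keeps the first transcript it encounters
    return max(transcripts, key=lambda t: t["cds_length"])


def treat_all_as_protein_coding(transcripts, setting="auto"):
    """Resolve '-treatAllAsProteinCoding', including the 'Auto' default."""
    if setting != "auto":
        return setting == "true"
    # Auto: 'false' if any transcript is marked protein_coding, 'true' if none is
    any_coding = any(t["biotype"] == "protein_coding" for t in transcripts)
    return not any_coding


# Hypothetical transcripts of a single gene
gene_transcripts = [
    {"id": "TR_1", "cds_length": 900, "biotype": "protein_coding"},
    {"id": "TR_2", "cds_length": 1500, "biotype": "protein_coding"},
    {"id": "TR_3", "cds_length": 0, "biotype": "processed_transcript"},
]
print(canonical_transcript(gene_transcripts)["id"])
print(treat_all_as_protein_coding(gene_transcripts))
```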

Generic options
Option : Note
-0 : Indicates that input and output positions are zero-based. That means that the first base in a chromosome is base number 0. This is equivalent to '-inOffset 0 -outOffset 0'
-1 : Indicates that input and output positions are one-based. That means that the first base in a chromosome is base number 1. This is equivalent to '-inOffset 1 -outOffset 1'. This is the default.
-c, -config : Specifies the location of a configuration file. The default location is the current directory.
-h, -help : Print help and exit.
-if, -inOffset : Offset all positions in input files by a number of bases. E.g. '-inOffset 1' for one-based input files.
-of, -outOffset : Offset all outputs by a number of bases. E.g. '-outOffset 1' for one-based outputs.
-v, -verbose : Verbose mode.
-q, -quiet : Quiet mode (do not show any messages or errors).
-noLog : Do not report usage statistics to server.
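
The arithmetic implied by '-inOffset' and '-outOffset' can be written down in a couple of lines. This is a sketch of the coordinate conversion only, under the assumption that an offset is simply the number assigned to the first base of a chromosome; it is not meant to describe how snpEff handles positions internally, and the function name is made up.

```python
def convert_position(pos, in_offset, out_offset):
    """Re-express a position read with in_offset in the out_offset convention."""
    zero_based = pos - in_offset  # first base of the chromosome becomes 0
    return zero_based + out_offset


# One-based input (e.g. VCF) written to a zero-based output (e.g. BED)
print(convert_position(1, in_offset=1, out_offset=0))
```

With '-1' (both offsets equal to 1) or '-0' (both equal to 0), the position comes out unchanged.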



Download a database: snpEff download
Download and install a database. A list of databases is available at the download page.
Usage: snpEff download [options] genome_version

Generic options:
 -c , -config            : Specify config file
 -h , -help              : Show this help and exit
 -v , -verbose           : Verbose mode
 -noLog                  : Do not report usage statistics to server
E.g. to download GRCh37.64, just run:
java -jar snpEff.jar download GRCh37.64
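
The download command above follows the general usage pattern "snpEff command [options] genome_version [variants_file]". When snpEff is driven from a script, the argument list can be assembled along these lines. This is a minimal sketch: the helper name is made up, 'variants.vcf' is a hypothetical file name, and the jar is assumed to be in the current directory as in the example above.

```python
def snpeff_command(command, genome_version, variants_file=None, options=(), jar="snpEff.jar"):
    """Build the argument list: java -jar JAR command [options] genome [variants]."""
    argv = ["java", "-jar", jar, command, *options, genome_version]
    if variants_file is not None:
        argv.append(variants_file)
    return argv


# Same command as the download example above
print(" ".join(snpeff_command("download", "GRCh37.64")))

# An 'eff' run with VCF input and no stats file, on a hypothetical input file
print(" ".join(snpeff_command("eff", "GRCh37.64", "variants.vcf", ["-i", "vcf", "-noStats"])))
```

The resulting list can be handed to Python's subprocess.run as is, which avoids shell quoting problems with file names.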

Build database: snpEff build
If you type the command without any arguments, it shows all available options ("java -jar snpEff.jar build"):
Usage: snpEff build [options] genome_version

Build DB options:
 -embl                   : Use Embl format.
 -genbank                : Use GenBank format.
 -gff2                   : Use GFF2 format (obsolete).
 -gff3                   : Use GFF3 format.
 -gtf22                  : Use GTF 2.2 format.
 -refseq                 : Use RefSeq table from UCSC.
 -txt                    : Use TXT format (obsolete).
 -onlyReg                : Only build regulation tracks.
 -cellType type          : Only build regulation tracks for cellType "type".

Generic options:
 -0                      : File positions are zero-based (same as '-inOffset 0 -outOffset 0')
 -1                      : File positions are one-based (same as '-inOffset 1 -outOffset 1')
 -c , -config            : Specify config file
 -h , -help              : Show this help and exit
 -if, -inOffset          : Offset input by a number of bases. E.g. '-inOffset 1' for one-based input files
 -of, -outOffset         : Offset output by a number of bases. E.g. '-outOffset 1' for one-based output files
 -noLog                  : Do not report usage statistics to server
 -q , -quiet             : Quiet mode (do not show any messages or errors)
 -v , -verbose           : Verbose mode
Option : Note
-embl : Use Embl format. It will look for gene information in a file called './data/GENOME/genes.embl', which is assumed to be in EMBL format (assuming 'data_dir=./data/' in your snpEff.config file).
-genbank : Use GenBank format. It will look for gene information in a file called './data/GENOME/genes.gb', which is assumed to be in GenBank format (assuming 'data_dir=./data/' in your snpEff.config file).
-gff3 : Use GFF3 format. It will look for gene information in a file called './data/GENOME/genes.gff', which is assumed to be in GFF3 format (assuming 'data_dir=./data/' in your snpEff.config file).
-gff2 : Use GFF2 format. It will look for gene information in a file called './data/GENOME/genes.gff', which is assumed to be in GFF2 format (assuming 'data_dir=./data/' in your snpEff.config file). WARNING: GFF2 format is obsolete and should not be used.
-gtf22 : Use GTF 2.2 format. It will look for gene information in a file called './data/GENOME/genes.gtf', which is assumed to be in GTF 2.2 format (assuming 'data_dir=./data/' in your snpEff.config file).
-refseq : Use RefSeq table. It will look for gene information in a file called './data/GENOME/genes.txt', which is assumed to be a RefSeq table from UCSC (assuming 'data_dir=./data/' in your snpEff.config file).
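
The file names listed above can be collected into a small lookup, which is handy for checking that the gene file is in place before running a build. This is a sketch based only on the paths given in the table; the function name is made up, and 'data_dir' defaults to './data' as in the examples above.

```python
import posixpath

# Gene file expected for each build format flag, as listed in the table above
GENE_FILES = {
    "-embl": "genes.embl",
    "-genbank": "genes.gb",
    "-gff3": "genes.gff",
    "-gff2": "genes.gff",  # obsolete format
    "-gtf22": "genes.gtf",
    "-refseq": "genes.txt",
}


def expected_gene_file(format_flag, genome, data_dir="./data"):
    """Return the path where 'snpEff build' is described as looking for gene information."""
    return posixpath.join(data_dir, genome, GENE_FILES[format_flag])


print(expected_gene_file("-gtf22", "GRCh37.64"))
```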
