Wang Zheng Yuan: snpEff: Command line options

If you type the command without any arguments, it shows all available options:

Usage: snpEff [eff]    [options] genome_version [variants_file]
   or: snpEff download [options] genome_version
   or: snpEff build    [options] genome_version
   or: snpEff dump     [options] genome_version
   or: snpEff cds      [options] genome_version

There are five main commands: calculate effects (eff, which is the default), download a database (download), build a database (build), dump a database (dump), and test CDS in a database (cds).

Calculate variant effects: snpEff [eff]

If you type the command without any arguments, it shows all available options ("java -jar snpEff.jar eff"):

Usage: snpEff [eff] genome_version [variants_file]

Input file: Default is STDIN

Options:
 -a , -around            : Show N codons and amino acids around change (only in coding regions). Default is 0 codons.
 -i format               : Input format [ vcf, txt, pileup, bed ]. Default: VCF.
 -o format               : Output format [ txt, vcf, gatk, bed, bedAnn ]. Default: VCF.
 -interval               : Use a custom interval file (you may use this option many times)
 -chr string             : Prepend 'string' to chromosome name (e.g. 'chr1' instead of '1'). Only on TXT output.
 -s,  -stats             : Name of stats file (summary). Default is 'snpEff_summary.html'
 -t                      : Use multiple threads (implies '-noStats'). Default 'off'

Sequence change filter options:
 -del                    : Analyze deletions only
 -ins                    : Analyze insertions only
 -hom                    : Analyze homozygous variants only
 -het                    : Analyze heterozygous variants only
 -minQ X, -minQuality X  : Filter out variants with quality lower than X
 -maxQ X, -maxQuality X  : Filter out variants with quality higher than X
 -minC X, -minCoverage X : Filter out variants with coverage lower than X
 -maxC X, -maxCoverage X : Filter out variants with coverage higher than X
 -mnp                    : Only MNPs (multiple nucleotide polymorphisms)
 -snp                    : Only SNPs (single nucleotide polymorphisms)

Results filter options:
 -fi bedFile                     : Only analyze changes that intersect with the intervals specified in this file (you may use this option many times)
 -no-downstream                  : Do not show DOWNSTREAM changes
 -no-intergenic                  : Do not show INTERGENIC changes
 -no-intron                      : Do not show INTRON changes
 -no-upstream                    : Do not show UPSTREAM changes
 -no-utr                         : Do not show 5_PRIME_UTR or 3_PRIME_UTR changes

Annotations filter options:
 -canon                          : Only use canonical transcripts.
 -onlyReg                        : Only use regulation tracks.
 -onlyTr file.txt                : Only use the transcripts in this file. Format: One transcript ID per line.
 -reg name                       : Regulation track to use (this option can be used several times).
 -treatAllAsProteinCoding bool   : If true, all transcripts are treated as if they were protein coding. Default: Auto
 -ud, -upDownStreamLen           : Set upstream downstream interval length (in bases)

Generic options:
 -0                      : File positions are zero-based (same as '-inOffset 0 -outOffset 0')
 -1                      : File positions are one-based (same as '-inOffset 1 -outOffset 1')
 -c , -config            : Specify config file
 -h , -help              : Show this help and exit
 -if, -inOffset          : Offset input by a number of bases. E.g. '-inOffset 1' for one-based input files
 -of, -outOffset         : Offset output by a number of bases. E.g. '-outOffset 1' for one-based output files
 -noLog                  : Do not report usage statistics to server
 -noStats                : Do not create stats (summary) file
 -q , -quiet             : Quiet mode (do not show any messages or errors)
 -v , -verbose           : Verbose mode

Options

Option	Note
-a, -around	Show N codons and amino acids around change (only in coding regions). Default is 0 codons (i.e. by default is turned off).
-i	Input format: [ txt, vcf, pileup, bed ]
	vcf: Input file is in VCF format. Implies '-inOffset 1'.
	txt: Input file is in TXT format. Implies '-inOffset 1'.
	pileup: Input file is in PILEUP format. Implies '-inOffset 1'. WARNING: This format is deprecated.
	bed: Only intervals are provided (no variants). This is used when you want to know where an interval hits. Implies '-inOffset 0'.
-interval	Add custom interval file. You may use this option many times to add many interval files.
-o	Output format: [ txt, vcf, bed, bedAnn ]
	vcf: Output in VCF format. Implies '-outOffset 1'.
	txt: Output in TXT format. Implies '-outOffset 1'.
	bed: Only minimal information is added to the 'Name' column. Format: "Effect_1 | Gene_1 | Biotype_1 ; Effect_2 | Gene_2 | Biotype_2 ; ... ". This is used when you want to know where an interval hits. Implies '-outOffset 0'.
	bedAnn: Output the annotations' information in BED format (as opposed to the variants' information). Annotations intersecting each variant are output, with the information added in the 'name' column. Implies '-outOffset 0'.
-s, -stats	Name of stats file (summary). Default is 'snpEff_summary.html'.
-chr	Prepend 'chr' before printing a chromosome name (e.g. 'chr7' instead of '7').
-t	Use multiple threads (implies '-noStats'). If active, tries to use available cores in the computer. Default 'off'
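
As a rough illustration of how the options above combine, the sketch below assembles an annotation command and prints it instead of running it. The file names (variants.vcf, my_summary.html) and the helper name snpeff_eff_cmd are placeholders invented for this example; GRCh37.64 is simply the genome version used elsewhere in this post.

```shell
#!/bin/sh
# Dry-run sketch: build the command line and print it rather than executing it.
# 'variants.vcf' and 'my_summary.html' are hypothetical file names.
snpeff_eff_cmd() {
    genome="$1"
    variants="$2"
    stats="$3"
    # -i / -o select the input and output formats, -s names the summary file.
    printf '%s\n' "java -jar snpEff.jar eff -i vcf -o vcf -s $stats $genome $variants"
}

snpeff_eff_cmd GRCh37.64 variants.vcf my_summary.html
```

Options come before the genome version and the variants file, following the usage line shown earlier.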

Sequence change filter options

Option	Note
-del	Analyze deletions only (filter out insertions, SNPs and MNPs).
-hom	Analyze homozygous sequence changes only (filter out heterozygous changes).
-het	Analyze heterozygous sequence changes only (filter out homozygous changes). Note that this option may not be valid when using VCF4 files: since there might be more than two changes per line, the notion of a heterozygous change is lost.
-ins	Analyze insertions only (filter out deletions, SNPs and MNPs).
-minC, -minCoverage	Filter out sequence changes with coverage lower than X.
-maxC, -maxCoverage	Filter out sequence changes with coverage higher than X.
-minQ, -minQuality	Filter out sequence changes with quality lower than X.
-maxQ, -maxQuality	Filter out sequence changes with quality higher than X.
-mnp	Analyze MNPs only (filter out insertions, deletions and SNPs).
-snp	Analyze SNPs only (filter out insertions, deletions and MNPs).
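
To make the quality thresholds concrete, here is a minimal sketch of the keep/discard decision that '-minQ' and '-maxQ' describe. It is not snpEff's own code: the function name passes_quality is made up, and treating the boundaries as inclusive is an assumption based on the wording "lower than X" and "higher than X".

```shell
#!/bin/sh
# Succeeds (exit status 0) when a variant with quality $1 would be kept
# under '-minQ $2 -maxQ $3'. Assumption: a value equal to a bound is kept.
passes_quality() {
    awk -v q="$1" -v lo="$2" -v hi="$3" 'BEGIN { exit !(q >= lo && q <= hi) }'
}

# Example qualities are arbitrary illustration values.
for q in 12 30 250; do
    if passes_quality "$q" 30 200; then
        echo "QUAL=$q kept"
    else
        echo "QUAL=$q filtered out"
    fi
done
```

The same comparison applies to '-minC' and '-maxC', with coverage in place of quality.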

Results filter options

Option	Note
-fi {bedFile}	Only analyze changes intersecting intervals in file (you may use this option many times)
-no-downstream	Do not show DOWNSTREAM changes
-no-intergenic	Do not show INTERGENIC changes
-no-intron	Do not show INTRON changes
-no-upstream	Do not show UPSTREAM changes
-no-utr	Do not show 5_PRIME_UTR or 3_PRIME_UTR changes
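
The '-no-...' flags follow a regular naming pattern, so a small helper can generate them from region names. The sketch below only prints the resulting command; hide_flags, regions.bed and variants.vcf are hypothetical names used for illustration.

```shell
#!/bin/sh
# Turn region names into the corresponding '-no-...' flags from the table above.
hide_flags() {
    flags=""
    for region in "$@"; do
        case "$region" in
            downstream|intergenic|intron|upstream|utr)
                flags="$flags -no-$region" ;;
            *)
                echo "unknown region: $region" >&2
                return 1 ;;
        esac
    done
    # Strip the leading space left by the first append.
    printf '%s\n' "${flags# }"
}

# Dry run: show the command instead of executing it.
echo "java -jar snpEff.jar eff $(hide_flags intergenic intron) -fi regions.bed GRCh37.64 variants.vcf"
```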

Annotations filter options

Option	Note
-canon	Only annotate using "canonical" transcripts. Canonical transcripts are defined as the transcript having the longest CDS.
-treatAllAsProteinCoding {val}	If value is 'true', report all transcripts as if they were coding. Default: Auto, i.e. if any transcripts are marked as 'protein_coding' it is set to 'false'; if no transcripts are marked as 'protein_coding' it is set to 'true'.
-ud, -upDownStreamLen	Set upstream downstream interval length (in bases). If set to zero or negative, then no UPSTREAM or DOWNSTREAM effects are reported.
-onlyReg	Only use regulation tracks
-reg {name}	Regulation track to use (this option can be used several times).
-onlyTr {file.txt}	Only use the transcripts in this file. Format: One transcript ID per line.
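
The 'Auto' default of '-treatAllAsProteinCoding' can be read as a simple rule over the transcript biotypes in the database. The sketch below expresses that reading; the function name is invented, and the biotype names other than 'protein_coding' are just illustrative inputs.

```shell
#!/bin/sh
# Prints the value that 'Auto' would resolve to, given a list of biotypes:
# 'false' if any transcript is marked protein_coding, otherwise 'true'.
auto_treat_all_as_protein_coding() {
    for biotype in "$@"; do
        if [ "$biotype" = "protein_coding" ]; then
            echo false
            return 0
        fi
    done
    echo true
}

auto_treat_all_as_protein_coding lincRNA protein_coding miRNA
auto_treat_all_as_protein_coding lincRNA miRNA
```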

Generic options

Option	Note
-0	Indicates that input and output positions are zero-based. That means that the first base in a chromosome is base number 0. This is equivalent to '-inOffset 0 -outOffset 0'.
-1	Indicates that input and output positions are one-based. That means that the first base in a chromosome is base number 1. This is equivalent to '-inOffset 1 -outOffset 1'. This is the default.
-c, -config	Specifies the location of a configuration file. Default location is in current directory.
-h, -help	Print help and exit.
-if, -inOffset	Offset all positions in input files by a number of bases. E.g. '-inOffset 1' for one-based input files.
-of, -outOffset	Offset all outputs by a number of bases. E.g. '-outOffset 1' for one-based outputs.
-v, -verbose	Verbose mode.
-q, -quiet	Quiet mode (do not show any messages or errors).
-noLog	Do not report usage statistics to server.
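
The effect of the two offsets can be pictured as a coordinate shift. The sketch below assumes that an offset is the number given to the first base of a chromosome (0 or 1), as described for '-0' and '-1'; convert_offset is a hypothetical helper, not part of snpEff.

```shell
#!/bin/sh
# Convert a position from the input numbering to the output numbering.
# Arguments: position, input offset, output offset.
convert_offset() {
    pos="$1"
    in_offset="$2"
    out_offset="$3"
    echo $(( pos - in_offset + out_offset ))
}

convert_offset 99 0 1   # zero-based 99 is the same base as one-based 100
```

When both offsets are equal, positions pass through unchanged, which matches the '-0' and '-1' shortcuts setting both values at once.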

Download a database: snpEff download

Download and install a database. A list of databases is available at the download page.

Usage: snpEff download [options] genome_version

Generic options:
 -c , -config            : Specify config file
 -h , -help              : Show this help and exit
 -v , -verbose           : Verbose mode
 -noLog                  : Do not report usage statistics to server

E.g. to download GRCh37.64, just run:

java -jar snpEff.jar download GRCh37.64

Build database: snpEff build

If you type the command without any arguments, it shows all available options ("java -jar snpEff.jar build"):

Usage: snpEff build [options] genome_version

Build DB options:
 -embl                   : Use Embl format.
 -genbank                : Use GenBank format.
 -gff2                   : Use GFF2 format (obsolete).
 -gff3                   : Use GFF3 format.
 -gtf22                  : Use GTF 2.2 format.
 -refseq                 : Use RefSeq table from UCSC.
 -txt                    : Use TXT format (obsolete).
 -onlyReg                : Only build regulation tracks.
 -cellType type          : Only build regulation tracks for cellType "type".

Generic options:
 -0                      : File positions are zero-based (same as '-inOffset 0 -outOffset 0')
 -1                      : File positions are one-based (same as '-inOffset 1 -outOffset 1')
 -c , -config            : Specify config file
 -h , -help              : Show this help and exit
 -if, -inOffset          : Offset input by a number of bases. E.g. '-inOffset 1' for one-based input files
 -of, -outOffset         : Offset output by a number of bases. E.g. '-outOffset 1' for one-based output files
 -noLog                  : Do not report usage statistics to server
 -q , -quiet             : Quiet mode (do not show any messages or errors)
 -v , -verbose           : Verbose mode

Option	Note
-embl	Use Embl format. It will look for gene information in a file called './data/GENOME/genes.embl', which is assumed to be in EMBL format (assuming 'data_dir=./data/' in your snpEff.config file).
-genbank	Use GenBank format. It will look for gene information in a file called './data/GENOME/genes.gb', which is assumed to be in GenBank format (assuming 'data_dir=./data/' in your snpEff.config file).
-gff3	Use GFF3 format. It will look for gene information in a file called './data/GENOME/genes.gff', which is assumed to be in GFF3 format (assuming 'data_dir=./data/' in your snpEff.config file).
-gff2	Use GFF2 format. It will look for gene information in a file called './data/GENOME/genes.gff', which is assumed to be in GFF2 format (assuming 'data_dir=./data/' in your snpEff.config file). WARNING: GFF2 format is obsolete and should not be used.
-gtf22	Use GTF 2.2 format. It will look for gene information in a file called './data/GENOME/genes.gtf', which is assumed to be in GTF 2.2 format (assuming 'data_dir=./data/' in your snpEff.config file).
-refseq	Use RefSeq table. It will look for gene information in a file called './data/GENOME/genes.txt', which is assumed to be a RefSeq table from UCSC (assuming 'data_dir=./data/' in your snpEff.config file).
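
Before running a build, it can help to check where the gene file is expected to be. The sketch below encodes the flag-to-file mapping from the table above; genes_file is a made-up helper, and the default of './data' mirrors the 'data_dir=./data/' assumption stated in the table. The '-txt' flag is left out because the table gives no file name for it.

```shell
#!/bin/sh
# Print the gene file path a given build flag is expected to read.
# Arguments: format flag, genome version, optional data directory.
genes_file() {
    flag="$1"
    genome="$2"
    data_dir="${3:-./data}"
    case "$flag" in
        -embl)       name="genes.embl" ;;
        -genbank)    name="genes.gb" ;;
        -gff3|-gff2) name="genes.gff" ;;
        -gtf22)      name="genes.gtf" ;;
        -refseq)     name="genes.txt" ;;
        *)
            echo "unsupported format flag: $flag" >&2
            return 1 ;;
    esac
    printf '%s/%s/%s\n' "$data_dir" "$genome" "$name"
}

genes_file -gtf22 GRCh37.64
```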

Wang Zheng Yuan

Friday, November 30, 2012
