OSDDLinux.net
Next Generation Sequencing (NGS) software packages
In the era of Next Generation Sequencing (NGS) technology, it is easy to
sequence the whole genome, exome or transcriptome of an organism. However,
analysing the data produced by these technologies poses several
challenges, because high-throughput data come in the form of short reads
that also contain several artifacts. We have developed several modules
for the analysis of NGS data generated by sequencing whole genomes,
transcriptomes and
human exomes.

USAGE: assemb_anno.pl -i (Configuration file) -o (Output directory name)
Example Command: ./assemb_anno.pl -i Configuration_file -o my_out
  -i Configuration_file
  -o Output Directory

(i) Benchmarking of genome assemblies
This is a major module of GenomeABC that allows users to evaluate their assemblers. To use this module, the user should provide a reference genome and the contigs generated by their assemblers. The module compares the contigs with the reference genome in order to evaluate the performance of the assemblers. BLAT is used to map the contigs onto the reference genome.
USAGE: benchmarking_new_assembled_genome.pl -c (fasta format contig file) -r (fasta format reference genome file) -o (output file name)
Example Command: ./benchmarking_new_assembled_genome.pl -c contigs.fasta -r ref.fasta -o out.txt
  -c Contig sequences in FASTA format
  -r Reference genome file
  -o Output file name

(ii) Generation of artificial genome and simulated reads
This module allows users to generate an artificial genome of a specified size and base composition (percentages of A, T, G and C). It also allows users to generate simulated short reads (single-end or paired-end reads).
USAGE: make_genome.pl -s (Genome Size (Put 5000000 for 5-Mb)) -a (A % (e.g. 25%)) -t (T % (e.g. 25%)) -g (G % (e.g. 25%)) -c (C % (e.g. 25%)) -l (Read length) -i (Insert length) -v (Coverage) -y (Type of reads) -o (Out directory)
  -s Size of the genome to be created.
  -a Percentage of A in the genome.
  -t Percentage of T in the genome.
  -g Percentage of G in the genome.
  -c Percentage of C in the genome.
  -l Read length.
  -i Insert length.
  -v Coverage.
  -y Type of reads (single end (1) or paired end (2)).
  -o Output directory name.
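The idea behind module (ii) can be illustrated with a minimal Python sketch. This is not the code of make_genome.pl; the function names, the uniform sampling of read start positions and the read-count formula (coverage x genome size / read length) are assumptions made for illustration, and only single-end reads are covered.

```python
import random


def make_genome(size, pct_a, pct_t, pct_g, pct_c, rng):
    """Return a random DNA string of `size` bases.

    Each base is drawn independently with the given percentages, so the
    realised composition only approximates the requested one.
    """
    weights = [pct_a, pct_t, pct_g, pct_c]
    if abs(sum(weights) - 100.0) > 1e-9:
        raise ValueError("base percentages must sum to 100")
    return "".join(rng.choices("ATGC", weights=weights, k=size))


def simulate_single_end_reads(genome, read_length, coverage, rng):
    """Sample error-free single-end reads uniformly along the genome.

    Assumed read count: coverage * genome length / read length.
    """
    if read_length > len(genome):
        raise ValueError("read length exceeds genome length")
    n_reads = int(coverage * len(genome) / read_length)
    last_start = len(genome) - read_length
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, last_start)  # inclusive on both ends
        reads.append(genome[start:start + read_length])
    return reads


# Small demonstration: a 1 kb genome with equal base percentages,
# 50-base reads at 10x coverage.
rng = random.Random(0)
genome = make_genome(1000, 25, 25, 25, 25, rng)
reads = simulate_single_end_reads(genome, read_length=50, coverage=10, rng=rng)
print(len(genome), len(reads))
```

Passing an explicit random.Random instance keeps the sketch reproducible; a real simulator would typically also model sequencing errors and, for paired-end reads, the insert length.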
(iii) Generation of mutated genome and simulated reads
This module allows users to mutate a genome. The user should upload a reference genome and specify the percentage of nucleotides to be mutated in it. The module randomly mutates the desired number of positions (% of mutation) in the reference genome. It also allows users to generate simulated short reads (single-end or paired-end reads). This module is useful for evaluating assemblers that assemble genomes based on similar reference genomes.
USAGE: make_mut_genome.pl -i (Input genome fasta file) -m (Percentage of mutation) -l (Read length) -f (Insert length) -c (Coverage) -y (Type of reads) -o (Output file)
  -i Input genome file.
  -m Percentage of mutation.
  -l Read length.
  -f Insert length.
  -c Coverage.
  -y Type of reads (single end (1) or paired end (2)).
  -o Output directory name.

USAGE: variation_detect.pl -i (Configuration file) -o (Output directory name)
Example Command: ./variation_detect.pl -i Configuration_file -o my_out
  -i Configuration_file
  -o Output Directory
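The random mutation step described for module (iii) can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of make_mut_genome.pl: the function name is hypothetical, mutations are modelled as single-base substitutions only (no insertions or deletions), and positions are chosen without replacement so that exactly the requested percentage of positions changes.

```python
import random


def mutate_genome(genome, percent, rng):
    """Substitute `percent` % of the positions in `genome`.

    Positions are sampled without replacement, and each chosen base is
    replaced by a different base, so every chosen position really changes.
    Returns the mutated sequence and the sorted list of mutated positions.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    n_mut = round(len(genome) * percent / 100)
    positions = rng.sample(range(len(genome)), n_mut)
    bases = list(genome)
    for pos in positions:
        alternatives = [b for b in "ATGC" if b != bases[pos]]
        bases[pos] = rng.choice(alternatives)
    return "".join(bases), sorted(positions)


# Small demonstration on a 1 kb toy reference with 2 % mutation.
reference = "ACGT" * 250
mutated, mutated_positions = mutate_genome(reference, 2, random.Random(1))
n_diff = sum(1 for a, b in zip(reference, mutated) if a != b)
print(len(mutated_positions), n_diff)
```

Returning the mutated positions alongside the sequence gives a ground truth against which an assembler or variant caller could later be checked.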
Software packages (.deb) for genome assembly and annotation
We have also developed
some Debian (.deb) packages for whole genome assembly and annotation from
Next Generation Sequencing (NGS) data. After installing OSDDlinux, users
can download and install these .deb packages on the system.
Installation instructions
(1) Filter the raw sequencing data
The first step is to filter the raw sequencing reads, keeping high-quality bases and removing vector- and adaptor-contaminated reads. For this purpose, the NGS-QC toolkit is integrated in the pipeline. BioPerl is required for this software to work.

(2) Genome assembly of filtered data
Filtered reads are then used to assemble the genome with user-defined parameters (i.e. hash lengths, K). The genome assembly results are provided to the user for selecting the best result. Velvet and SOAPdenovo are used for genome assembly at this step.

(3) Whole genome annotation
The best genome assembly set is then used for genome annotation. Prokka and MAKER have been integrated for the annotation of bacterial and fungal genomes, respectively. The genome assembly set and annotated genome files are produced as the output of this pipeline.

Dependencies:- Several BioPerl libraries need to be installed for Prokka and MAKER to function fully. The user should be aware of the dependencies of the integrated software.

(1) The first step is to filter the raw sequencing reads, keeping high-quality bases and removing vector- and adaptor-contaminated reads. For this purpose, the NGS-QC toolkit has been integrated in the pipeline.
(2) BWA has been integrated for the alignment of filtered reads to the human reference genome.
(3) In the next step, SAMtools processes the alignment files.
(4) Finally, VarScan.v2.3.5 detects the somatic variations and SNPs in the given sequencing data.

The user should have all of this software installed to run this pipeline. Debian packages of all this software can be downloaded from the OSDDlinux website (http://osddlinux.osdd.net/ngs.php).

Dependencies:- The user should have all the mentioned software in the default path, i.e. /gpsr/local/bin, to run this pipeline.
sudo dpkg -i package.deb
The software is installed automatically in the /gpsr/software/ directory, and the executable files can be called from the /gpsr/local/bin directory.
Example:- sudo dpkg -i maq.deb
Installation location: /gpsr/software/
Executables present in: /gpsr/local/bin
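Since the pipelines expect their tools to be callable from /gpsr/local/bin, it may help to check this after installing the packages. The Python sketch below is only a convenience illustration and not part of OSDDlinux: the function name is hypothetical, and the executable names in the example list are assumptions that should be replaced with the names actually installed by the packages.

```python
import shutil


def find_missing_tools(tool_names, bin_dir="/gpsr/local/bin"):
    """Return the names in `tool_names` with no executable file in `bin_dir`."""
    return [
        name for name in tool_names
        if shutil.which(name, path=bin_dir) is None
    ]


# Executable names below are assumptions for illustration only.
expected_tools = ["bwa", "samtools", "velveth", "velvetg"]
missing = find_missing_tools(expected_tools)
if missing:
    print("Not found in /gpsr/local/bin:", ", ".join(missing))
else:
    print("All expected tools were found.")
```

An empty result means every listed name resolved to an executable file in the directory; any names printed point to packages that still need to be installed with dpkg.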