Monday, August 10, 2015

Optimizing Discovar

Part 1: Optimizing on physical hardware

Introduction

For those who don't know it, Discovar is a variant caller and small-genome assembler used in the life sciences. Given a reference sequence, it turns the raw output from sequencers into entire genomes. This is computationally very expensive, so I decided to take a look at it under MAP, our OpenMP and MPI profiler.

Compiling and Running Discovar

As with many life sciences codes, downloading, compiling and running the Discovar benchmark was refreshingly straightforward:
    # Build Discovar
    $ wget ftp://ftp.broadinstitute.org/pub/crd/Discovar/latest_source_code/LATEST_VERSION.tar.gz
    $ tar zxf LATEST_VERSION.tar.gz
    $ cd discovar-*
    $ ./configure
    $ make -j32

    # Download the benchmark data
    $ wget ftp://ftp.broadinstitute.org/pub/crd/Benchmark/data_only.tar.gz
    $ tar zxf data_only.tar.gz
    $ sed s:Discovar:src/Discovar: -i runme.sh

    # Run the benchmark
    $ time ./runme.sh
The results seemed reasonable enough: the benchmark finished in 7.97 minutes with a peak memory usage of 5.6 GB. That would put our internal 24-core server (with hyperthreading) among the top four entries on the Broad Institute's benchmarking results page.
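The `sed` step in the build instructions presumably repoints runme.sh at the freshly built `src/Discovar` binary. Here is a minimal sketch of what that substitution does, written with its closing `:` delimiter (sed requires it; using `:` instead of `/` means the slash in the replacement path needs no escaping). The command line fed to it is a hypothetical stand-in, since the real contents of runme.sh aren't shown here, and `retarget` is just an illustrative helper name.

```shell
#!/bin/sh
# Hypothetical stand-in for one line of runme.sh (illustrative only).
sample='Discovar READS=sample.bam OUT_DIR=out'

# Same substitution as the build step, applied to stdin. Without the g flag,
# only the first "Discovar" on each line is rewritten.
retarget() {
    sed 's:Discovar:src/Discovar:'
}

printf '%s\n' "$sample" | retarget
# prints: src/Discovar READS=sample.bam OUT_DIR=out
```

Note that the edit is not idempotent: "src/Discovar" still contains "Discovar", so running the `sed -i` a second time would turn it into "src/src/Discovar". Run it once per fresh copy of runme.sh.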
