With genomes, as with haute cuisine, less is sometimes more. Sure, researchers using
next-generation DNA sequencers can
whip out several human genomes’ worth of sequence data in a single
run. But interpreting those data—assembling the reads, identifying
where the genome differs from the reference sequence and determining
which of those variants, if any, might underlie some interesting
biology—that’s another story.
Geneticists can make their lives
easier by concentrating their efforts on that small fraction of the
genome that encodes protein—the so-called “exome.” Representing just 1%
or so of the overall sequence, an exome is like a Reader’s Digest
condensed version of the genome—short, to the point, and less expensive
than the full-length original.
Exome sequencing represents an
“effective compromise between the competing goals of genome-wide
comprehensiveness and cost-control,” wrote University of Washington
genomicist Jay Shendure in a 2011 editorial in a special issue of Genome
Biology on the topic of exome sequencing. [1] Basically, because a
sequencer can only push out a finite number of bases per run, a smaller
target lets more samples be combined per run through sample “barcoding,”
lowering the cost per sample, and lets each sample be read more deeply.
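To make that arithmetic concrete, here is a minimal sketch in Python. The run output of 600 gigabases, the 3.1-gigabase genome and the 50-megabase exome target are round illustrative numbers, not figures from any vendor or from the sources quoted here, and the function name is hypothetical. It also assumes reads are split evenly among samples and that every base lands on target, which real captures do not achieve.

```python
def mean_depth_per_sample(run_output_bases, samples_per_run, target_size_bases,
                          on_target_fraction=1.0):
    """Mean depth each sample receives when one run is shared evenly."""
    if samples_per_run < 1 or target_size_bases <= 0:
        raise ValueError("need at least one sample and a positive target size")
    bases_per_sample = run_output_bases / samples_per_run
    return bases_per_sample * on_target_fraction / target_size_bases


# Illustrative assumptions only (not taken from the article):
RUN_OUTPUT = 600e9   # bases produced by one hypothetical run
GENOME_SIZE = 3.1e9  # approximate size of a human genome, in bases
EXOME_SIZE = 50e6    # an exome target "in the tens of megabases"

for samples in (1, 4, 16, 48):
    genome_depth = mean_depth_per_sample(RUN_OUTPUT, samples, GENOME_SIZE)
    exome_depth = mean_depth_per_sample(RUN_OUTPUT, samples, EXOME_SIZE)
    print(f"{samples:>2} samples/run: genome {genome_depth:7.1f}x, "
          f"exome {exome_depth:8.1f}x")
```

With these assumed inputs, 48 pooled exomes each still receive more depth than a single genome given the entire run.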
Another
benefit of exome sequencing is interpretability. Whole-genome
sequencing generates a lot of data, but for the majority of base pairs,
researchers don’t know their relationship to disease, explains Yaping
Yang, laboratory director for the Whole Genome Laboratory at Baylor
College of Medicine, a clinical laboratory that offers an
exome-sequencing service. “Therefore, they are not helpful in making
molecular diagnoses.”
In effect, exome sequencing is just a
special form of targeted sequencing, an application in which researchers
capture specific genomic segments for sequencing analysis. The
difference is that instead of capturing, say, all the genes implicated
in a particular cellular pathway, exome sequencing selects every exon of
every protein-coding gene, and sometimes 5’ and 3’ untranslated
sequences, as well—a collection numbering in the tens of megabases.
If
PubMed is any guide, getting at those megabases has become very popular
indeed: The database lists nearly 1,300 references including the term
‘exome,’ almost all of them published since 2011. Commercial vendors now
offer tools to help researchers get in on the act. If you’ve been
thinking about spicing up your genetics work with, as Genome Biology
called it, “that special ‘exome factor’,” read on. [2] We’ll help you
identify a solution that meets your needs.
Solution-based hybridization
Commercial
options for exome capture fall into two basic categories,
solution-based hybridization and PCR. Kits for the former are available
from
Agilent Technologies,
Illumina and
Roche NimbleGen.
Clinically focused PCR-based kits can be obtained from Agilent. (Roche
NimbleGen used to offer hybridization capture on planar microarrays but
has discontinued that line.)
Solution hybridization-based
techniques all follow the same basic protocol: Mix fragmented genomic
DNA with biotinylated capture oligonucleotides, hybridize, capture
hybrids on streptavidin-conjugated microbeads, wash away the unbound
material and release what’s left. The differences lie largely in the
details.
Agilent’s SureSelect Human All Exon V5 and V5+UTRs kits
use pools of several hundred thousand 120-mer biotinylated RNA capture
probes to enrich exonic sequences (as well as, optionally, 5’ and 3’
untranslated regions and up to 6 Mb of custom sequence).
According
to Olle Ericsson, Agilent’s marketing director for DNA Sequencing, RNA-DNA
hybrids are stronger than comparably sized DNA-DNA hybrids, leading to
stronger and more efficient capture. In addition, the system’s use of
very long oligos means that even sequences containing short insertions
and deletions (or “indels”) can be captured efficiently.
Version
5, the newest iteration of the SureSelect exome line, was launched last
fall at the American Society of Human Genetics national meeting.
According to Ericsson, V5 features updated content as well as a new,
streamlined workflow that reduces sample preparation time by about a
half-day, largely thanks to a shorter hybridization step: “If you start a
prep on day one, you will have [the samples] ready to sequence on day
two” (as opposed to the following morning).
The SureSelect system
is available in two forms. SureSelect XT enables barcoding and mixing of
samples after capture, and SureSelect XT2 barcodes (or “indexes”)
samples before capture. The choice of which to use “depends on how you
prefer to set up your workflow,” Ericsson says. To do pre-capture
indexing, all the samples must be available at the same time. If samples
tend to dribble in one or two at a time, post-capture indexing might
make more sense. (Pre-capture pooling approaches also typically perform
slightly worse than post-capture pooling, he adds.)
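Whichever point in the workflow the indexes are added, the pooled reads have to be sorted back into samples after sequencing. The sketch below shows the idea in Python. The index sequences, sample names and function name are all hypothetical, and the exact-match lookup is a simplification: it is not how any vendor's software is documented to work, and production tools typically tolerate sequencing errors in the index.

```python
from collections import defaultdict

# Hypothetical 6-base indexes; real kits define their own index sets.
SAMPLE_INDEXES = {
    "ACGTAC": "sample_A",
    "TGCATG": "sample_B",
    "GATCGA": "sample_C",
}


def demultiplex(reads, indexes=SAMPLE_INDEXES):
    """Group (index, sequence) pairs by sample name.

    Reads whose index matches no known sample are collected
    under the key "undetermined".
    """
    by_sample = defaultdict(list)
    for index, sequence in reads:
        sample = indexes.get(index, "undetermined")
        by_sample[sample].append(sequence)
    return dict(by_sample)


pooled_reads = [
    ("ACGTAC", "TTGACCA"),
    ("TGCATG", "GGCATTA"),
    ("ACGTAC", "CCATGGA"),
    ("NNNNNN", "ATATATA"),  # unreadable index
]

for sample, sequences in demultiplex(pooled_reads).items():
    print(sample, len(sequences))
```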
Also based on
solution capture is Roche NimbleGen’s SeqCap EZ Exome Library v3.0,
which uses a capture library comprising some 2.1 million DNA
oligonucleotides averaging 80 bases in length. That design, says Thomas
Albert, global head of technology innovation at Roche Applied Science,
gives the company considerable flexibility in terms of how it positions
its capture probes.
“We can put more probes in some places than in
others, or make them longer, or shift them around in different
ways—these are all things we can do because we have a larger number of
smaller probes,” Albert says.
The reason that is necessary, he
explains, is non-uniformity: variations in melting temperature, secondary
structure and cross-hybridization efficiency across the genome such
that, though an exome might be sequenced to, say, 100-fold coverage,
some regions will be over-represented and others possibly skipped
altogether. That means crucial variants could be overlooked or
misinterpreted, leading to more sequencing and rising costs.
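A toy calculation shows why mean coverage alone can mislead. In the Python sketch below, two made-up depth profiles share the same 100-fold mean, yet only one covers every position well. The depth values, the 20-fold threshold and the function name are illustrative assumptions, not measurements from any of the kits discussed.

```python
def coverage_summary(depths, min_depth=20):
    """Summarize per-position read depths across a target region."""
    if not depths:
        raise ValueError("depths must not be empty")
    total = len(depths)
    return {
        "mean_depth": sum(depths) / total,
        # Share of positions read at least min_depth times.
        "fraction_at_min_depth": sum(1 for d in depths if d >= min_depth) / total,
        # Share of positions with no reads at all.
        "fraction_uncovered": sum(1 for d in depths if d == 0) / total,
    }


# Two invented ten-position targets, both averaging 100-fold coverage.
uniform_capture = [100] * 10
uneven_capture = [400, 300, 200, 60, 30, 5, 5, 0, 0, 0]

for name, depths in (("uniform", uniform_capture), ("uneven", uneven_capture)):
    print(name, coverage_summary(depths))
```

In the uneven profile, half the positions fall below the threshold and three receive no reads at all, even though the mean is unchanged.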
According
to Albert, the SeqCap EZ Exome Library v3.0 was released about a year
ago and captures some 64 Mb of genome sequence. More recently, the
company has added the ability to capture 32 Mb of 5’ and 3’ untranslated
region (UTR) sequences (SeqCap EZ Exome +UTR Library) or up to 50 Mb of
custom content (SeqCap EZ Exome Plus).
Illumina offers two
solution-capture systems, the stand-alone TruSeq™ Exome Enrichment Kit,
which captures 62 Mb of genomic sequence using more than 340,000 95-mer
probes, and the Nextera® Exome Enrichment Kit, which integrates TruSeq
into a “streamlined, automation-friendly workflow [that] combines
[enzyme-based fragmentation and] library preparation and exome
enrichment steps, and can be easily completed in 2.5 days with minimum
hands-on time,” according to product literature.
PCR-based kits
On
the PCR front, Agilent launched its PCR-based HaloPlex Exome kit in
February 2013, coinciding with the annual Advances in Genome Biology
& Technology (AGBT) 2013 conference. Aimed mostly at the clinical
market, the kit offers a simpler workflow and requires less input DNA (200 ng vs.
1 to 3 µg) than SureSelect, says Ericsson. In particular, he says, the
HaloPlex protocol eliminates the need for mechanical shearing of the
genomic template, integrating the library-preparation step into the PCR
process itself.
Agilent also launched at AGBT a dedicated software
package called SureCall. Unlike the more flexible (and powerful)
GeneSpring software used with SureSelect, SureCall converts raw sequence
data directly into mutation lists “that are classified according to
industry guidelines,” Ericsson says.
In the clinic
Exomes
might be simpler than whole genomes, but data interpretation is still a
challenge in exome analysis, especially when medical decisions depend on
the outcome. Such is the case at the growing number of clinical
laboratories now offering exome-sequencing services.
At the Whole
Genome Laboratory at Baylor College of Medicine, turnaround time on the
lab’s exome-sequencing service is about 15 weeks, says Yang, “because
the analyses and interpretation are so complicated.” The lab must
consider a patient’s clinical presentation, prior testing and
exome-sequencing data before it can issue a report.
Baylor’s
service is used mainly to diagnose patients who are suspected of having
genetic disorders that the referring physicians cannot pin down. Most
are pediatric patients with neurological deficits, Yang says, and the service’s
“pick-up” rate (the fraction of cases in which a likely causative genetic
mutation can be identified) ranges from 25% to 30%.
“If the
clinical phenotype is not caused by a genetic defect, no matter how hard
we try, we are not going to find mutations,” she says. And, of course,
some mutations fall outside of the exome.
Yang’s lab sequences
those exomes using three Illumina HiSeq sequencers, typically combining
three samples per lane, 48 samples per run, for about 13 or 14
gigabases, or 150-fold mean coverage, per exome on average. For exome
capture, the lab uses a custom Roche NimbleGen solution-hybridization
design named VCRome, developed at Baylor’s Human Genome Sequencing
Center. According to Yang, that system covers more than 95% of desired
bases at 20-fold coverage or higher, with a capture specificity of 70%
to 80%.
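The figures Yang reports can be turned into a quick back-of-the-envelope calculation, sketched here in Python. It treats "capture specificity" as the fraction of sequenced bases that fall on target, which is an assumption about how the term is used. It does not try to reproduce the 150-fold figure, because the target size and duplicate rate are not given.

```python
# Figures quoted in the article for the Baylor workflow.
SAMPLES_PER_LANE = 3
SAMPLES_PER_RUN = 48
GB_PER_EXOME = (13.0, 14.0)   # gigabases sequenced per exome (low, high)
SPECIFICITY = (0.70, 0.80)    # assumed to mean on-target fraction (low, high)

# Lanes needed to hold 48 samples at three samples per lane.
lanes_per_run = SAMPLES_PER_RUN // SAMPLES_PER_LANE

# Sequence each lane and each run must yield to supply every exome.
gb_per_lane = tuple(SAMPLES_PER_LANE * gb for gb in GB_PER_EXOME)
gb_per_run = tuple(SAMPLES_PER_RUN * gb for gb in GB_PER_EXOME)

# On-target sequence per exome, pairing low with low and high with high.
on_target_gb = tuple(gb * spec for gb, spec in zip(GB_PER_EXOME, SPECIFICITY))

print(f"lanes per run:       {lanes_per_run}")
print(f"Gb per lane:         {gb_per_lane[0]:.0f} to {gb_per_lane[1]:.0f}")
print(f"Gb per run:          {gb_per_run[0]:.0f} to {gb_per_run[1]:.0f}")
print(f"on-target Gb/exome:  {on_target_gb[0]:.1f} to {on_target_gb[1]:.1f}")
```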
Which method to choose
On the face of it, any solution-based capture approach should work equally well for exome analysis. But do they?
To
find out, Michael Snyder, professor and chair of genetics at Stanford
University and director of the Stanford Center for Genomics and
Personalized Medicine, and his team in 2011 compared the performance of
all three commercial approaches. [3] The results, they report:
…
suggest that the Nimblegen platform, which is the only one to use
high-density overlapping baits, covers fewer genomic regions than the
other platforms but requires the least amount of sequencing to
sensitively detect small variants. Agilent and Illumina are able to
detect a greater total number of variants with additional sequencing.
Illumina captures untranslated regions, which are not targeted by the
Nimblegen and Agilent platforms. [3]
(Today, UTR coverage is an option on all three platforms.)
In short, says Snyder, “They all worked pretty well.” But his lab prefers SureSelect, he says, “because it has a nice balance.”
Snyder
says exome sequencing offers two advantages over whole-genome
sequencing, even in this age of falling whole-genome prices. First, the
lower cost means larger populations can be studied than might otherwise
be possible. “Most projects tend to be budget-driven,” he says. But
exomes also enable deeper sequencing than whole genomes—a consequence of
the fact that, again, a sequencer can only push out so many bases.
Snyder’s 2011 study routinely identified several thousand variants in
exomes that were missed in the corresponding whole-genome sequences.
“And because they’re in exomes, they tend to be things that you care
about, because they’re coding,” he says.
On the other hand,
whole-genome sequencing captures everything and can more effectively
identify structural rearrangements that exome sequencing might miss. As a
result, when dealing with clinical samples, Snyder’s lab tends to err
on the side of caution and capture both an exome and a whole genome.
“That’s to get us the extra coverage,” Snyder says. “We feel more
confident about our calls.”
References
[1] Shendure, J, “Next-generation human genetics,” Genome Biology, 12:408, 2011.
[2] Stower, H, “The exome factor,” Genome Biology, 12:407, 2011.
[3] Clark, MJ, et al., “Performance comparison of exome DNA sequencing technologies,” Nature Biotechnology, 29:908-914, 2011.