Google wants to leverage the infrastructure that runs its dominant
search engine, Google Maps and Gmail to help analyze and share big
genomic data.
Google Genomics is being promoted as a way to store growing volumes of
DNA data in the cloud. Users could load and export genomic data for
free, then pay about $25 a year for storage and queries. Storage is
calculated in terabytes per month, queries in millions of API calls.
The search giant joins other cloud providers in the growing competition
to store the skyrocketing amount of data as more genomes are being
shared, compared and linked. The results of those comparisons are
expected to drive the emerging big genomic data sector. Google Genomics
claims to allow comparisons “in seconds with SQL-like queries.”
Ultimately, Google says it is trying to converge data science and the life sciences to spur medical research.
The field is considered ripe for innovation. As Google points out,
about 99.9 percent of human DNA is identical, but “in practice, the
files start out much bigger because you need to do a lot of analysis to
identify that zero point one percent that makes each of us unique.”
Since the human genome was first sequenced, the time and cost
involved in gene sequencing have plummeted, according to Google, to about
one day and $1,000. “With an exponential price drop like that, the
volume of sequencing has exploded,” Google Genomics’ product manager
Jonathan Bingham noted.
As the cost of DNA sequencing has dropped, data volumes have soared
into the petabytes. Bingham estimates that each genome represents
about 100 gigabytes of data. Cloud storage proponents insist that the
mixing and matching of genomic data is what drives the need for storage
linked to data analytics tools.
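To put those volumes in perspective, the arithmetic behind Bingham's estimate can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the 100-gigabyte figure comes from the estimate above, while the function names and the choice of decimal units (1 petabyte = 1,000,000 gigabytes) are assumptions made for this example, not anything Google has published.

```python
# Back-of-the-envelope storage arithmetic for sequenced genomes.
# GB_PER_GENOME is Bingham's estimate quoted in the article.
# GB_PER_PB assumes decimal units (1 PB = 10**6 GB); this is an assumption.
GB_PER_GENOME = 100
GB_PER_PB = 1_000_000


def genomes_per_petabyte(gb_per_genome: int = GB_PER_GENOME) -> int:
    """Return how many whole genomes fit in one petabyte."""
    return GB_PER_PB // gb_per_genome


def storage_petabytes(num_genomes: int, gb_per_genome: int = GB_PER_GENOME) -> float:
    """Return the storage, in petabytes, needed for num_genomes genomes."""
    return num_genomes * gb_per_genome / GB_PER_PB


if __name__ == "__main__":
    print(genomes_per_petabyte())        # genomes that fit in 1 PB
    print(storage_petabytes(1_000_000))  # PB needed for one million genomes
```

Under these assumptions, a petabyte holds about 10,000 genomes, and a collection of one million genomes would occupy roughly 100 petabytes.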
Google’s pitch focuses on the capabilities of its current
infrastructure, which includes a 100-petabyte search index and returns
search queries in about 0.25 seconds. The cloud provider is promising
similar results for genetics researchers “without owning a datacenter,”
Bingham asserted.
Google Genomics said during its I/O event in June that it was working with the genomics community to define a standard API
for working with big genomic data sets in the cloud. More recently, it
announced it was implementing an API defined by the Global Alliance for
Genomics and Health that covers data visualization and analysis.
“We are hosting public data that is available through the API and
we’re building open-source software showing how to work with big genomic
data using that API,” explained Bingham. Google also said its approach
allows for analysis of genomic data via either interactive queries or
through massively parallel processing.
Meanwhile, the Google unit is also offering data analytics
tools like AppEngine, BigQuery, MapReduce and R on the Google Cloud
Platform to sift through and share genomic data,
said Bingham, who also heads Google’s efforts to merge cloud computing with life sciences.
The timing of Google Genomics appears propitious: The Global Alliance for Genomics and Health convened last month in San Diego to consider ways to “accelerate sharing of genomic and clinical data.”