Sunday, September 27, 2015

Success Stories in Personalized Medicine

This is the fourth article in a series from the insideHPC Guide to Genomics that looks at the benefits HPC brings to genomics, as well as many success stories.
The Neuroblastoma and Medulloblastoma Translational Research Consortium (NMTRC) is a group of 18 universities and children’s hospitals headquartered at the Helen DeVos Children’s Hospital in Grand Rapids, Michigan. The group offers a nationwide network of childhood cancer clinical trials, based on the research of collaborating investigators who are linked with laboratory programs developing novel therapies for high-risk neuroblastoma and medulloblastoma.
“Working with partners like Dell, TGen and NMTRC, we’re seeing an entirely new reality in patient care, starting with clinical trials,” says Giselle Sholler, MD, MSC, chair of the Neuroblastoma and Medulloblastoma Translational Research Consortium and endowed director of the Haworth Innovative Therapeutics Clinic at Helen DeVos Children’s Hospital.
“In this new model, information technology is the bridge that connects all the clinical disciplines for truly personalized patient care. It’s a team-based approach that includes bioinformatics, genomics, oncology and pharma in a way that really delivers on the promise of improved outcomes in the lives of children that participate in our studies.”
A leading research clinic studies a wide range of areas, including biomedical engineering, cancer biology, cellular and molecular medicine, genomic medicine, immunology, molecular cardiology, molecular genetics, neurosciences, ophthalmic research, pathobiology, stem cell biology and regenerative medicine, and oncology research. The institute first found that its storage needs had outgrown a common desktop computer, and added a petabyte storage system. The next step was to implement a computer system that could handle its most demanding computational problems. The customer turned to Dell and Intel® for a solution consisting of the latest Intel Xeon processors in Dell PowerEdge™ servers, which provided multiple teraflops of performance with many terabytes of high-performance storage. Software tools included CentOS Linux, Bright Cluster Manager®, the OpenMPI library, the GNU Compiler Collection (GCC), the Simple Linux Utility for Resource Management (SLURM), Intel Solutions for Lustre® software and the Intel Math Kernel Library (Intel MKL). The organization was able to scale its infrastructure and translate clinical needs into actionable workflows that help patients.
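To give a flavor of how work is queued on a cluster like this, here is a minimal sketch of a SLURM batch script; the job name, module names, binary and paths are hypothetical, not the institute’s actual workflow.

#!/bin/bash
#SBATCH --job-name=methylation-status   # hypothetical analysis job
#SBATCH --nodes=4                       # spread the run across four servers
#SBATCH --ntasks-per-node=16            # one MPI rank per core (illustrative)
#SBATCH --time=04:00:00                 # wall-clock limit for the run

# Load a GCC-built OpenMPI environment (module names vary by site).
module load gcc openmpi

# Launch the hypothetical analysis binary on all allocated ranks.
srun ./methylation_status --input /data/samples --output /data/results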
The results of using the Dell cluster were very positive. Initial run time for a methylation status analysis was reduced from 20 hours to four hours, and a false discovery rate calculation was cut from one week to 15 hours.
Analyses that once took months can now be repeated multiple times in a week, and other run times have dropped from weeks or days to hours. The many runs now possible generate the data needed for correlation with cancer types.
In addition to genomic analysis, other science domains use the Dell cluster to enhance their own research and patient records. This includes natural language processing of free-text patient notes: physicians’ handwritten notes can now be scanned, converted into text and made part of the electronic health record. Techniques typically associated with structural mechanics, such as finite element analysis, are being used on the Dell cluster to perform volume simulations on bones and to simulate passive flexion of the knee joint. In one case run times fell from 20 hours to one hour; in another they dropped by 75 percent. Thousands of simulations that previously could not be run at all are now feasible.
TGen
TGen helps fight cancer and other diseases through the use of genomics. TGen realized that speed and precision are key to a patient’s survival, and that achieving this speed required high-performance computing (HPC) to run very complex algorithms quickly. Terabytes of genetic and molecular data are available from patient and research databases, and custom treatments must be built from the patient’s genome and other biological information.
To improve the turnaround time for genomic analyses and create more customized treatment plans, TGen turned to Dell to deploy an HPC cluster that would accelerate time to results. The Dell Genomic Data Analysis Platform (GDAP) solution consists of Dell PowerEdge servers with Intel Xeon processors, storage arrays and management software.
Time is critical when diagnosing and creating a customized treatment plan. With the Dell GDAP, the time needed for genetic sequencing and the analyses that inform custom treatment has been cut from seven days to four hours.
“I’m not aware of any other solution on the market that’s like the Dell Genomic Data Analysis Platform. It’s optimized for genomic workflows out of the box, and within a few days, you can install, configure and launch it into production,” says James Lowey, vice president of technology, Translational Genomics Research Institute. “Today, we help save more lives because researchers spend less time waiting for HPC resources. And it’s also easy for us to scale and customize our Dell Genomic Data Analysis Platform to support our unique requirements.”
Center for Rare Childhood Disorders (C4RCD)
The TGen Center for Rare Childhood Disorders (C4RCD) harnesses the latest technologic leaps in genome sequencing to pinpoint the causes of rare childhood disorders that largely remain a mystery to modern medicine.
“By using the Dell GDAP platform, C4RCD is able to process genetic samples quickly,” said James Lowey, TGen Vice President of Technology. “This is important as many of the families of these children have been on a diagnostic odyssey, often going years without a clear answer about what is causing the condition of their child. By taking advantage of a system designed to process NGS data, researchers can focus on exploration and discovery, instead of IT infrastructure.”
Next week we will look at a few integrated genomic processing infrastructure systems. If you prefer, you can download the complete insideHPC Guide to Genomics in PDF form by clicking here, courtesy of Dell and Intel.

Friday, September 18, 2015

The Cloud Foundry Foundation: The Key Driver Of A Breakthrough In PaaS Adoption

The rise of the DevOps role in the enterprise and the increasing requirements of agility beyond infrastructure and applications make the platform-as-a-service (PaaS) market one to watch for both CIOs and enterprise architecture professionals. On December 9, the membership of Cloud Foundry, a major PaaS open source project, announced the formation of the Cloud Foundry Foundation.
In my view, this is as important as the establishment of the OpenStack Foundation in 2012, which was a game-changing move for the cloud industry. Here’s why:
  • PaaS is becoming an important alternative to middleware stacks. Forrester defines PaaS as a complete application platform for multitenant cloud environments that includes development tools, runtime, and administration and management tools and services. (See our Forrester Wave evaluation for more detail on the space and its vendors.) In the cloud era, it’s a transformational alternative to established middleware stacks for the development, deployment, and administration of custom applications in a modern application platform, serving as a strategic layer between infrastructure-as-a-service (IaaS) and software-as-a-service (SaaS) with innovative tools.
  • Cloud Foundry is a major open source PaaS. The technology was designed and architected by Derek Collison and built in the Ruby and Go programming languages by Derek and Vadim Spivak (the wiki is wrong!). VMware released it as open source in 2011 after Derek joined the company. Early adopters of Cloud Foundry include large multinationals like Verizon, SAP, NTT, and SAS, as well as Chinese Internet giants like Baidu.
  • The community is gaining momentum . . . The past year has seen a 36% increase in community contributions and more than 1,700 pull requests. Community contributions are extremely important for accelerating the maturation of the software; the latest update includes Docker support for Diego, which replaces the Droplet Execution Agents that orchestrate the placement of newly started apps. In addition to Pivotal CF, IBM has integrated Cloud Foundry into its Bluemix offering; HP has also made it part of its Helion portfolio.
  • . . . and will reassure enterprises about the value of PaaS. The formalization of Cloud Foundry will provide additional reassurance to enterprises looking to accelerate app development and increase agility in the middleware layer. It will implement Dojo as the new approach to open source development, which offers developers a unique “fast track” for commit rights. Platinum members include EMC, HP, IBM, Intel, Pivotal, SAP, and VMware; gold members include Accenture, Capgemini, Hortonworks, NTT, SAS, and Swisscom. Anchora (MoPaaS) is the only Chinese company among the silver members.
There are alternatives, including Red Hat OpenShift and Microsoft Azure, for customers in the Chinese market; see my PaaS market dynamics report for details. PaaS was never meant to be a silver bullet for all agility-oriented technology management issues; as I blogged previously, you can also consider future Docker/container-based IaaS+DevOps solutions. Which one will you bet your future success on?

Tuesday, September 15, 2015

The Gene For Sweet: Why We Don’t All Taste Sugar The Same Way

“It now pays to get a lot of pleasure out of a little bit of sugar,” says Danielle Reed, a scientist at the Monell Chemical Senses Center.
Sugar gives the human brain much pleasure. But not everyone revels in cupcakes with an inch of frosting, or milkshakes blended with candy bars, though these crazily sugary treats are increasingly the norm.
Scientists have known for a decade that cats and other felines don’t have taste buds for sweetness at all. So they figured there had to be some genetic variation in other species, including us. Lately, they’ve discovered that some of us have genes that make us more sensitive to bitter compounds. And that suggests there might be differences in how the other four tastes — sweet, sour, salt and umami — are genetically wired.
Danielle Reed at the Monell Chemical Senses Center, and a team of fellow sensory scientists, decided to study perception of sweetness in identical and fraternal twins and compare them with non-twin siblings and unpaired twins. Twins are handy for studying genetic factors, since identical twins share almost all their genes and fraternal twins share about half. The study appears this month in the journal Twin Research and Human Genetics.
The researchers gave the twins and the other subjects two natural sugars (glucose and fructose) and two artificial sweeteners (aspartame and NHDC, or neohesperidin dihydrochalcone), then asked them to rate the perceived intensity of each solution.
They found that genetic factors account for about 30 percent of the variance in sweet taste perception between people for both the natural and artificial sugars. (They ruled out environmental factors as having much effect on sweetness perception.) And they concluded that the genetic effect they found must have to do with a single set of genes.
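How do researchers get a number like that from twins? A standard first-pass estimator in twin studies (though not necessarily the exact model used here) is Falconer’s formula, which doubles the gap between the identical-twin and fraternal-twin trait correlations:

h² = 2 × (r_identical − r_fraternal)

With illustrative numbers (not the study’s own), an identical-pair correlation of 0.45 and a fraternal-pair correlation of 0.30 would give h² = 2 × (0.45 − 0.30) = 0.30, or genes explaining roughly 30 percent of the variance.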
“We don’t know the details of which genes in which variation are important, but we know that genetics are a piece of the larger puzzle,” Reed tells The Salt.
The finding doesn’t mean that the people who have a weaker ability to taste sweet necessarily dislike sugar. And just because you don’t get a big high from a little sugar doesn’t mean you eat more of it. “How you perceive [sweet] may influence what you like in the extreme, but it’s more like shades of gray,” she says. “And we still need to see whether this has any implications for people’s food behavior.”
That’s hard to study, Reed says, because researchers can rarely get an accurate picture of what any one person eats every day, especially if they’re relying on people to write it down and report it.
As for why this variation in taste may exist among us humans? Reed says it may have to do with the fact that humans evolved in so many different geographies with different available foods.
“If [your ancestors] were from a salt-abundant geography, like near the ocean, then maybe they got plenty of salt, so they didn’t need to be sensitive to it,” she says. “But if they were from a place with a lot of poisonous plants, maybe they needed to be more sensitive to bitter.”
And, she adds, in this day and age, it might benefit you to be more sensitive to sugar, since it’s present in excess in modern diets and has become a health risk. “It now pays to get a lot of pleasure out of a little bit of sugar,” she says.
Next, Reed wants to find out if the same variation occurs in people’s perception of sour and salt. So in August she’s heading to the Twins Days Festival in Twinsburg, Ohio, where she’s been going since 2002, to run more taste experiments on twins.

Monday, September 14, 2015

Apache Spark: 3 Real-World Use Cases


The Hadoop processing engine Spark has risen to become one of the hottest big data technologies in a short amount of time. And while Spark has been a Top-Level Project at the Apache Software Foundation for barely a week, the technology has already proven itself in the production systems of early adopters, including Conviva, ClearStory, and Yahoo.
Spark is an open source alternative to MapReduce designed to make it easier to build and run fast and sophisticated applications on Hadoop. Spark comes with a library of machine learning (ML) and graph algorithms, and also supports real-time streaming and SQL apps, via Spark Streaming and Shark, respectively. Spark apps can be written in Java, Scala, or Python, and have been clocked running 10 to 100 times faster than equivalent MapReduce apps.
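For a sense of the mechanics, here is a minimal sketch of how a Spark application is typically launched on a Hadoop/YARN cluster with current Spark releases; the class, jar and paths are placeholders, not anything from the deployments described below.

# Submit an application to a YARN cluster (all names illustrative).
spark-submit \
  --class com.example.ClickstreamJob \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
  --executor-memory 4G \
  clickstream-job.jar hdfs:///data/clicks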
Matei Zaharia, the creator of Spark and CTO of commercial Spark developer Databricks, shared his views on the Spark phenomenon, as well as several real-world use cases, during his presentation at the recent Strata conference in Santa Clara, California.
Since its introduction in 2010, Spark has caught on very quickly, and is now one of the most active open source Hadoop projects, if not the most active. “In the past year Spark has actually overtaken Hadoop MapReduce and every other engine we’re aware of in terms of the number of people contributing to it,” Zaharia says. “It’s an interesting thing. There hasn’t been as much noise about it commercially, but the actual developer community votes with its feet and people are actually getting things done and working with the project.”
Zaharia argues that Spark is catching on so quickly because of two factors: speed and sophistication. “Achieving the best speed and the best sophistication have usually required separate non-commodity tools that don’t run on these commodity clusters. [They’re] often proprietary and quite expensive,” says Zaharia, a 5th year Ph.D. candidate who is also an assistant professor of computer science at MIT.
Up to this point, only large companies, such as Google, have had the skills and resources to make the best use of big and fast data. “There are many examples…where anybody can, for instance, crawl the Web or collect these public data sets, but only a few companies, such as Google, have come up with sophisticated algorithms to gain the most value out of it,” Zaharia says.
Spark was “designed to address this problem,” he says. “Spark brings the top-end data analytics, the same performance level and sophistication that you get with these expensive systems, to commodity Hadoop clusters. It runs in the same cluster to let you do more with your data.”
Spark at Yahoo
It may seem that Spark is just popping onto the scene, but it’s been utilized for some time in production systems. Here are three early adopters of Spark, as told by Zaharia at Strata:
Yahoo has two Spark projects in the works, one for personalizing news pages for Web visitors and another for running analytics for advertising. For news personalization, the company uses ML algorithms running on Spark to figure out what individual users are interested in, and also to categorize news stories as they arise to figure out what types of users would be interested in reading them.
“When you do personalization, you need to react fast to what the user is doing and the events happening in the outside world,” Zaharia says. “If you look at Yahoo’s home page, which news items are you going to show? You need to learn something about each news item as it comes in to see what users may like it. And you need to learn something about users as they click around to figure out that they’re interested in a topic.”
To do this, Yahoo (a major contributor to Apache Spark) wrote a Spark ML algorithm in 120 lines of Scala. (Previously, its ML algorithm for news personalization was written in 15,000 lines of C++.) With just 30 minutes of training on a large, hundred-million-record data set, the Scala ML algorithm was ready for business.
Yahoo’s second use case shows off the interactive capability of Shark (Hive on Spark). The Web giant wanted to use existing BI tools to view and query the advertising analytics data it collects in Hadoop. “The advantage of this is Shark uses the standard Hive server API, so any tool that plugs into Hive, like Tableau, automatically works with Shark,” Zaharia says. “And as a result they were able to achieve this and can actually query their ad visit data interactively.”
Spark at Conviva and ClearStory
Another early Spark adopter is Conviva, one of the largest streaming video companies on the Internet, with about 4 billion video feeds per month (second only to YouTube). As you can imagine, such an operation requires pretty sophisticated behind-the-scenes technology to ensure a high quality of service. As it turns out, it’s using Spark to help deliver that QoS by avoiding dreaded screen buffering.
In the early days of the Internet, screen buffering was a fact of life. But in today’s superfast 4G- and fiber-connected world, people’s expectations for video quality have soared, while at the same time their tolerance for video delays has plummeted.
Enter Spark. “Conviva uses Spark Streaming to learn network conditions in real time,” Zaharia says. “They feed [this information] directly into the video player, say the Flash player on your laptop, to optimize the speeds. This system has been running in production over six months to manage live video traffic.” (You can read more about Conviva’s use of Hadoop, Hive, MapReduce, and Spark here.)
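As a rough sketch of what such a Spark Streaming job looks like (a generic skeleton, not Conviva’s actual pipeline; the host, port and log format are hypothetical, and the Scala lines inside the heredoc are fed to the interactive shell):

# Run a tiny streaming job from the interactive shell (all names illustrative).
$SPARK_HOME/bin/spark-shell <<'EOF'
import org.apache.spark.streaming.{Seconds, StreamingContext}
val ssc = new StreamingContext(sc, Seconds(5))             // 5-second micro-batches
val events = ssc.socketTextStream("collector-host", 9999)  // hypothetical live metrics feed
events.filter(_.contains("buffering")).count().print()     // buffering events per batch
ssc.start()
ssc.awaitTermination()  // streaming jobs run until interrupted
EOF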
Spark is also getting some work at ClearStory, a developer of data analytics software that specializes in data harmonization, helping users blend internal and external data. ClearStory needed a way to help business users merge internal data sources with external sources, such as social media traffic and public data feeds, without requiring complex data modeling.
ClearStory was one of Databricks’ first customers, and today relies on Spark as one of the core underpinnings of its interactive, real-time product. “Honestly, if it weren’t for Spark, we would have very likely built something like this ourselves,” ClearStory founder Vaibhav Nivargi says in an interview with Databricks co-founder Reynold Xin.
“Spark has the notion of resilient distributed datasets, which are these in-memory units of data that can span multiple machines in a cluster,” Nivargi says in the video. “As a computing unit of data, that is really promising for the kinds of workloads we see at ClearStory.”
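Those datasets are easy to poke at from the interactive shell. A minimal sketch (the HDFS path is illustrative; the lines inside the heredoc are Scala input):

$SPARK_HOME/bin/spark-shell <<'EOF'
val events = sc.textFile("hdfs:///data/events.log")      // an RDD split into partitions across the cluster
val errors = events.filter(_.contains("ERROR")).cache()  // cache() pins the filtered set in cluster memory
println(errors.count())  // first action computes and caches the RDD
println(errors.count())  // second count is served from memory, not disk
EOF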

Video on YouTube: 10-min talk by Matei


Related Items:
Spark Graduates Apache Incubator
Rethinking Real-Time Hadoop
Databricks Partners with Cloudera for Analytics

Friday, September 11, 2015

Simple port forwarding using iptables

 
In case someone else is looking for a way that actually works: though @HorsePunchKid is right in his suggestion, I've found a walkthrough that fills in the missing steps:
http://www.debuntu.org/how-to-redirecting-network-traffic-to-a-new-ip-using-iptables/
In essence:
Enable IP Forwarding:
sysctl net.ipv4.ip_forward=1
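If you want forwarding to survive a reboot, you can also persist the setting (this assumes a distro that reads /etc/sysctl.conf at boot):

echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf
sysctl -p    # reload the file immediately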
Add your forwarding rule:
iptables -t nat -A PREROUTING -p tcp -d 10.0.0.132 --dport 29418 -j DNAT --to-destination 10.0.0.133:29418
Ask IPtables to Masquerade:
iptables -t nat -A POSTROUTING -j MASQUERADE
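If you'd rather not masquerade all outbound traffic, a narrower rule works too (my assumption: only the forwarded port needs it):

iptables -t nat -A POSTROUTING -p tcp -d 10.0.0.133 --dport 29418 -j MASQUERADE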
And that's it! It worked for me in any case :)

VNC server script for start|stop

#!/bin/sh
# Init-style start/stop script for a VNC server session.

# The VNC display number to use.
DISPLAY="1"

# Color depth (between 8 and 32)
DEPTH="16"

# The Desktop geometry to use.
#GEOMETRY="<WIDTH>x<HEIGHT>"
#GEOMETRY="800x600"
GEOMETRY="1024x768"
#GEOMETRY="1280x1024"

# The name that the VNC Desktop will have.
NAME="my-vnc-server"

# The user to run the VNC server as.
USER="root"

OPTIONS="-name ${NAME} -depth ${DEPTH} -geometry ${GEOMETRY} :${DISPLAY}"

# Pull in the LSB helper functions (log_action_begin_msg, log_action_end_msg).
. /lib/lsb/init-functions

case "$1" in
start)
log_action_begin_msg "Starting vncserver for user '${USER}' on localhost:${DISPLAY}"
su ${USER} -c "/usr/bin/vncserver ${OPTIONS}"
;;

stop)
log_action_begin_msg "Stoping vncserver for user '${USER}' on localhost:${DISPLAY}"
su ${USER} -c "/usr/bin/vncserver -kill :${DISPLAY}"
;;

restart)
$0 stop
$0 start
;;
*)
echo "Usage: $0 {start|stop|restart}" >&2
exit 1
;;
esac

exit 0
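One possible way to install and drive the script (the file name and path are just an example):

cp vncserver.sh /etc/init.d/vncserver
chmod +x /etc/init.d/vncserver
/etc/init.d/vncserver start     # bring the desktop up on display :1
/etc/init.d/vncserver restart   # recycle the session
/etc/init.d/vncserver stop      # kill the session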