Wang Zheng Yuan: Spark is overtaking MapReduce. Are you ready?

If there’s one thing we can say with certainty about “IT,” it’s that both the information and the technology are constantly changing.

You know about the information: volume, velocity, and variety are exploding, forcing us to be ever more vigilant about the fourth “v,” veracity. What has piqued my interest is the other side, the technology for handling all that data. Most interesting is the way Apache Spark is pushing MapReduce aside for clustered processing of large data volumes.

There is no doubt that Spark’s swift growth is coming at the expense of the MapReduce component of the Apache Hadoop software framework. Consider this: in its December 2015 survey of 3,100 IT professionals (59% of whom are developers), Typesafe, a San Francisco maker of development tools, noted that 22% of respondents are actively working with Spark.

So what’s the allure of Spark? I asked Anand Iyer, senior product manager at Cloudera, the first company to commercialize Apache Hadoop. “Compared with MapReduce, Spark is almost an order of magnitude faster, it has a significantly more approachable and extensible API, and it is highly scalable,” he said. “Spark is a fantastic, flexible engine that will eventually replace MapReduce.” Cloudera isn’t wasting time: In September 2015, the company revved up its efforts to position Spark as the successor to Hadoop’s MapReduce framework.

Even IBM is on board. In June 2015, IBM called Spark “potentially the most significant open source project of the next decade.” IBM is already working to embed Spark in its analytics and commerce platforms, and it is assigning 3,500 researchers to Spark-related projects. Yes, 3,500 is correct.

Research firm Gartner, reacting to the IBM initiative, said that information and analytics leaders must begin working to ensure that they have the needed knowledge and skills.

And that’s the key. As a developer, you must work with an ever-changing toolbox. New skills must be acquired and mastered, and sometimes old ones left behind. Though MapReduce is not going to disappear anytime soon, the shift from MapReduce to Spark appears to be happening with astonishing speed. Are you ready?

What are your organization’s plans with respect to MapReduce and Spark? Are you planning to switch? Or maybe you’ve never even gone so far as to implement MapReduce. Share your opinions on this important topic. We’d like to hear from you.

Wang Zheng Yuan

Monday, February 1, 2016
