The days of using the MapReduce framework for big data processing may be numbered.
Spark, an in-memory framework designed to work with the Hadoop Distributed File System (HDFS), has now become a top-level Apache project. This is great news for Spark, since it should give the project more stability as it continues to grow in popularity among Hadoop users.
As this article points out, Spark has many advantages over MapReduce. It is much faster than MapReduce for most applications because it keeps intermediate data in memory instead of writing it to disk between processing steps. It is also easier to program. Perhaps most interesting, it is well positioned to handle emerging big data applications, including machine learning and real-time processing.
While Spark is a fascinating project with great prospects, MapReduce, as the dominant Hadoop programming model, still has some advantages. Spark cannot yet do everything that MapReduce can, and MapReduce may still be better suited to batch processing applications.
Tuesday, March 25, 2014
Tuesday, March 18, 2014
Amazon and IBM vs. Open Source Hadoop
Forrester Research: Hadoop vendor chart
A Forrester Research report on Hadoop vendors shows larger companies such as IBM and Amazon as having the long-term strategic advantage, presumably because they have more resources to research, develop, produce, and sell Hadoop-related software and services.
As the author of this editorial points out, these large companies have capitalized on the Hadoop trend without contributing to the development of the framework. The ones who actually hold the strategic edge, then, are the developers, because they shape the direction the project takes. Vendors that contribute to this collaborative project will ultimately gain the strategic advantage, because they will be able to influence the future of Hadoop.