Tuesday, March 25, 2014

Apache Spark and the Future of MapReduce

The days of using the MapReduce framework for big data processing may be numbered.

Spark, an in-memory framework designed to work with the Hadoop Distributed File System (HDFS), has graduated to a top-level Apache project. This is great news for Spark, as it should give the project more stability as it continues to grow and gain popularity among Hadoop users.

As this article points out, Spark has many advantages over MapReduce. It is much faster than MapReduce for most applications because it keeps working data in memory instead of writing intermediate results to disk between stages. It is also easier to program. Perhaps most interesting, it is well positioned to handle future big data applications, including machine learning and real-time processing.

While Spark is a fascinating project with great prospects, MapReduce still has some advantages as the dominant Hadoop programming model. Spark cannot yet do everything that MapReduce can, and MapReduce may still be the better choice for some batch processing applications.
