Wednesday, March 4, 2009

MapReduce: Simplified Data Processing on Large Clusters

Map-reduce presents a framework for running large distributed computations. Its main contribution is the identification of a construction that is simple, yet general enough to naturally capture a (really) wide variety of distributed computations used in practice: the map-reduce programming model itself.
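To make the construction concrete, here is a minimal in-process sketch of the model using the paper's canonical word-count example. The function names (`map_fn`, `reduce_fn`, `run_mapreduce`) are illustrative and not part of any real framework; a real implementation would distribute the map, shuffle, and reduce phases across a cluster.

```python
from collections import defaultdict

def map_fn(doc_name, contents):
    # Map phase: emit an intermediate (word, 1) pair for every word.
    for word in contents.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # Reduce phase: sum all counts emitted for the same word.
    return sum(counts)

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in inputs.items():
        for k, v in map_fn(key, value):
            groups[k].append(v)
    # One reduce call per distinct intermediate key.
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}

docs = {"a.txt": "the quick fox", "b.txt": "the lazy dog"}
print(run_mapreduce(docs, map_fn, reduce_fn))
# {'the': 2, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

The user supplies only the two small functions; the framework owns the grouping and (in the distributed setting) partitioning, scheduling, and fault tolerance, which is exactly what makes the model both simple and broadly applicable.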

It seems obvious that this paper will still be influential in 10 years.
Reading subsequent papers such as Dryad reveals a tradeoff between the simplicity of the model and its expressivity: a more expressive model can be more efficient and makes some complex computations easier to express.
