Wednesday, March 4, 2009

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks

Dryad is another framework for writing distributed computations. Compared to map-reduce, it allows a general computational DAG, with arbitrary structure and where vertices can implement arbitrary computations. This gain is somewhat payed through increased complexity. However, as seen by recent industry feedback such as Yahoo, these features seem useful in practice. A classical example is a join between a small and a large table where the small table can be distributed to all nodes and held in memory.

Dryad is used on wide scale at Microsoft and I think will be influential in 10 years because as an extension to map-reduce like jobs, it is the first paper to show how this can be done on a data center. However, due to the lack of an open-source implementation, the more complex dryad paradigm lags map-reduce in worldwide usage.

No comments:

Post a Comment