Monday, March 2, 2009

Pig Latin: A Not-So-Foreign Language for Data Processing

The paper presents a new language for querying information in data centers, trying to fill a gap between the high level SQL and the low level, hand written, map-reduce execution plans.
The main advantage that I see is that, unlike SQL, having this language does not impose a schema on the information, is extensible to user defined functions and allows nested structures.
The paper also makes the case that is it easier for programmers to write in Pig versus the declarative SQL (as it more natural to write imperative code and it is well known that debugging declarative programs is difficult).

In general I am skeptical when being presented with yet another new language and at a first glance it seems that most examples can be written in SQL. However, after reading more, I actually liked Pig and I think there is need for a new such language and the choices made by Pig make sense to me. Since it is open source and not many such systems are readily available to outside communities, I would say Pig may have the traction to be influential in 10 years.

No comments:

Post a Comment