
January 31, 2009

MapReduce paper read

I'm done with "MapReduce: Simplified Data Processing on Large Clusters" by Jeffrey Dean and Sanjay Ghemawat. It's a classic paper and an enjoyable read, although it is light on specifics. The insight of applying a suitably restricted programming model is pure genius. So many people tried to build super complex systems before, from OpenMosix to MPI. Granted, the MapReduce library is far from trivial itself, but at least its applications are isolated from all the mechanics. This is the hallmark of successful modern web services in general, too, such as Google BigTable, Amazon AWS (S3/EC2), etc.
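To make the "isolated from all the mechanics" point concrete, here is a rough single-process sketch of the word count example in Python. The names `map_fn`, `reduce_fn`, and `run_mapreduce` are made up for illustration and are not the paper's API; the toy driver stands in for everything the real library does across a cluster (partitioning, scheduling, fault tolerance), which it does not attempt.

```python
from collections import defaultdict

# User code: the application only supplies these two functions.
def map_fn(doc_name, text):
    # Emit an intermediate (word, 1) pair for every word in the document.
    for word in text.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    # Combine all intermediate values that share the same key.
    return sum(counts)

# Toy stand-in for the library (hypothetical): runs everything in one
# process, with no distribution, scheduling, or failure handling.
def run_mapreduce(inputs, map_fn, reduce_fn):
    groups = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)  # group values by key
    return {k: reduce_fn(k, vs) for k, vs in sorted(groups.items())}

docs = {
    "a.txt": "the quick brown fox",
    "b.txt": "the lazy dog and the fox",
}
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts)
```

The two user functions know nothing about machines, files, or failures, so in principle the driver could be swapped for a distributed one without touching them.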

The biggest takeaway for the class, I think, is how high MapReduce sits in the food chain. To work, it needs a cluster management solution, a job scheduling solution, and a distributed filesystem with coherency and atomic operations. Prof. Bridges likes to talk about "running MapReduce on a network of parking meters", but show me a network of parking meters that has the minimum infrastructure required by MapReduce. To be fair, he may mean for us to play with restricted programming models in general, and to think outside the box, but even so this seems a bit far-fetched. Or maybe I'm just too old and crusty.

Pondering whether I should read the Sinfonia thing. The class wiki says that we only have the public mobile applications remaining before project discussions and presentations, so maybe I'll do that instead.

Tags: cs-591-005

Posted by: Pete Zaitcev at 05:14 PM