January 31, 2009
I'm done with MapReduce: Simplified Data Processing on Large Clusters by Jeffrey Dean and Sanjay Ghemawat. It's a classic paper and an enjoyable read, although it is short on specifics. The insight about applying a suitably restricted programming model is pure genius. So many people tried to create super complex systems before, from OpenMosix to MPI. Well, the MapReduce library is far from trivial itself, but at least its applications are isolated from all the mechanics. This is the hallmark of successful modern web services in general, too, such as Google BigTable, Amazon AWS (S3/EC2), etc.
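To make the "isolated from all the mechanics" point concrete, here is a rough sketch of what the restricted model looks like from the application side, using the word count example from the paper. This is Python, not the paper's C++ library, and the names map_fn, reduce_fn, and run_mapreduce are my own inventions. The single-process driver is only a stand-in for everything the real library does (partitioning, scheduling, fault tolerance), so treat it as an illustration under those assumptions.

```python
from collections import defaultdict

# Application code: the only two functions the user writes.
def map_fn(doc_name, text):
    # Emit an intermediate (word, 1) pair for every word in the document.
    for word in text.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    # Combine all intermediate values that share the same key.
    return word, sum(counts)

# Hypothetical toy driver standing in for the library. The real one
# distributes this work across a cluster; this one runs in one process.
def run_mapreduce(inputs, map_fn, reduce_fn):
    groups = defaultdict(list)
    for key, value in inputs.items():
        for k, v in map_fn(key, value):
            groups[k].append(v)  # group intermediate values by key
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))

if __name__ == "__main__":
    docs = {"a.txt": "the cat", "b.txt": "The dog"}
    print(run_mapreduce(docs, map_fn, reduce_fn))
```

Neither map_fn nor reduce_fn knows anything about machines, files, or failures, which is the whole appeal.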
The biggest takeaway for the class, I think, is how high MapReduce sits in the food chain. To work, it needs a cluster management solution, a job scheduling solution, and a distributed filesystem with coherency and atomic operations. Prof. Bridges likes to talk about "running MapReduce on a network of parking meters", but show me a network of parking meters that has the minimum infrastructure required by MapReduce. To be fair, he may mean for us to play with restricted programming models in general, and to think outside the box, but even so this seems a bit far-fetched. Or maybe I'm just too old and crusty.
Pondering if I should read the Sinfonia thing. The class wiki says that we only have the public mobile applications remaining before project discussions and presentations, so maybe do that instead.