Hierarchical MapReduce

Unleash the full power of MapReduce

About the programming model

The MapReduce programming model provides an easy way to execute pleasantly parallel applications. Many data-intensive applications fit this model and benefit from the scalability it delivers.
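As a rough illustration of the model, the following is a minimal single-machine sketch in Python, using word count as the example job. The names `map_fn`, `reduce_fn`, and `map_reduce` are hypothetical and chosen for this sketch only; a real MapReduce runtime distributes the same three phases across many nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(key, values):
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def map_reduce(records, workers=4):
    # Map phase: records are independent of each other, so they can be
    # processed in parallel (threads stand in for cluster nodes here).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_fn, records))

    # Shuffle phase: group the intermediate values by key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: each key is reduced independently of the others.
    return dict(reduce_fn(key, values) for key, values in groups.items())
```

Because neither the map nor the reduce step shares state between records or keys, adding workers is the only change needed to scale such a job, which is the property the paragraph above refers to.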

Commercial clouds can provide virtually unlimited computation and storage resources on demand. Because of financial, security, and other concerns, however, many researchers still run experiments on several small clusters, each with a limited number of nodes, that cannot unleash the full power of MapReduce.

Our solution

We present a hierarchical MapReduce framework that gathers computation resources from different clusters and runs MapReduce jobs across them.

The global controller in our framework splits the data set into partitions and dispatches them to multiple "local" MapReduce clusters. It balances the workload by assigning tasks in accordance with the capabilities of each cluster and of each node. The local results are then returned to the global controller for global reduction.
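The flow described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the framework's actual scheduler: it assumes each cluster's capability is summarized by a single positive number (for example, a core count), and the function names and the word-count job are hypothetical.

```python
from collections import Counter
from functools import reduce


def partition_by_capacity(records, capacities):
    """Split records into contiguous chunks sized in proportion to capacity.

    `capacities` maps a cluster name to a positive number (an assumption
    of this sketch; the real framework may weigh clusters differently).
    """
    total = sum(capacities.values())
    names = list(capacities)
    chunks, start, cumulative = {}, 0, 0
    for i, name in enumerate(names):
        cumulative += capacities[name]
        # The last cluster takes whatever remains so no record is dropped.
        if i == len(names) - 1:
            end = len(records)
        else:
            end = round(len(records) * cumulative / total)
        chunks[name] = records[start:end]
        start = end
    return chunks


def local_map_reduce(records):
    """Stand-in for a MapReduce job on one local cluster (word count)."""
    counts = Counter()
    for line in records:
        counts.update(line.lower().split())
    return counts


def global_reduce(partials):
    """Merge the per-cluster results at the global controller."""
    return reduce(lambda a, b: a + b, partials, Counter())


def hierarchical_map_reduce(records, capacities):
    chunks = partition_by_capacity(records, capacities)
    partials = [local_map_reduce(chunk) for chunk in chunks.values()]
    return global_reduce(partials)
```

With capacities such as `{"cluster_a": 3, "cluster_b": 1}`, the first cluster receives roughly three quarters of the records. Note that this only works when the global reduction can be applied to partial results, as is the case for sums and counts.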

Hierarchical MapReduce Architecture

Publications

Yuan Luo, Beth Plale, Zhenhua Guo, Wilfred W. Li, Judy Qiu, Yiming Sun. Hierarchical MapReduce: Towards Simplified Cross-Domain Data Processing, Concurrency and Computation: Practice and Experience, 2012, doi: 10.1002/cpe.2929

Yuan Luo and Beth Plale. Hierarchical MapReduce Programming Model and Scheduling Algorithms, in Proceedings of the 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Ottawa, Canada, May 13-16, 2012

Yuan Luo, Zhenhua Guo, Yiming Sun, Beth Plale, Judy Qiu, Wilfred W. Li, A Hierarchical Framework for Cross-Domain MapReduce Execution, in Proceedings of Emerging Computational Methods for the Life Sciences Workshop (ECMLS2011) of The 20th ACM High Performance Distributed Computing Conference (HPDC 2011), San Jose, California, June 8-10, 2011

Presentations & posters

A Hierarchical MapReduce Framework, Invited talk at IBM Student Workshop for Frontiers of Cloud Computing 2012, IBM Thomas J. Watson Research Center, Hawthorne, New York, July 30-31, 2012

Hierarchical MapReduce: Towards Simplified Cross-Domain Data Processing, Invited talk at Cloud Computing Lecture, Indiana University, October 12, 2011

A Hierarchical Framework for Cross-Domain MapReduce Execution, Presented at ECMLS 2011 Workshop, co-located with HPDC 2011, San Jose, California, June 8, 2011

A Hierarchical MapReduce Framework, PRAGMA 22 Workshop, Monash University, Melbourne, Australia, April 17-19, 2012