Mammoth is a new MapReduce system that aims to improve MapReduce performance through global memory management. We conducted extensive experiments comparing it against the native Hadoop platform. The results show that Mammoth can reduce total job execution time by 40% in typical cases, without requiring any modification of existing Hadoop programs. When the system is short of memory, the performance improvement can be up to 5 times, as observed for CPU- and I/O-intensive applications such as PageRank. Given the growing importance of large-scale data processing and analysis, and the proven success of the MapReduce platform, Mammoth has promising potential for impact.