In Hadoop, what is the relationship between the number of map tasks, the number of reduce tasks, and the number of machine nodes?

Exactly as the title asks.

I searched around and found an answer that I think is quite good.

It is based on the paper published by Google:

MapReduce: Simplified Data Processing on Large Clusters
http://static.googleusercontent.com/media/research.google.com/zh-CN//archive/mapreduce-osdi04.pdf

Here is a short excerpt from section 3.5, Task Granularity. In the text below, M is the number of map tasks and R is the number of reduce tasks.

Furthermore, R is often constrained by users because the output of each reduce task ends up in a separate output file. In practice, we tend to choose M so that each individual task is roughly 16 MB to 64 MB of input data (so that the locality optimization described above is most effective), and we make R a small multiple of the number of worker machines we expect to use. We often perform MapReduce computations with M = 200,000 and R = 5,000, using 2,000 worker machines.
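Plugging the quoted figures back into the rule of thumb: R = 5,000 across 2,000 workers is a 2.5x multiple (a small multiple, as advertised), and M = 200,000 tasks at 16 MB to 64 MB apiece works out to roughly 3 to 13 TB of total input.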

To sum it up:
The number of map tasks is chosen so that the input is split into pieces of roughly 16 MB to 64 MB each; since a GFS chunk is 64 MB, each map task then reads at most one chunk, the locality optimization kicks in, and little data has to move across the network.
The number of reduce tasks is usually a small multiple of the number of machine nodes.
As for the number of machine nodes? If you've got the money, go wild: the more the merrier.
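In Hadoop itself, these two knobs are turned differently: the map count is not set directly but falls out of the input splits (one map task per split, which by default is one HDFS block), while the reduce count is set explicitly on the job. Here is a minimal sketch against the org.apache.hadoop.mapreduce API; the 10-node cluster size and the 2x multiple are assumed example values, not anything from the quoted answer:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TaskCountExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "task-count-example");
        job.setJarByClass(TaskCountExample.class);

        // The map task count is NOT set directly: Hadoop creates one map
        // task per input split, and a split defaults to one HDFS block.
        // We can only cap the split size, e.g. at 64 MB to stay inside
        // the 16-64 MB per-task range the paper recommends.
        FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);

        // The reduce task count IS set directly. Following the "small
        // multiple of the worker machines" rule of thumb:
        int workerNodes = 10; // hypothetical cluster size for illustration
        job.setNumReduceTasks(2 * workerNodes);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The asymmetry mirrors the paper: M is steered indirectly through split size so that input stays local to the data, while R is a deliberate choice, since each reduce task produces its own output file.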

Author: 曾凌恒
Link: http://www.zhihu.com/question/26816539/answer/34133965