StreamJob.java
run() method:
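init(); creates the Environment env_ object
preProcessArgs();
parseArgv(); parses the Hadoop Streaming command-line arguments and assigns them to StreamJob member variables
postProcessArgs(); checks that the input arguments are complete, valid, and sufficient
setJobConf(); configures the MapReduce job's parameters from the command-line arguments above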
JobConf: jobConf_ : general MapRed job properties
Configuration: config_ : as parameter to create JobConf object.
Class fmt=TextInputFormat.class
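The call order in run() can be sketched with a simplified stand-in class. This is not the Hadoop source: the class name StreamJobSketch, the map used in place of JobConf, and the "sketch.*" property keys are all assumptions made for illustration. Only the method names and the -input, -output, -mapper, -reducer options come from Hadoop Streaming.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for StreamJob; NOT the Hadoop implementation.
public class StreamJobSketch {
    // Stand-ins for the member variables that parseArgv() fills in.
    private String[] argv = new String[0];
    private final List<String> inputSpecs = new ArrayList<>();
    private String output;
    private String mapCmd;
    private String redCmd;
    // Stand-in for jobConf_: a plain map instead of a real JobConf.
    private final Map<String, String> jobConf = new LinkedHashMap<>();

    // Mirrors the order of calls described in the notes above.
    public int run(String[] args) {
        try {
            argv = args.clone();
            init();
            preProcessArgs();
            parseArgv();
            postProcessArgs();
            setJobConf();
            return 0;
        } catch (IllegalArgumentException e) {
            System.err.println("Invalid arguments: " + e.getMessage());
            return 1;
        }
    }

    // The real init() builds the Environment object; here it only resets state.
    private void init() { jobConf.clear(); }

    // Placeholder: this sketch does no pre-processing.
    private void preProcessArgs() { }

    // Reads "-option value" pairs into member variables.
    private void parseArgv() {
        for (int i = 0; i < argv.length; i++) {
            String opt = argv[i];
            if (i + 1 >= argv.length) {
                throw new IllegalArgumentException("missing value for " + opt);
            }
            String value = argv[++i];
            switch (opt) {
                case "-input":   inputSpecs.add(value); break;
                case "-output":  output = value; break;
                case "-mapper":  mapCmd = value; break;
                case "-reducer": redCmd = value; break;
                default: throw new IllegalArgumentException("unknown option " + opt);
            }
        }
    }

    // Checks that the required arguments are present.
    private void postProcessArgs() {
        if (inputSpecs.isEmpty()) throw new IllegalArgumentException("-input is required");
        if (output == null) throw new IllegalArgumentException("-output is required");
    }

    // Copies the parsed arguments into the job configuration (hypothetical keys).
    private void setJobConf() {
        jobConf.put("sketch.input.format", "TextInputFormat");
        jobConf.put("sketch.input.paths", String.join(",", inputSpecs));
        jobConf.put("sketch.output.path", output);
        if (mapCmd != null) jobConf.put("sketch.mapper", mapCmd);
        if (redCmd != null) jobConf.put("sketch.reducer", redCmd);
    }

    public Map<String, String> getJobConf() { return jobConf; }

    public static void main(String[] args) {
        StreamJobSketch job = new StreamJobSketch();
        int rc = job.run(new String[] {"-input", "in", "-output", "out", "-mapper", "cat"});
        System.out.println("exit code " + rc + ", conf " + job.getJobConf());
    }
}
```

Parsing and validation are kept in separate steps, as in the outline above, so setJobConf() only ever sees arguments that have already been checked.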
TextInputFormat implements InputFormat interface:
public interface InputFormat<K,V>
InputFormat describes the input-specification for a Map-Reduce job. The Map-Reduce framework relies on the InputFormat of the job to:
- Validate the input-specification of the job.
- Split-up the input file(s) into logical InputSplits, each of which is then assigned to an individual Mapper.
- Provide the RecordReader implementation to be used to glean input records from the logical InputSplit for processing by the Mapper.
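The three responsibilities listed above can be illustrated with a small in-memory analogue. This is a sketch under stated assumptions, not the Hadoop API: SimpleInputFormat, Split, LineInputFormat, and countRecordsPerSplit are hypothetical names, the input is a list of lines rather than files, and the record key is a line index (the real TextInputFormat uses the byte offset of the line as the key).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class InputFormatSketch {
    // A logical slice [start, end) of the input lines; stand-in for InputSplit.
    static final class Split {
        final int start;
        final int end;
        Split(int start, int end) { this.start = start; this.end = end; }
    }

    // Hypothetical, simplified analogue of InputFormat<K,V>.
    interface SimpleInputFormat<K, V> {
        void validate(List<String> input);                         // check the input specification
        List<Split> getSplits(List<String> input, int numSplits);  // cut the input into logical splits
        Iterator<Map.Entry<K, V>> getRecordReader(List<String> input, Split split); // read records of one split
    }

    // Line-oriented implementation: key = line index, value = line text.
    static final class LineInputFormat implements SimpleInputFormat<Integer, String> {
        @Override
        public void validate(List<String> input) {
            if (input == null || input.isEmpty()) {
                throw new IllegalArgumentException("no input");
            }
        }

        @Override
        public List<Split> getSplits(List<String> input, int numSplits) {
            if (numSplits <= 0) throw new IllegalArgumentException("numSplits must be positive");
            List<Split> splits = new ArrayList<>();
            int size = (input.size() + numSplits - 1) / numSplits; // ceiling division
            for (int s = 0; s < input.size(); s += size) {
                splits.add(new Split(s, Math.min(s + size, input.size())));
            }
            return splits;
        }

        @Override
        public Iterator<Map.Entry<Integer, String>> getRecordReader(List<String> input, Split split) {
            List<Map.Entry<Integer, String>> records = new ArrayList<>();
            for (int i = split.start; i < split.end; i++) {
                records.add(new AbstractMap.SimpleEntry<Integer, String>(i, input.get(i)));
            }
            return records.iterator();
        }
    }

    // Plays the framework's role: validate, split, then read each split as a mapper would.
    public static List<Integer> countRecordsPerSplit(List<String> input, int numSplits) {
        LineInputFormat format = new LineInputFormat();
        format.validate(input);
        List<Integer> counts = new ArrayList<>();
        for (Split split : format.getSplits(input, numSplits)) {
            int n = 0;
            Iterator<Map.Entry<Integer, String>> reader = format.getRecordReader(input, split);
            while (reader.hasNext()) { reader.next(); n++; }
            counts.add(n);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(countRecordsPerSplit(lines, 2));
    }
}
```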

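This note walks through the execution flow of the run() method in the StreamJob class, covering initialization, argument parsing, and configuration, and shows how Hadoop Streaming sets up a MapReduce job.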





