Running a Spark JdbcRDD against MySQL throws an exception. What is the cause?

As stated in the title.

Answer 1 (2015-09-03)
RDD is an abstract class that defines methods such as map() and reduce(), but in practice a class extending RDD usually only needs to implement two methods:

def getPartitions: Array[Partition]
def compute(split: Partition, context: TaskContext): Iterator[T]

getPartitions() tells Spark how the input is to be split into partitions;

compute() produces every row of a given Partition ("row" is my own imprecise wording; more accurately, it is one unit of data handled by the processing function).
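Before looking at HadoopRDD, here is a minimal self-contained sketch (my own illustration, not code from Spark) of a custom RDD that implements only these two methods. The class name RangeSlicesRDD and the slicing scheme are assumptions made for the example:

    import org.apache.spark.{Partition, SparkContext, TaskContext}
    import org.apache.spark.rdd.RDD

    // Each partition remembers which slice of the number range it owns.
    class SlicePartition(override val index: Int, val start: Int, val end: Int) extends Partition

    // Emits the integers [0, total), split across numSlices partitions.
    class RangeSlicesRDD(sc: SparkContext, total: Int, numSlices: Int)
      extends RDD[Int](sc, Nil) {

      // getPartitions: decide how the input is sliced.
      override def getPartitions: Array[Partition] = {
        (0 until numSlices).map { i =>
          new SlicePartition(i, i * total / numSlices, (i + 1) * total / numSlices): Partition
        }.toArray
      }

      // compute: produce every element of one partition.
      override def compute(split: Partition, context: TaskContext): Iterator[Int] = {
        val p = split.asInstanceOf[SlicePartition]
        (p.start until p.end).iterator
      }
    }

With this in place, new RangeSlicesRDD(sc, 100, 4).collect() would return 0 to 99, with each of the four partitions computed independently by compute().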
Take HadoopRDD, which reads an HDFS file, as an example:

    override def getPartitions: Array[Partition] = {
      val jobConf = getJobConf()
      // add the credentials here as this can be called before SparkContext initialized
      SparkHadoopUtil.get.addCredentials(jobConf)
      val inputFormat = getInputFormat(jobConf)
      if (inputFormat.isInstanceOf[Configurable]) {
        inputFormat.asInstanceOf[Configurable].setConf(jobConf)
      }
      // ask the InputFormat how to split the input
      val inputSplits = inputFormat.getSplits(jobConf, minPartitions)
      val array = new Array[Partition](inputSplits.size)
      for (i <- 0 until inputSplits.size) {
        // wrap each InputSplit in a HadoopPartition
        array(i) = new HadoopPartition(id, i, inputSplits(i))
      }
      array
    }

(This answer was accepted by the asker and other users.)
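Coming back to the JdbcRDD from the question, the same two methods are at work: its getPartitions splits the [lowerBound, upperBound] range into numPartitions sub-ranges, and its compute opens a JDBC connection and runs the SQL for one sub-range, binding the range endpoints to the two '?' placeholders the query must contain. Below is a usage sketch; the JDBC URL, credentials, table name and bounds are placeholders, not values from the question. In practice, a MySQL driver jar missing from the classpath or a query without the two placeholders is a common cause of exceptions here.

    import java.sql.{DriverManager, ResultSet}
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.JdbcRDD

    object JdbcRddSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("jdbcrdd-sketch").setMaster("local[*]"))

        val rows = new JdbcRDD(
          sc,
          () => {
            // the MySQL driver jar must be on the driver and executor classpath,
            // otherwise this line throws ClassNotFoundException
            Class.forName("com.mysql.jdbc.Driver")
            DriverManager.getConnection("jdbc:mysql://localhost:3306/test", "user", "password")
          },
          // the SQL must contain exactly two '?' placeholders; each partition binds
          // them to the lower and upper end of its id sub-range
          "SELECT id, name FROM people WHERE id >= ? AND id <= ?",
          1L,    // lowerBound
          1000L, // upperBound
          3,     // numPartitions: the id range 1..1000 is split into 3 sub-ranges
          (rs: ResultSet) => (rs.getLong("id"), rs.getString("name"))
        )

        rows.take(10).foreach(println)
        sc.stop()
      }
    }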