在Yarn集群上跑spark wordcount任务

2022-10-25,,,

    准备的测试数据文件hello.txt
hello scala
hello world
nihao hello
i am scala
this is spark demo
gan jiu wan le
    将文件上传到hdfs中
#创建hdfs测试目录
hdfs dfs -mkdir /user/spark/input/
#上传本地文件hello.txt到hdfs
hdfs dfs -put ./hello.txt /user/spark/input/
    代码(改为读取hdfs上的数据,并写入hdfs)
package org.example

import org.apache.spark.{SparkConf, SparkContext}

/**
* spark-submit --master yarn --class org.example.SparkWordCountYarn /tmp/test/sparkwordcount2-1.0-SNAPSHOT.jar hdfs://hadoop1:8020/user/spark/input/hello.txt hdfs://hadoop1:8020/user/spark/output/helloOutput
*/
object SparkWordCountYarn { def main(args: Array[String]): Unit = {
val conf = new SparkConf()
.setAppName("WordCount")
.setMaster("yarn") val srcFile = args(0)
val outPutFile = args(1)
val sc = new SparkContext(conf)
val data = sc.textFile(srcFile)
data.flatMap(_.split(" "))
.map((_, 1))
.reduceByKey(_+_)
.saveAsTextFile(outPutFile)
}
}
    执行提交spark人物命令
spark-submit --master yarn --class org.example.SparkWordCountYarn /tmp/test/sparkwordcount2-1.0-SNAPSHOT.jar hdfs://hadoop1:8020/user/spark/input/hello.txt hdfs://hadoop1:8020/user/spark/output/helloOutput
    执行结果

在Yarn集群上跑spark wordcount任务的相关教程结束。

《在Yarn集群上跑spark wordcount任务.doc》

下载本文的Word格式文档,以方便收藏与打印。