Running a Spark WordCount job on a YARN cluster

Prepare the test data file hello.txt with the following contents:

hello scala

hello world

nihao hello

i am scala

this is spark demo

gan jiu wan le

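Upload the file to HDFS

# Create the HDFS test directory (-p also creates any missing parent directories)

hdfs dfs -mkdir -p /user/spark/input/

# Upload the local file hello.txt to HDFS

hdfs dfs -put ./hello.txt /user/spark/input/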

Code (changed to read the input from HDFS and write the result back to HDFS)

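package org.example

import org.apache.spark.{SparkConf, SparkContext}

/**
 * Word count on YARN: reads a text file from HDFS and writes the counts back to HDFS.
 *
 * Usage:
 * spark-submit --master yarn --class org.example.SparkWordCountYarn /tmp/test/sparkwordcount2-1.0-SNAPSHOT.jar hdfs://hadoop1:8020/user/spark/input/hello.txt hdfs://hadoop1:8020/user/spark/output/helloOutput
 */
object SparkWordCountYarn {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WordCount")
      .setMaster("yarn") // same value as --master yarn on the command line

    val srcFile = args(0)    // input file on HDFS
    val outPutFile = args(1) // output directory on HDFS; saveAsTextFile fails if it already exists

    val sc = new SparkContext(conf)
    val data = sc.textFile(srcFile)

    data.flatMap(_.split(" "))    // split each line into words
      .map((_, 1))                // pair every word with a count of 1
      .reduceByKey(_ + _)         // sum the counts per word
      .saveAsTextFile(outPutFile) // write the (word, count) pairs to HDFS

    sc.stop() // release the cluster resources once the job has finished
  }
}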
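To check the expected result without a cluster, the same flatMap / map / reduce-by-key steps can be sketched on plain Scala collections. This is only an illustration and not part of the job itself: the names LocalWordCount, sampleLines and wordCount are made up here, and groupBy stands in for the shuffle that reduceByKey performs in Spark.

```scala
// Local sketch of the word count pipeline using only the Scala standard library.
// LocalWordCount, sampleLines and wordCount are hypothetical names for illustration.
object LocalWordCount {

  // Contents of hello.txt from this tutorial, one element per line.
  val sampleLines: Seq[String] = Seq(
    "hello scala",
    "hello world",
    "nihao hello",
    "i am scala",
    "this is spark demo",
    "gan jiu wan le"
  )

  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // split every line into words
      .map((_, 1))             // pair each word with 1
      .groupBy(_._1)           // group the pairs by word (stand-in for the shuffle)
      .map { case (word, pairs) => word -> pairs.map(_._2).sum } // sum per word

  def main(args: Array[String]): Unit =
    wordCount(sampleLines).toSeq.sortBy(_._1).foreach(println)
}
```

For the sample file this yields a count of 3 for hello, 2 for scala, and 1 for each of the other twelve words, which is what the job on YARN should also produce.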

Run the command to submit the Spark job

spark-submit --master yarn --class org.example.SparkWordCountYarn /tmp/test/sparkwordcount2-1.0-SNAPSHOT.jar hdfs://hadoop1:8020/user/spark/input/hello.txt hdfs://hadoop1:8020/user/spark/output/helloOutput
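When the job finishes, saveAsTextFile leaves part-NNNNN files in the output directory, and each line is the string form of a tuple, for example (hello,3). The files can be printed with hdfs dfs -cat /user/spark/output/helloOutput/part-*. If the results need to be processed further, a small parser along the following lines may help. It is a sketch under the assumption that the output has exactly this tuple format; WordCountOutput and parseLine are hypothetical names.

```scala
import scala.util.Try

// Sketch of a parser for lines written by saveAsTextFile on (String, Int) pairs.
// WordCountOutput and parseLine are hypothetical names for illustration.
object WordCountOutput {

  // Turns "(hello,3)" into Some(("hello", 3)); returns None for anything else.
  def parseLine(line: String): Option[(String, Int)] = {
    val trimmed = line.trim
    if (trimmed.startsWith("(") && trimmed.endsWith(")")) {
      val body = trimmed.substring(1, trimmed.length - 1)
      val idx = body.lastIndexOf(',') // the count follows the last comma
      if (idx < 0) None
      else Try(body.substring(idx + 1).toInt).toOption
        .map(count => (body.substring(0, idx), count))
    } else None
  }

  def main(args: Array[String]): Unit = {
    // Example lines in the format the job writes; the order in real output may differ.
    val lines = Seq("(hello,3)", "(scala,2)", "(spark,1)")
    lines.flatMap(parseLine).foreach { case (word, count) => println(s"$word -> $count") }
  }
}
```

Splitting on the last comma rather than the first keeps the parser usable if a word itself contains a comma.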
