Setting up Hadoop and Spark on macOS

2023-06-25

Instructions

    Prepare the environment
    If you do not have Homebrew, first search for how to install it.
    Remove any old versions of Hadoop first:

    brew cleanup hadoop

    Then update the Homebrew formulae:

    brew update
    brew upgrade
    brew cleanup

    Check the version info:

    brew info hadoop
    brew info apache-spark
    brew info sbt
    brew info scala

    If any of the above are not installed, install them with brew install <app>.

    Install the environment
    Install Hadoop:

    brew install hadoop

    Install Spark:

    brew install apache-spark scala sbt

    Set the environment variables
    Edit ~/.bash_profile with vim and append the following:

    # set environment variables
    export JAVA_HOME=$(/usr/libexec/java_home)
    export HADOOP_HOME=/usr/local/Cellar/hadoop/2.5.1
    export HADOOP_CONF_DIR=$HADOOP_HOME/libexec/etc/hadoop
    export SPARK_HOME=/usr/local/Cellar/apache-spark/1.1.0

    # set path variables
    export PATH=$PATH:$HADOOP_HOME/bin:$SPARK_HOME/bin

    # set alias start & stop scripts
    alias hstart="$HADOOP_HOME/sbin/start-dfs.sh;$HADOOP_HOME/sbin/start-yarn.sh"
    alias hstop="$HADOOP_HOME/sbin/stop-dfs.sh;$HADOOP_HOME/sbin/stop-yarn.sh"
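    After saving, reload the profile and confirm the paths resolve. A minimal check, assuming the Homebrew versions above match what brew actually installed:

    # reload the profile in the current shell
    source ~/.bash_profile

    # the variables should point at real directories
    echo $JAVA_HOME
    ls $HADOOP_HOME/bin/hdfs $SPARK_HOME/bin/spark-submit

    # hadoop should now be on the PATH
    hadoop version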

    Hadoop needs SSH to work, so set up SSH

    The sshd config file is at:

    /etc/ssh/sshd_config

    Generate a key pair:

    sh-3.2# sudo ssh-keygen -t rsa

    Generating public/private rsa key pair.
    Enter file in which to save the key (/var/root/.ssh/id_rsa): [press Enter to accept /var/root/.ssh/id_rsa]
    Enter passphrase (empty for no passphrase): [press Enter]
    Enter same passphrase again: [press Enter]
    Your identification has been saved in /var/root/.ssh/id_rsa.
    Your public key has been saved in /var/root/.ssh/id_rsa.pub.
    The key fingerprint is:
    97:e9:5a:5e:91:52:30:63:9e:34:1a:6f:24:64:75:af root@cuican.local
    The key's randomart image is:
    (RSA 2048 randomart omitted)

    Edit the config file:

    sudo vim /etc/ssh/sshd_config

    Port 22
    #AddressFamily any
    #ListenAddress 0.0.0.0
    #ListenAddress ::

    # The default requires explicit activation of protocol 1
    Protocol 2

    # HostKey for protocol version 1
    #HostKey /etc/ssh/ssh_host_key
    # HostKeys for protocol version 2
    #HostKey /etc/ssh/ssh_host_rsa_key
    #HostKey /etc/ssh/ssh_host_dsa_key
    #HostKey /etc/ssh/ssh_host_ecdsa_key
    HostKey /var/root/.ssh/id_rsa

    # Lifetime and size of ephemeral version 1 server key
    KeyRegenerationInterval 1h
    ServerKeyBits 1024

    # Logging
    # obsoletes QuietMode and FascistLogging
    SyslogFacility AUTHPRIV
    #LogLevel INFO

    # Authentication:
    LoginGraceTime 2m
    PermitRootLogin yes
    StrictModes yes
    #MaxAuthTries 6
    #MaxSessions 10
    RSAAuthentication yes
    PubkeyAuthentication yes

    Start the SSH service

    which sshd # locate sshd

    On a Mac, sshd lives at /usr/sbin/sshd.

    Run sudo /usr/sbin/sshd in a terminal to start the sshd service.
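    Alternatively, recent macOS releases expose the same sshd through the built-in Remote Login service; a hedged alternative, assuming systemsetup is available on your system:

    # enable the built-in Remote Login (sshd) service
    sudo systemsetup -setremotelogin on

    # confirm it is on
    sudo systemsetup -getremotelogin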

    Then set up passwordless login for your current user:

    ssh-keygen -t rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
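    Verify passwordless login before starting Hadoop; the start scripts will stall if ssh localhost still prompts for a password:

    # should print "ssh ok" without asking for a password
    ssh localhost "echo ssh ok"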

    Configure Hadoop
    Go to the Hadoop install path:

    cd /usr/local/Cellar/hadoop/2.5.1/libexec/

    Edit etc/hadoop/hadoop-env.sh:

     # this fixes the "scdynamicstore" warning
    export HADOOP_OPTS="$HADOOP_OPTS -Djava.security.krb5.realm= -Djava.security.krb5.kdc="

    Edit etc/hadoop/core-site.xml:

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

    Edit etc/hadoop/hdfs-site.xml:

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>

    Edit etc/hadoop/mapred-site.xml:

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>

    Edit etc/hadoop/yarn-site.xml:

    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
    </configuration>
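    With the four files in place, you can sanity-check that Hadoop actually reads them; a quick check, assuming the environment variables from ~/.bash_profile are loaded:

    # should print hdfs://localhost:9000 from core-site.xml
    hdfs getconf -confKey fs.defaultFS

    # should print 1 from hdfs-site.xml
    hdfs getconf -confKey dfs.replication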

    Run Hadoop
    Move to Hadoop's root directory:

    cd /usr/local/Cellar/hadoop/2.5.1

    Format the Hadoop HDFS:

    ./bin/hdfs namenode -format

    Start the NameNode and DataNode daemons:

    ./sbin/start-dfs.sh

    Check it in a browser:

    http://localhost:50070/

    Start the ResourceManager and NodeManager daemons:

    ./sbin/start-yarn.sh

    Check whether all the daemons are running:

    jps
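    If everything came up cleanly, jps should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager alongside Jps itself (PIDs will differ); a quick filter for the important ones:

    # quick check that the key daemons are present
    jps | egrep "NameNode|DataNode|ResourceManager|NodeManager"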

    Check the ResourceManager in a browser:

    http://localhost:8088/

    Create an HDFS user directory (replace {username} with your login name):

    ./bin/hdfs dfs -mkdir -p /user/{username}
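    To confirm HDFS is writable, push a small file into the new directory; a minimal sketch, assuming you are still in the Hadoop root directory and created the directory for your own login name:

    # copy a local file into HDFS and list it
    echo "hello hdfs" > /tmp/hello.txt
    ./bin/hdfs dfs -put /tmp/hello.txt /user/$(whoami)/
    ./bin/hdfs dfs -ls /user/$(whoami)/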

    Run a MapReduce example:

    # calculate pi
    ./bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar pi 10 100
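    The same examples jar also ships a wordcount job, which makes a good second smoke test; a sketch, assuming the HDFS user directory above exists:

    # use the bundled Hadoop config files as sample input
    ./bin/hdfs dfs -put libexec/etc/hadoop /user/$(whoami)/input

    # run wordcount and inspect part of the result
    ./bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar wordcount input output
    ./bin/hdfs dfs -cat output/part-r-00000 | head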

    Run Spark

    Go to the Spark install directory:

    cd /usr/local/Cellar/apache-spark/1.1.0

    Run a Spark example:

    ./bin/run-example SparkPi

    View the Spark job in a browser (the UI is only up while a job is running):

    http://localhost:4040/
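    You can also explore Spark interactively with spark-shell; a minimal session, assuming you are still in the Spark directory (the last line is typed at the scala> prompt):

    # start an interactive shell with two local cores
    ./bin/spark-shell --master local[2]

    # then, at the scala> prompt, run a tiny job:
    #   sc.parallelize(1 to 1000).sum()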

    You can also submit jobs with spark-submit:

    # pattern to launch an application in yarn-cluster mode
    ./bin/spark-submit --class <path.to.class> --master yarn-cluster [options] <app.jar> [options]

    # run example application (calculate pi)
    ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster libexec/lib/spark-examples-*.jar

    Done.
