
Contents

1 Overview
1.1 Background
1.2 Versions
2 Preparation
2.1 Installation packages
2.2 Host mapping (all nodes)
2.3 Disable the firewall and SELinux (all nodes)
2.4 Passwordless SSH (all nodes)
2.5 Distribution script
3 Deployment
3.1 Deploy the JDK
3.1.1 Remove the bundled JDK (all nodes)
3.1.2 Install the JDK
3.2 Deploy Hadoop
3.2.1 Install Hadoop
3.2.2 Edit the configuration files
3.2.3 Edit the startup scripts
3.2.4 Distribute
3.2.5 Start and test
3.3 Deploy MySQL
3.4 Deploy Hive
3.5 Deploy Spark

1 Overview

1.1 Background

Build a native Apache Hadoop cluster on ARM-based Kunpeng servers running Kylin V10 (kernel 4.19.90-17), and set up the related service components such as Hive and Spark.

1.2 Versions

| Component | Version |
| --- | --- |
| JDK | 1.8 (aarch64 build) |
| Hadoop | 3.3.1 (aarch64 build) |
| Hive | 3.1.2 |
| MySQL | 8.0.28 (aarch64 build) |
| Spark | 2.4.7 (without-hadoop build) |

2 Preparation

2.1 Installation packages

Before deploying, prepare the following packages and create the directory /opt/software on the master node to hold them.

image2-1.png

Download links

- MySQL: https://cdn.mysql.com/archives/mysql-8.0/mysql-8.0.28-1.el8.aarch64.rpm-bundle.tar
- JDBC driver: https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.28/mysql-connector-java-8.0.28.jar
- JDK: https://www.oracle.com/java/technologies/downloads/ (note: requires an Oracle account login)
- Hive: https://archive.apache.org/dist/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
- Spark: https://archive.apache.org/dist/spark/spark-2.4.7/spark-2.4.7-bin-without-hadoop.tgz
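
If the master node has outbound network access, the packages can also be fetched straight into /opt/software. A rough sketch using the links above; the Hadoop aarch64 tarball URL is an assumption based on the Apache archive layout, and the Oracle JDK still has to be downloaded manually because it requires a login:

```shell
mkdir -p /opt/software && cd /opt/software
wget https://cdn.mysql.com/archives/mysql-8.0/mysql-8.0.28-1.el8.aarch64.rpm-bundle.tar
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.28/mysql-connector-java-8.0.28.jar
wget https://archive.apache.org/dist/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
wget https://archive.apache.org/dist/spark/spark-2.4.7/spark-2.4.7-bin-without-hadoop.tgz
# assumed location of the aarch64 Hadoop build on the Apache archive
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1-aarch64.tar.gz
# the Oracle JDK (jdk-8u341-linux-aarch64.tar.gz) must be downloaded from oracle.com after logging in
```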

2.2 Host mapping (all nodes)

1. Set the hostname on every machine

```shell
vim /etc/hostname
```

2. Add the host mappings

```shell
vim /etc/hosts
# fill in the IP/hostname pairs of your own cluster
10.168.60.21 hadoop102
10.168.60.22 hadoop103
10.168.60.23 hadoop104
```

2.3 Disable the firewall and SELinux (all nodes)

Disable the firewall

```shell
systemctl status firewalld    # check the firewall state
systemctl stop firewalld      # stop the firewall
systemctl disable firewalld   # keep it disabled across reboots
```

Disable SELinux

```shell
vi /etc/selinux/config
# change SELINUX=enforcing to SELINUX=disabled
```
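
The change in /etc/selinux/config only applies after a reboot; to turn SELinux off for the running session as well, one extra step (not in the original notes) is:

```shell
setenforce 0   # switch SELinux to permissive mode immediately
getenforce     # should now report Permissive (Disabled after a reboot)
```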

2.4 Passwordless SSH (all nodes)

1) Generate a key pair:

```shell
[root@hadoop102 .ssh]$ ssh-keygen -t rsa
```

Press Enter three times; this produces two files, id_rsa (the private key) and id_rsa.pub (the public key).
2) Copy the public key to every machine that should accept passwordless logins

```shell
[root@hadoop102 .ssh]$ ssh-copy-id hadoop102
[root@hadoop102 .ssh]$ ssh-copy-id hadoop103
[root@hadoop102 .ssh]$ ssh-copy-id hadoop104
```

3) Repeat steps 1 and 2 so that hadoop103 can also log in to hadoop102, hadoop103 and hadoop104 without a password; a quick check follows below.
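
A simple way to confirm the passwordless setup from each node is a loop like the following sketch, which should print every hostname without asking for a password:

```shell
for h in hadoop102 hadoop103 hadoop104; do
  ssh "$h" hostname    # no password prompt should appear
done
```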

2.5 Distribution script

rsync usually ships with the system; check for it and install it yourself if it is missing.

```shell
rpm -q rsync    # check whether rsync is installed
```

The following script makes it easy to distribute files and configuration quickly later on.

```shell
mkdir ~/bin
vim ~/bin/xsync
```

Fill in the following content

```shell
#!/bin/bash
# abort if no argument was given
pcount=$#
if ((pcount==0)); then
  echo no args;
  exit;
fi
# resolve the file name and its absolute directory
p1=$1
fname=`basename $p1`
echo fname=$fname
pdir=`cd -P $(dirname $p1); pwd`
echo pdir=$pdir
user=`whoami`
# copy the file to hadoop103 and hadoop104 under the same path
for ((host=103; host<105; host++)); do
  echo ------------------- hadoop$host --------------
  rsync -av $pdir/$fname $user@hadoop$host:$pdir
done
```

Add execute permission

```shell
chmod +x ~/bin/xsync
```
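
If the shell cannot find xsync afterwards, ~/bin is probably not on root's PATH; a minimal fix plus a first test run (assuming the hadoop103/hadoop104 targets hard-coded in the script) looks like:

```shell
echo 'export PATH=$PATH:~/bin' >> ~/.bashrc
source ~/.bashrc
xsync ~/bin    # distributes the script itself to hadoop103 and hadoop104
```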

3 Deployment

3.1 Deploy the JDK

3.1.1 Remove the bundled JDK (all nodes)

Check which JDK packages the system ships with

```shell
rpm -qa | grep jdk
java -version
```

image2-2.png

Remove the bundled JDK

```shell
yum remove java-1.8.0-openjdk-1.8.0.242.b08-1.h5.ky10.aarch64
yum remove java-1.8.0-openjdk-headless-1.8.0.242.b08-1.h5.ky10.aarch64
yum remove java-11-openjdk-11.0.6.10-4.ky10.ky10.aarch64
yum remove java-11-openjdk-headless-11.0.6.10-4.ky10.ky10.aarch64
```

3.1.2 Install the JDK

```shell
mkdir /opt/module
tar -zxvf /opt/software/jdk-8u341-linux-aarch64.tar.gz -C /opt/module/
mv /opt/module/jdk1.8.0_341 /opt/module/jdk
# Configure the environment variables: edit /etc/profile and append the following
export JAVA_HOME=/opt/module/jdk
export PATH=$PATH:$JAVA_HOME/bin
# then reload it
source /etc/profile
```

Verify

image2-3.png
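
The screenshot corresponds to a check along these lines:

```shell
java -version      # should report the aarch64 build of 1.8.0_341
echo $JAVA_HOME    # should print /opt/module/jdk
```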

Distribute

```shell
xsync /opt/module/jdk
xsync /etc/profile
```

3.2 Deploy Hadoop

3.2.1 Install Hadoop

```shell
# Unpack
tar -zxvf /opt/software/hadoop-3.3.1-aarch64.tar.gz -C /opt/module/
mv /opt/module/hadoop-3.3.1 /opt/module/hadoop
# Edit /etc/profile and append the Hadoop environment variables
export HADOOP_HOME=/opt/module/hadoop
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
```

Verify

image2-4.png
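
The screenshot corresponds to a check such as:

```shell
hadoop version    # should report Hadoop 3.3.1
which hadoop      # should resolve to /opt/module/hadoop/bin/hadoop
```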

3.2.2 Edit the configuration files

1. Edit hadoop-env.sh, mapred-env.sh and yarn-env.sh and add the following line

```shell
export JAVA_HOME=/opt/module/jdk
```

image2-5.png

2. Edit core-site.xml

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop102:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
  </property>
</configuration>
```

3. Edit hdfs-site.xml

```xml
<configuration>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop102:50070</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/data/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/data/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>128m</value>
  </property>
</configuration>
```

Create the data directories

```shell
mkdir -p /data/hadoop/tmp
mkdir -p /data/hadoop/dfs/name
mkdir -p /data/hadoop/dfs/data
xsync /data/hadoop/
```

4. Edit mapred-site.xml

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop102:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop102:19888</value>
  </property>
</configuration>
```

5. Edit yarn-site.xml

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop103:8088</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>12288</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>12288</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop103:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop103:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop103:8031</value>
  </property>
  <property>
    <name>yarn.application.classpath</name>
    <value>
      /opt/module/hadoop/etc/hadoop,
      /opt/module/hadoop/share/hadoop/common/lib/*,
      /opt/module/hadoop/share/hadoop/common/*,
      /opt/module/hadoop/share/hadoop/hdfs,
      /opt/module/hadoop/share/hadoop/hdfs/lib/*,
      /opt/module/hadoop/share/hadoop/hdfs/*,
      /opt/module/hadoop/share/hadoop/mapreduce/*,
      /opt/module/hadoop/share/hadoop/yarn,
      /opt/module/hadoop/share/hadoop/yarn/lib/*,
      /opt/module/hadoop/share/hadoop/yarn/*
    </value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
```

Edit the workers file, /opt/module/hadoop/etc/hadoop/workers, and add the following

```shell
hadoop102
hadoop103
hadoop104
```

3.2.3 Edit the startup scripts

```shell
# Edit /opt/module/hadoop/sbin/start-dfs.sh and add the following at the top:
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=root
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
# Edit /opt/module/hadoop/sbin/stop-dfs.sh and add the following at the top:
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=root
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
# Edit /opt/module/hadoop/sbin/start-yarn.sh and add the following at the top:
YARN_RESOURCEMANAGER_USER=root
HDFS_DATANODE_SECURE_USER=root
YARN_NODEMANAGER_USER=root
# Edit /opt/module/hadoop/sbin/stop-yarn.sh and add the following at the top:
YARN_RESOURCEMANAGER_USER=root
HDFS_DATANODE_SECURE_USER=root
YARN_NODEMANAGER_USER=root
```

3.2.4 Distribute

```shell
xsync /opt/module/hadoop/
xsync /etc/profile
# run the following on every node
source /etc/profile
```

3.2.5 Start and test

```shell
# Format the NameNode
hdfs namenode -format
# Start Hadoop
start-all.sh
# On hadoop103 (the ResourceManager node), start YARN
start-yarn.sh
```
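
A few quick sanity checks after startup; a sketch whose web UI addresses follow the hdfs-site.xml and yarn-site.xml settings above:

```shell
jps                                  # NameNode on hadoop102, ResourceManager on hadoop103, DataNode/NodeManager on all three
hdfs dfsadmin -report | head -n 20   # all three DataNodes should be listed
# Web UIs: HDFS at http://hadoop102:50070, YARN at http://hadoop103:8088
```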

3.3 Deploy MySQL

```shell
# Unpack
mkdir -p /opt/software/mysql
tar -xvf /opt/software/mysql-8.0.28-1.el8.aarch64.rpm-bundle.tar -C /opt/software/mysql
# Install; the packages must be installed in this order
cd /opt/software/mysql
rpm -ivh mysql-community-client-plugins-8.0.28-1.el8.aarch64.rpm
rpm -ivh mysql-community-common-8.0.28-1.el8.aarch64.rpm
rpm -ivh mysql-community-libs-8.0.28-1.el8.aarch64.rpm
rpm -ivh mysql-community-client-8.0.28-1.el8.aarch64.rpm
rpm -ivh mysql-community-icu-data-files-8.0.28-1.el8.aarch64.rpm
rpm -ivh mysql-community-server-8.0.28-1.el8.aarch64.rpm
# Start MySQL
systemctl start mysqld
systemctl enable mysqld
# Get the temporary password: copy the text after "root@localhost: "
grep 'temporary password' /var/log/mysqld.log
```

Log in with the temporary password copied above

```shell
mysql -uroot -p
```

Configure MySQL

```sql
# The temporary password must be changed first, otherwise no other statement will run.
# Change just one character from the old one (mine was A)fsU_mu#6%h) so it still satisfies the default policy.
ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'A)fsU_mu#6%q';
# Relax the password policy
set global validate_password.policy=0;
set global validate_password.length=6;
ALTER USER 'root'@'localhost' IDENTIFIED BY 'root@guoyun';
FLUSH PRIVILEGES;
# Allow root to log in from any host
use mysql;
update user set host='%' where user='root';
FLUSH PRIVILEGES;
ALTER USER 'root'@'%' IDENTIFIED BY 'root@guoyun' PASSWORD EXPIRE NEVER;
```

Edit the configuration file my.cnf

```shell
vim /etc/my.cnf
```

Add the following line to resolve the GROUP BY error (the sql_mode below omits ONLY_FULL_GROUP_BY, which is what triggers it)

```ini
sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION'
```
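
The new sql_mode only takes effect after the server restarts, so follow it with something like:

```shell
systemctl restart mysqld
mysql -uroot -p -e "SELECT @@sql_mode;"   # ONLY_FULL_GROUP_BY should no longer appear
```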

Verify

```shell
systemctl status mysqld
```

image2-6.png

3.4 Deploy Hive

```shell
# Unpack
tar -zxvf /opt/software/apache-hive-3.1.2-bin.tar.gz -C /opt/module/
mv /opt/module/apache-hive-3.1.2-bin/ /opt/module/hive
# Copy the JDBC driver
cp /opt/software/mysql-connector-java-8.0.28.jar /opt/module/hive/lib/
# Create hive-site.xml from the template
cd /opt/module/hive/conf
cp hive-default.xml.template hive-site.xml
```

Edit hive-site.xml and set the following properties

```xml
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>root@guoyun</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hadoop102:3306/hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>
```

Delete the illegal character "&#8;" on line 3215 of hive-site.xml (it breaks the XML parser), as shown below

image2-7.png

In hive-site.xml, remove every occurrence of "system:" inside the variable references, e.g. ${system:java.io.tmpdir} becomes ${java.io.tmpdir}. There are several occurrences in the file; search for them and replace them all.

image2-8.png

Create a hive database in MySQL

image2-9.png
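
The screenshot corresponds to creating the metastore database from the shell, roughly as follows (the database name must match the ConnectionURL configured above):

```shell
mysql -uroot -p -e "CREATE DATABASE IF NOT EXISTS hive;"
mysql -uroot -p -e "SHOW DATABASES;"   # hive should now be listed
```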

Add the environment variables: edit /etc/profile, append the following, and source it to take effect

```shell
export HIVE_HOME=/opt/module/hive
export PATH=$PATH:$HIVE_HOME/bin
```

Initialize the Hive metastore

```shell
schematool -dbType mysql -initSchema
```

If the output looks like the following, the initialization succeeded

image2-10.png

Start Hive to verify

image2-11.png

Start the metastore and HiveServer2

```shell
hive --service metastore &
hive --service hiveserver2 &
```
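
With both services up, connectivity can also be checked from beeline; a sketch assuming the default HiveServer2 port 10000:

```shell
beeline -u jdbc:hive2://hadoop102:10000 -n root -e "show databases;"
```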

Verify

image2-12.png

3.5 Deploy Spark

```shell
# Unpack
tar -zxvf /opt/software/spark-2.4.7-bin-without-hadoop.tgz -C /opt/module/
mv /opt/module/spark-2.4.7-bin-without-hadoop /opt/module/spark
# Add the environment variables: edit /etc/profile, append the following, and source it
# SPARK_HOME
export SPARK_HOME=/opt/module/spark
export PATH=$PATH:$SPARK_HOME/bin
# Create a Spark configuration file inside Hive
vim /opt/module/hive/conf/spark-defaults.conf
# and add the following content
spark.master                 yarn
spark.eventLog.enabled       true
spark.eventLog.dir           hdfs://hadoop102:9000/spark-history
spark.executor.memory        2g
spark.driver.memory          1g
# Create the corresponding path on HDFS
hadoop fs -mkdir /spark-history
# Remove the orc-core-1.5.5-nohive jar
rm -r /opt/module/spark/jars/orc-core-1.5.5-nohive.jar
# Upload the "pure" (without-hadoop) Spark jars to HDFS
hadoop fs -mkdir /spark-jars
hadoop fs -put /opt/module/spark/jars/* /spark-jars
# Copy the jars Hive needs into the Hive lib directory
cp /opt/module/spark/jars/scala-compiler-2.11.12.jar /opt/module/hive/lib/
cp /opt/module/spark/jars/scala-library-2.11.12.jar /opt/module/hive/lib/
cp /opt/module/spark/jars/scala-reflect-2.11.12.jar /opt/module/hive/lib/
cp /opt/module/spark/jars/spark-core_2.11-2.4.7.jar /opt/module/hive/lib/
cp /opt/module/spark/jars/spark-network-common_2.11-2.4.7.jar /opt/module/hive/lib/
cp /opt/module/spark/jars/spark-unsafe_2.11-2.4.7.jar /opt/module/hive/lib/
cp /opt/module/spark/jars/spark-yarn_2.11-2.4.7.jar /opt/module/hive/lib/
```
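
Before moving on it is worth confirming that the jars actually landed on HDFS, for example:

```shell
hadoop fs -ls /spark-jars | head -n 5   # should list the uploaded Spark jars
hadoop fs -ls /                         # spark-history and spark-jars should both exist
```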

Edit /opt/module/hive/conf/hive-site.xml and append the following at the end

```xml
<property>
  <name>spark.yarn.jars</name>
  <value>hdfs://hadoop102:9000/spark-jars/*</value>
</property>
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>hive.spark.client.connect.timeout</name>
  <value>10000ms</value>
</property>
<property>
  <name>spark.home</name>
  <value>/opt/module/spark</value>
</property>
```

Configure /opt/module/spark/conf/spark-env.sh

```shell
cd /opt/module/spark/conf
cp spark-env.sh.template spark-env.sh
# Append the following at the end
export TERM=xterm-color
export JAVA_HOME=/opt/module/jdk
export HADOOP_HOME=/opt/module/hadoop
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SPARK_HOME=/opt/module/spark
export MASTER_WEBUI_PORT=8079
export SPARK_LOG_DIR=/opt/module/spark/logs
export SPARK_LIBRARY_PATH=${SPARK_HOME}/jars
# the without-hadoop build needs Hadoop's classes on its classpath
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
```

Verify: start Hive and run some SQL

```sql
-- start hive, then run:
create table demo (id int);
insert into table demo values (1),(2);
select * from demo;
```

image2-13.png
