Pseudo-Distributed Mode Installation

Catalogue
  1. CDH
  2. Installing hadoop-2.6.0-cdh5.4.0
    1. Configuring the pseudo-cluster
    2. Formatting HDFS
    3. Verification

Setting up hadoop-2.6.0-cdh5.4.0 in pseudo-distributed mode

CDH download page

First, let's look at what CDH is and why we chose the CDH distribution of Hadoop.

CDH

CDH is a distribution of Hadoop.

Hadoop has the following distributions:

  • Apache Hadoop
  • Cloudera’s Distribution Including Apache Hadoop(CDH)
  • Hortonworks Data Platform (HDP)
  • MapR
  • EMR

The CDH distribution has the following advantages:

  • Clear version scheme
  • Fast release updates
  • Support for Kerberos authentication
  • Clear documentation
  • Multiple installation methods (including Cloudera Manager)

Installing hadoop-2.6.0-cdh5.4.0

First, download the installation package from the CDH download page, then extract the downloaded archive.
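
A minimal sketch of the extract step, assuming the tarball was downloaded into /opt/tools (the directory is illustrative; use whatever location you keep your tools in, and the actual link from the CDH download page):

# adjust /opt/tools to your own layout
cd /opt/tools
tar -zxvf hadoop-2.6.0-cdh5.4.0.tar.gz
cd hadoop-2.6.0-cdh5.4.0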

Configuring the pseudo-cluster

  • 1. Enter hadoop-2.6.0-cdh5.4.0/etc/hadoop
  • 2. Edit hadoop-env.sh

    vi hadoop-env.sh

  • 3. Change the JAVA_HOME setting to

    export JAVA_HOME=/opt/tools/jdk1.8.0_131
  • 4. Edit core-site.xml and add the following configuration:

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node2:9000</value>
      </property>
    </configuration>

    A note on node2: if the hostname is not configured in hosts, replace node2 with the machine's IP address. Save and exit with :wq.
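
    If you do want to use the hostname, a minimal /etc/hosts entry can be added as below (run as root). The address 192.168.1.3 matches the one in the NameNode log later in this post, but it is only an example; substitute your own:

    # map the hostname node2 to this machine's IP (example address)
    echo "192.168.1.3 node2" >> /etc/hosts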

  • 5. Edit hdfs-site.xml and add the following configuration:

    <configuration>
      <property>
        <!-- enable WebHDFS -->
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/cdh/hadoop/name</value>
        <description>Local directory where the NameNode stores the name table (fsimage); change as needed</description>
      </property>
      <property>
        <name>dfs.namenode.edits.dir</name>
        <value>${dfs.namenode.name.dir}</value>
        <description>Local directory where the NameNode stores the transaction file (edits); change as needed</description>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/cdh/hadoop/data</value>
        <description>Local directory where the DataNode stores blocks; change as needed</description>
      </property>
    </configuration>

    With the configuration above in place, you still need to create the directories (use absolute paths so they match the values configured above):

    mkdir -p /opt/cdh/hadoop/name
    mkdir -p /opt/cdh/hadoop/data
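
    A quick check that both directories exist and are writable by the user that will run the daemons:

    ls -ld /opt/cdh/hadoop/name /opt/cdh/hadoop/data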
  • 6. Configure mapred-site.xml

    cp mapred-site.xml.template mapred-site.xml

    Then add the following configuration:

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>
  • 7. Edit yarn-site.xml

    <configuration>
      <!-- Site specific YARN configuration properties -->
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
    </configuration>

At this point, all configuration is complete.
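
One practical prerequisite before starting the daemons: sbin/start-dfs.sh launches them over SSH, so passwordless SSH to the local machine is normally required. A minimal sketch, assuming an RSA key and the default ~/.ssh layout:

# generate a key without a passphrase and authorize it for local logins
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# verify: this should log in without prompting for a password
ssh node2 true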

Formatting HDFS

bin/hdfs namenode -format

You should see output like the following:

************************************************************/
15/09/22 14:59:46 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/09/22 14:59:46 INFO namenode.NameNode: createNameNode [-format]
15/09/22 14:59:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/09/22 14:59:49 WARN common.Util: Path /opt/cdh/hadoop/name should be specified as a URI in configuration files. Please update hdfs configuration.
15/09/22 14:59:49 WARN common.Util: Path /opt/cdh/hadoop/name should be specified as a URI in configuration files. Please update hdfs configuration.
Formatting using clusterid: CID-41ea6672-a32e-4b16-b704-962381ed409a
15/09/22 14:59:49 INFO namenode.FSNamesystem: No KeyProvider found.
15/09/22 14:59:49 INFO namenode.FSNamesystem: fsLock is fair:true
15/09/22 14:59:49 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
15/09/22 14:59:49 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
15/09/22 14:59:49 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
15/09/22 14:59:49 INFO blockmanagement.BlockManager: The block deletion will start around 2015 九月 22 14:59:49
15/09/22 14:59:49 INFO util.GSet: Computing capacity for map BlocksMap
15/09/22 14:59:49 INFO util.GSet: VM type = 64-bit
15/09/22 14:59:49 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
15/09/22 14:59:49 INFO util.GSet: capacity = 2^21 = 2097152 entries
15/09/22 14:59:50 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
15/09/22 14:59:50 INFO blockmanagement.BlockManager: defaultReplication = 1
15/09/22 14:59:50 INFO blockmanagement.BlockManager: maxReplication = 512
15/09/22 14:59:50 INFO blockmanagement.BlockManager: minReplication = 1
15/09/22 14:59:50 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
15/09/22 14:59:50 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks = false
15/09/22 14:59:50 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
15/09/22 14:59:50 INFO blockmanagement.BlockManager: encryptDataTransfer = false
15/09/22 14:59:50 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
15/09/22 14:59:50 INFO namenode.FSNamesystem: fsOwner = root (auth:SIMPLE)
15/09/22 14:59:50 INFO namenode.FSNamesystem: supergroup = supergroup
15/09/22 14:59:50 INFO namenode.FSNamesystem: isPermissionEnabled = true
15/09/22 14:59:50 INFO namenode.FSNamesystem: HA Enabled: false
15/09/22 14:59:50 INFO namenode.FSNamesystem: Append Enabled: true
15/09/22 14:59:50 INFO util.GSet: Computing capacity for map INodeMap
15/09/22 14:59:50 INFO util.GSet: VM type = 64-bit
15/09/22 14:59:50 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
15/09/22 14:59:50 INFO util.GSet: capacity = 2^20 = 1048576 entries
15/09/22 14:59:50 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/09/22 14:59:50 INFO util.GSet: Computing capacity for map cachedBlocks
15/09/22 14:59:50 INFO util.GSet: VM type = 64-bit
15/09/22 14:59:50 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
15/09/22 14:59:50 INFO util.GSet: capacity = 2^18 = 262144 entries
15/09/22 14:59:50 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
15/09/22 14:59:50 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
15/09/22 14:59:50 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
15/09/22 14:59:50 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
15/09/22 14:59:50 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
15/09/22 14:59:50 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
15/09/22 14:59:50 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
15/09/22 14:59:50 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
15/09/22 14:59:50 INFO util.GSet: Computing capacity for map NameNodeRetryCache
15/09/22 14:59:50 INFO util.GSet: VM type = 64-bit
15/09/22 14:59:50 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
15/09/22 14:59:50 INFO util.GSet: capacity = 2^15 = 32768 entries
15/09/22 14:59:50 INFO namenode.NNConf: ACLs enabled? false
15/09/22 14:59:50 INFO namenode.NNConf: XAttrs enabled? true
15/09/22 14:59:50 INFO namenode.NNConf: Maximum size of an xattr: 16384
15/09/22 14:59:51 INFO namenode.FSImage: Allocated new BlockPoolId: BP-314159059-192.168.1.3-1442905191056
15/09/22 14:59:51 INFO common.Storage: Storage directory /opt/cdh/hadoop/name has been successfully formatted.
15/09/22 14:59:51 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/09/22 14:59:51 INFO util.ExitUtil: Exiting with status 0
15/09/22 14:59:51 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node2/192.168.1.3
************************************************************/

If the output contains "has been successfully formatted" and ends with "Exiting with status 0", the format succeeded. The WARN lines about paths are harmless; they can be silenced by writing the directories as URIs (e.g. file:///opt/cdh/hadoop/name) in hdfs-site.xml.

Then start HDFS and YARN:

sbin/start-dfs.sh
sbin/start-yarn.sh

If no errors appear during startup, both services are up.

Verification

  • Use jps to view the relevant processes

The output should look like this:

nova@ubuntu208:~$ jps
7667 Jps
28532 DataNode
28742 SecondaryNameNode
29319 NodeManager
28376 NameNode
29018 ResourceManager
  • Web UIs

YARN: http://192.168.1.34:8088/cluster

HDFS status: http://192.168.1.34:50070/dfshealth.html#tab-overview
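
As a final smoke test, you can exercise HDFS and submit a sample job. A hedged sketch, assuming the examples jar ships at the usual path inside the CDH tarball:

# create a directory and list the root to confirm HDFS accepts writes
bin/hdfs dfs -mkdir -p /tmp/smoke
bin/hdfs dfs -ls /

# estimate pi with 2 maps and 10 samples to confirm YARN can run MapReduce
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.4.0.jar pi 2 10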