Install prepacked CDH4 » History » Version 7

Henning Blohm, 20.09.2012 15:03

h2. Install CDH4 from a preconfigured repository

This site provides a pre-configured, single-checkout, user-space installation of Cloudera's CDH4 Hadoop and HBase distributions. This page explains how to install it on your machine, which is really simple compared to the Hadoop installation procedures that are normally suggested.

*Note #1:* This will only work on Linux or Mac OS.

*Note #2:* The repository also contains an Eclipse project file and has Eclipse launchers for most of the functions required.

In short, there are the following steps:

# Clone the repository
# Adapt your local environment
# Format HDFS
# Start and stop

h3. Clone the repository

The pre-configured distribution is stored in the repository "z2-samples-cdh4-base":http://redmine.z2-environment.net/projects/z2-samples/repository/z2-samples-cdh4-base. We assume that you install everything (including an Eclipse workspace, if you run the samples) in *install*.

<pre><code class="ruby">
cd install
git clone http://git.z2-environment.net/z2-samples.cdh4-base
</code></pre>

h3. Adapt your environment

Before you can run anything, two customizations are needed:

h4. Set important environment variables

There is a shell script "env.sh":http://redmine.z2-environment.net/projects/z2-samples/repository/z2-samples-cdh4-base/revisions/master/entry/env.sh that you should open and change as described. At the time of writing, you must define JAVA_HOME (please do so, even if it is already set elsewhere) and NOSQL_HOME, which is the absolute path to the folder that contains the *env.sh* file. This script is called from many places.

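As a quick sanity check after editing *env.sh*, something along the following lines can confirm that both variables are set and that NOSQL_HOME really points at the folder containing the script. This is only a sketch: the function name *check_env* is made up for illustration and is not part of the repository.

```shell
# Hypothetical helper, not part of the repository: verifies the two
# variables that env.sh is expected to define.
check_env() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ -z "${NOSQL_HOME:-}" ]; then
    echo "NOSQL_HOME is not set" >&2
    return 1
  fi
  # NOSQL_HOME must be the folder that contains env.sh
  if [ ! -f "$NOSQL_HOME/env.sh" ]; then
    echo "no env.sh found in $NOSQL_HOME" >&2
    return 1
  fi
  echo "environment looks fine"
}
```

After sourcing *env.sh* in the current shell, calling *check_env* should print a single confirmation line. Any other message names the setting that still needs attention.
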
h4. Enable password-less SSH

Currently, password-less SSH is still required for the start and stop scripts to run. This requirement may be dropped in the future.

If you have not created a unique key for SSH, or have no idea what that is, run:

<pre><code class="ruby">
ssh-keygen
</code></pre>

(just keep hitting Enter). Next, copy that key over to the machine you want to log on to without a password, which is localhost in this case:

<pre><code class="ruby">
ssh-copy-id <your user name>@localhost
</code></pre>

If this fails because your SSH setup works differently, or if ssh still refuses to log on without a password, please "ask the internet". Sorry.

All that matters is that in the end

<pre><code class="ruby">
ssh <your user name>@localhost
</code></pre>

(substituting <your user name> with your actual user name, of course) works without asking for a password.

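To test this without being dropped into an interactive session, a check along these lines may help. It is a sketch under assumptions: the function name *can_ssh_without_password* is hypothetical, and it relies on the standard OpenSSH client options BatchMode and ConnectTimeout.

```shell
# Hypothetical helper: succeeds only if key-based login to localhost works.
# BatchMode=yes makes ssh fail instead of prompting for a password.
can_ssh_without_password() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "${USER:-$(id -un)}@localhost" true
}
```

If *can_ssh_without_password* returns a non-zero exit status, the start and stop scripts would most likely stall at a password prompt as well.
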
h3. Format HDFS

The last step before you can start up is to prepare the local node to store data. This is done by running the *format_dfs.sh* script. Alternatively, you can use the Eclipse launcher of the same name.

This should complete without any questions or errors. Otherwise, please verify your settings above.

h3. Start and stop

Depending on your sample requirements, you can start Hadoop (HDFS, YARN, the History Server) or HBase (including all the Hadoop services) using the *start_hadoop.sh* script (or launcher) or the *start_hbase.sh* script (or launcher), respectively. Similarly, you can stop everything with the stop scripts.

A short while after starting, running *jps* on the command line should show the following Java processes (and possibly others, of course):

<pre><code class="ruby">
HRegionServer
HQuorumPeer
DataNode
NodeManager
HMaster
NameNode
SecondaryNameNode
JobHistoryServer
ResourceManager
</code></pre>

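Comparing the *jps* output against this list by eye gets tedious, so the comparison can be scripted roughly as follows. This is a minimal sketch: the function name *missing_processes* is hypothetical, and it assumes that *jps* prints one "pid name" pair per line.

```shell
# Expected process names, taken from the list above
expected="HRegionServer HQuorumPeer DataNode NodeManager HMaster NameNode SecondaryNameNode JobHistoryServer ResourceManager"

# Hypothetical helper: reads jps-style output ("<pid> <name>" per line) on
# stdin and prints every expected process name that does not appear in it.
missing_processes() {
  running=$(awk '{ print $2 }')
  for name in $expected; do
    # -x matches whole lines only, so NameNode does not match SecondaryNameNode
    printf '%s\n' "$running" | grep -qx "$name" || echo "$name"
  done
}
```

Piping *jps* into *missing_processes* prints nothing when all listed processes are up. If you started only Hadoop, the three HBase processes (HMaster, HRegionServer, HQuorumPeer) are expected to be reported as missing.
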
There are lots of other scripts in the distribution that you can use to start or stop single components. If you use them, however, please first run (in the shell):

<pre><code class="ruby">
. ./env.sh
</code></pre>

(note the leading period)

Once the start script has returned, here are some URLs you should check to verify that everything is looking good:

* Try to reach the NameNode at http://localhost:50070
* Try to reach the YARN ResourceManager at http://localhost:8088

and, if you are running HBase:

* Try to reach the HBase Master at http://localhost:60010

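If you prefer the command line over a browser, the same check can be approximated with curl. The helper names *http_status* and *report_urls* below are hypothetical, and the sketch assumes that curl is installed. It only shows whether something answers on each port, not whether the service is healthy.

```shell
# Hypothetical helper: prints the HTTP status code returned for a URL.
# curl reports 000 when no HTTP response was received at all.
http_status() {
  curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1"
}

# Hypothetical helper: prints one "<url> -> <status>" line per argument.
report_urls() {
  for url in "$@"; do
    echo "$url -> $(http_status "$url")"
  done
}
```

Calling *report_urls* with the URLs listed above prints one line per URL. A status of 000 suggests that the corresponding service is not up (yet), while redirect codes are nothing to worry about.
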
*Note:* If you cannot restart, or if HBase does not seem to stop correctly, then most likely some HBase processes are indeed still running. Sometimes HBase processes do not stop. To make sure that no process is left, use *jps* from the command line and kill any remaining processes.
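
Finding the leftover processes can be sketched as below. The function name *hbase_pids* is hypothetical. It only looks for the three HBase process names mentioned on this page and assumes the usual "pid name" output format of *jps*.

```shell
# Hypothetical helper: reads jps-style output ("<pid> <name>" per line) on
# stdin and prints the pids of the HBase processes named on this page.
hbase_pids() {
  awk '$2 == "HMaster" || $2 == "HRegionServer" || $2 == "HQuorumPeer" { print $1 }'
}
```

Piping *jps* into *hbase_pids* lists the process ids that are still around. Each of them can then be passed to *kill*. Printing the ids first, rather than killing in the same pipeline, leaves room to double-check what is about to be stopped.
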