Install prepacked CDH4 » History » Version 20

Henning Blohm, 11.12.2012 23:01

h2. Install CDH4 from a preconfigured repository

This site provides a pre-configured, single-checkout, user-space installation of Cloudera's CDH4 Hadoop and HBase distributions. This page explains how to install it on your machine, which is much simpler than the commonly suggested Hadoop installation procedures.

*Note #1:* This will only work on Linux or Mac OS. A machine with 8GB of RAM should be sufficient.

*Note #2:* The repository also contains an Eclipse project file and has Eclipse launchers for most of the required functions.

*Note #3:* This setup is for educational purposes only. It is not designed to meet any security requirements, and nobody accepts any liability regarding its use.

In short, there are the following steps:

# Clone the repository
# Adapt your local environment
# Format HDFS
# Start and stop

h3. Clone the repository

The pre-configured distribution is stored in the repository "z2-samples-cdh4-base":http://redmine.z2-environment.net/projects/z2-samples/repository/z2-samples-cdh4-base. We assume you install everything (including an Eclipse workspace, if you run the samples) in a folder called *install*.

<pre><code class="ruby">
cd install
git clone -b master http://git.z2-environment.net/z2-samples.cdh4-base
</code></pre>

h3. Adapt your environment

Before you can run anything, two customizations are needed:

h4. Set important environment variables

There is a shell script "env.sh":http://redmine.z2-environment.net/projects/z2-samples/repository/z2-samples-cdh4-base/revisions/master/entry/env.sh that you should open and change as described. At the time of this writing, you are required to define JAVA_HOME (please do, even if it is already set elsewhere) and NOSQL_HOME, which is the absolute path to the folder that contains the *env.sh* file. This script is called from elsewhere, and using absolute paths in it is a safe way to make sure everything will be found.

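As a rough illustration of the kind of settings meant here, the relevant part of *env.sh* might look like the following sketch. Both paths are placeholders made up for this example and are not taken from the repository. Only the variable names JAVA_HOME and NOSQL_HOME come from the description above. The loop at the end is an optional sanity check that both values are absolute paths.

```shell
# Hypothetical values: adjust both paths to your machine.
export JAVA_HOME="/usr/lib/jvm/java-6-openjdk"           # assumed JDK location
export NOSQL_HOME="$HOME/install/z2-samples.cdh4-base"   # assumed folder that contains env.sh

# Optional sanity check: warn if a variable is not an absolute path.
for var in JAVA_HOME NOSQL_HOME; do
  eval "value=\$$var"
  case "$value" in
    /*) ;;  # starts with a slash, so it is absolute
    *)  echo "$var must be an absolute path, got: $value" >&2 ;;
  esac
done
```
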
If you are a Subversion user, note the following: in order to run embedded z2 M/R jobs, *env.sh* looks for a z2 Home location next to the CDH4 checkout (see above), either in the folder *core* or in the folder *z2-base.core*. The reason is that the Subclipse plugin for Eclipse uses the project name ("core") as the checkout folder, while the command line client uses the folder name ("z2-base.core"). So please make sure that you have a z2 Home in exactly one of these locations (depending on the Subversion client you use), or customize *env.sh* to set a suitable Z2_HOME variable (see also #959).

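The "exactly one of these locations" rule can be sketched as a small shell function. This is not the logic actually used in *env.sh*, which is not reproduced on this page. The helper name <code>find_z2_home</code> is hypothetical, and the sketch only assumes the two folder names mentioned above.

```shell
# Sketch only: pick the z2 Home folder that exists next to the CDH4 checkout.
# find_z2_home is a hypothetical helper, not part of the distribution.
find_z2_home() {
  parent="$1"   # folder that contains the CDH4 checkout
  if [ -d "$parent/core" ] && [ -d "$parent/z2-base.core" ]; then
    echo "both core and z2-base.core exist in $parent; keep only one" >&2
    return 1
  elif [ -d "$parent/core" ]; then
    echo "$parent/core"              # Subclipse-style checkout
  elif [ -d "$parent/z2-base.core" ]; then
    echo "$parent/z2-base.core"      # command line client checkout
  else
    echo "no z2 Home found in $parent" >&2
    return 1
  fi
}

# Example use (not run automatically), assuming NOSQL_HOME is already set:
# Z2_HOME="$(find_z2_home "$(dirname "$NOSQL_HOME")")" && export Z2_HOME
```
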
h4. Enable password-less SSH

Currently this is still required for the start / stop scripts to run. This requirement may be dropped in the future.

If you have not created your own SSH key or have no idea what that is, run

<pre><code class="ruby">
ssh-keygen
</code></pre>

(just keep hitting enter). Next, copy that key over to the machine you want to log on to without a password, i.e. localhost in this case (you can get ssh-copy-id from "here":https://gist.github.com/2575680 if you don't have it):

<pre><code class="ruby">
ssh-copy-id <your user name>@localhost
</code></pre>

If this fails because your SSH setup works differently, or if ssh refuses to log on without a password, please "ask the internet". Sorry.

All that matters is that in the end

<pre><code class="ruby">
ssh <your user name>@localhost
</code></pre>

(substituting <your user name> with your actual user name, of course) works without asking for a password.
In addition, it may help to run <code>ssh <your user name>@0.0.0.0</code> as well, to make sure the host key for that (localhost) address has been verified.

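If you prefer a non-interactive check, the following sketch may help. It relies on the OpenSSH option <code>BatchMode=yes</code>, which makes ssh fail rather than prompt, so it should also fail if the host key has not been accepted yet. The helper name <code>check_passwordless_ssh</code> is made up for this example.

```shell
# Sketch: succeed only if ssh to the given host works without any prompt.
# check_passwordless_ssh is a hypothetical helper name.
check_passwordless_ssh() {
  host="$1"
  # BatchMode=yes makes ssh fail instead of asking for a password.
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
    echo "password-less SSH to $host works"
  else
    echo "password-less SSH to $host does NOT work" >&2
    return 1
  fi
}

# Example calls (not run automatically):
# check_passwordless_ssh "$(whoami)@localhost"
# check_passwordless_ssh "$(whoami)@0.0.0.0"
```
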
h3. Formatting HDFS

The last step before you can start up is to prepare the local node to store data. This is done by running the *format_dfs.sh* script. Alternatively, you can use the Eclipse launcher of the same name.

This should complete without any questions or errors. Otherwise, please verify your settings above.

h3. Start and Stop

Depending on what your sample requires, you can start Hadoop (HDFS, Yarn, the History Server) using the *start_hadoop.sh* script (or launcher), or HBase (including all the Hadoop services) using the *start_hbase.sh* script (or launcher). Similarly, you can stop everything with the corresponding stop scripts.

A short while after starting, running *jps* on the command line should show the following Java processes (and possibly others, of course):

<pre><code class="ruby">
DataNode
NodeManager
NameNode
SecondaryNameNode
JobHistoryServer
ResourceManager
</code></pre>

and additionally these, if you run HBase:

<pre><code class="ruby">
HRegionServer
HQuorumPeer
HMaster
</code></pre>

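Comparing the *jps* output against these lists by eye is easy to get wrong, so a small helper along the following lines may be convenient. It is only a sketch: <code>missing_daemons</code> is a hypothetical name, and it assumes that *jps* prints one "pid name" pair per line. The name is compared exactly, so that SecondaryNameNode is not mistaken for NameNode.

```shell
# Sketch: print the expected daemons that do NOT appear in jps output.
# missing_daemons is a hypothetical helper; it reads "pid name" lines on stdin.
missing_daemons() {
  jps_output="$(cat)"
  for name in "$@"; do
    # Compare the second field exactly against the expected name.
    if ! printf '%s\n' "$jps_output" |
         awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      echo "$name"
    fi
  done
}

HADOOP_DAEMONS="DataNode NodeManager NameNode SecondaryNameNode JobHistoryServer ResourceManager"
HBASE_DAEMONS="HRegionServer HQuorumPeer HMaster"

# Example call (not run automatically, requires jps on the PATH):
# jps | missing_daemons $HADOOP_DAEMONS $HBASE_DAEMONS
```

An empty result means that all listed processes were found.
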
There are many other scripts in the distribution that you can use to start or stop single components. If you use them, however, please first run (in the shell):

<pre><code class="ruby">
. ./env.sh
</code></pre>

(note the leading period, which sources the script so that its variables are set in your current shell)

Once the start script has returned, here are some URLs you should check to verify that everything looks good:

* Try to reach the NameNode at http://localhost:50070
* Try to reach the Yarn ResourceManager at http://localhost:8088

and, if you are running HBase:

* Try to reach the HBase Master at http://localhost:60010

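The same checks can be run from the command line. The sketch below assumes that *curl* is installed, and <code>check_url</code> is a hypothetical helper name. It only tests whether each address answers without an HTTP error, not whether the service behind it is healthy.

```shell
# Sketch: probe a web UI address; check_url is a hypothetical helper.
check_url() {
  # -s: silent, -f: fail on HTTP errors, -o /dev/null: discard the page
  if curl -sf -o /dev/null --max-time 5 "$1"; then
    echo "OK   $1"
  else
    echo "FAIL $1"
    return 1
  fi
}

# Example calls (not run automatically):
# check_url http://localhost:50070   # NameNode
# check_url http://localhost:8088    # Yarn web UI
# check_url http://localhost:60010   # HBase Master, only if HBase is running
```
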
*Note:* If you notice that you cannot restart, or that HBase does not seem to stop correctly, that is most likely exactly what happened: sometimes HBase processes do not stop. To make sure no process is left, use *jps* from the command line and kill any remaining processes.
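
One possible way to find such leftovers is sketched below. The helper name <code>leftover_hbase_pids</code> is hypothetical, and the sketch assumes the three HBase process names listed earlier and the usual "pid name" layout of *jps* output. It only prints process IDs, so that you can look at them before killing anything.

```shell
# Sketch: print the PIDs of leftover HBase processes found in jps output.
# leftover_hbase_pids is a hypothetical helper; it reads "pid name" lines on stdin.
leftover_hbase_pids() {
  awk '$2 == "HMaster" || $2 == "HRegionServer" || $2 == "HQuorumPeer" { print $1 }'
}

# Example use (not run automatically): list first, then kill deliberately.
# jps | leftover_hbase_pids
# jps | leftover_hbase_pids | xargs kill
```
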