h2. Install CDH4 from a preconfigured repository
This site provides a pre-configured, one-checkout, user-space installation of Cloudera's CDH4 Hadoop and HBase distributions. This page explains how to install it on your machine - which is really simple compared to the usually suggested Hadoop installation procedures.
*Note #1:* This will only work on Linux or Mac OS. A machine with 8GB of RAM should be sufficient.
*Note #2:* The repository also contains an Eclipse project file and Eclipse launchers for most of the required functions.
*Note #3:* This setup is for educational purposes only. It fulfills no security requirements and no one takes any liability for anything regarding its use.
In short, these are the steps:
# Clone the repository
# Adapt your local environment
# Format HDFS
# Start and stop
h3. Clone the repository
The pre-configured distribution is stored in the repository "z2-samples-cdh4-base":http://redmine.z2-environment.net/projects/z2-samples/repository/z2-samples-cdh4-base. We assume you install everything (including an Eclipse workspace, if you run the samples) into a folder called *install*.
<pre><code class="ruby">
cd install
git clone -b master http://git.z2-environment.net/z2-samples.cdh4-base
</code></pre>
h3. Adapt your environment
Before you can actually run anything, two customizations are needed:
h4. Set important environment variables
There is a shell script "env.sh":http://redmine.z2-environment.net/projects/z2-samples/repository/z2-samples-cdh4-base/revisions/master/entry/env.sh that you should open and adapt as described in it. At the time of this writing you need to define JAVA_HOME (please do so, even if it is set elsewhere already) and NOSQL_HOME, which is the absolute path to the folder containing the *env.sh* file. This script is called from other scripts, and using absolute paths here is a safe way to make sure everything will be found.
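A minimal sketch of what the two settings might end up looking like - the paths below are examples only and need to be adjusted to your system:

<pre><code class="ruby">
# in env.sh - example values, adjust to your machine
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export NOSQL_HOME=/home/you/install/z2-samples.cdh4-base
</code></pre>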
h4. Enable password-less SSH
Currently this is still required for the start / stop scripts to work. This requirement may be dropped in the future.
If you have not yet created an SSH key, or have no idea what that is, run
<pre><code class="ruby">
ssh-keygen
</code></pre>
(just keep hitting enter). Next, copy that key over to the machine you want to log on to without a password, i.e. localhost in this case:
<pre><code class="ruby">
ssh-copy-id <your user name>@localhost
</code></pre>
If this fails because your SSH setup works differently, or ssh still refuses to log on without a password, please "ask the internet". Sorry.
All that matters is that in the end
<pre><code class="ruby">
ssh <your user name>@localhost
</code></pre>
(substituting <your user name> with your actual user name of course) works without asking for a password.
In addition, it may help to run <code>ssh <your user name>@0.0.0.0</code> as well, to make sure the host key for that (localhost) address has been verified.
h3. Formatting HDFS
Finally, the last step before you can start up is to prepare the local node to store data. This is done by running the *format_dfs.sh* script. Alternatively, you can use the Eclipse launcher of the same name.
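A minimal sketch, assuming the script sits at the root of the cloned repository (adjust the path if your checkout lives elsewhere):

<pre><code class="ruby">
cd install/z2-samples.cdh4-base
./format_dfs.sh
</code></pre>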
This should complete without any questions or errors. Otherwise please verify your settings above.
h3. Start and Stop
Depending on the requirements of the sample you want to run, you can start Hadoop (HDFS, Yarn, the History Server) or HBase (including all the Hadoop services) using the *start_hadoop.sh* script (or launcher) or the *start_hbase.sh* script (or launcher) respectively. Similarly, you can stop everything with the corresponding stop scripts.
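For example, starting and later stopping HBase (and with it all Hadoop services) might look like this - assuming, as above, that the scripts sit at the root of the cloned repository:

<pre><code class="ruby">
cd install/z2-samples.cdh4-base
./start_hbase.sh
# ... work with the cluster ...
./stop_hbase.sh
</code></pre>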
A short while after starting, running *jps* on the command line should show the following Java processes (and possibly others, of course):
<pre><code class="ruby">
HRegionServer
HQuorumPeer
DataNode
NodeManager
HMaster
NameNode
SecondaryNameNode
JobHistoryServer
ResourceManager
</code></pre>
There are lots of other scripts in the distribution that you can use to start or stop individual components. If you do, however, please first run (in the shell):
<pre><code class="ruby">
. ./env.sh
</code></pre>
(note the leading period)
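To check that the environment has been picked up by your shell, you can, for instance, echo the two variables set in *env.sh*:

<pre><code class="ruby">
echo $JAVA_HOME
echo $NOSQL_HOME
</code></pre>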
If you ran the start script and it returned, here are some URLs you should check to verify everything is looking good:
* Try to reach the Namenode at http://localhost:50070
* Try to reach the Yarn ResourceManager at http://localhost:8088
and, if you are running HBase:
* Try to reach the HBase Master at http://localhost:60010
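If you prefer checking from the shell, a quick sketch (any HTTP client will do) is to request the pages with curl and look for a 200 status code:

<pre><code class="ruby">
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088
# only if HBase is running:
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:60010
</code></pre>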
*Note:* If you notice that you cannot restart, it is most likely because HBase did not stop correctly - sometimes HBase processes simply do not terminate. To make sure no process is left over, use *jps* from the command line and kill any remaining processes.
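For example (the process id is just a placeholder, use whatever *jps* reports on your machine):

<pre><code class="ruby">
jps
# suppose an HMaster is still listed with pid 12345
kill 12345
# if it refuses to terminate, force it
kill -9 12345
</code></pre>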