Revision 3 (Henning Blohm, 20.09.2012 15:12) → Revision 4/20 (Henning Blohm, 20.09.2012 15:26)

h1. A simple Hadoop with Z2 sample 

 This sample adapts the classic Wordcount example to the Z2 context. It shows how Hadoop can be used from within Z2 and, in particular, how to write Map/Reduce jobs in that context.

 *Note #1:* This sample is made to run on Linux or Mac OS. Hadoop can supposedly be run on Windows as well, but we have not been able to adapt the sample yet.
 *Note #2:* For your convenience, everything in this sample assumes that you use Eclipse. Eclipse is of course not a prerequisite for running the software, but it makes everything much more integrated for now. Please have Eclipse ready and the Eclipsoid installed. See [[How to install Eclipsoid]].

 This sample is provided by the repository "z2-samples-hadoop-basic":http://redmine.z2-environment.net/projects/z2-samples/repository/z2-samples-hadoop-basic. 

  

 h2. Prerequisites 

 This sample uses the [[Hadoop add-on]], which is based on Cloudera's CDH4 distribution of Hadoop. Because client access is version dependent, so is the sample. To simplify this for you, a pre-configured CDH4 distribution is available from this site. Apart from its development-style configuration (i.e. no security), this is the way we prefer to install Hadoop and friends anyway: just one root installation folder, one OS user, one log folder, etc.

 Please follow the procedure described here: [[Install prepack CDH4]]. 

 For use with this sample, it is most convenient to clone and configure the CDH4 install next to your Eclipse workspace and the sample repository clone.

 h2. Setting up the sample 

 From here on, the sample is run like all samples, that is, following [[How to run a sample]]. 

 Assuming everything (including the CDH4 setup) is under *install* and your workspace is in *install/workspace*, please clone "z2-samples-hadoop-basic":http://redmine.z2-environment.net/projects/z2-samples/repository/z2-samples-hadoop-basic under *install* as well, either from the command line with

 <pre><code class="ruby"> 
 cd install 
 git clone -b master http://git.z2-environment.net/z2-samples.hadoop-basic 
 </code></pre> 

 or from within Eclipse using the Git repositories view (but make sure the folder is right next to your z2-base.core clone). 

 You should have an Eclipse workspace and next to it *z2-samples.hadoop-basic*, *z2-samples.cdh4-base*, and *z2-base.core*. Import all projects into your workspace. 
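
Before importing, it can help to confirm that all folders really sit side by side. The following is a minimal sketch, not part of the sample: the function name @check_layout@ is hypothetical, and it assumes the workspace folder is named *workspace* as in the layout described above.

```shell
#!/bin/sh
# Hypothetical helper: report whether the folders this page expects
# exist side by side under one root (default: ./install).
check_layout() {
    root="${1:-install}"
    missing=0
    for dir in workspace z2-base.core z2-samples.cdh4-base z2-samples.hadoop-basic; do
        if [ -d "$root/$dir" ]; then
            echo "ok:      $root/$dir"
        else
            echo "missing: $root/$dir"
            missing=1
        fi
    done
    # Exit status 0 only if every expected folder was found.
    return "$missing"
}
```

Call it as @check_layout install@ from the folder that contains *install*. A non-zero exit status means that at least one folder is not where the sample expects it.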

 We assume that you followed the steps in [[Install prepack CDH4]] and Hadoop is running (we do not need HBase in this case).
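
If you are unsure whether Hadoop is actually up, a harmless listing of the HDFS root directory is a quick probe. This is only a sketch under assumptions: the function name @hdfs_reachable@ is hypothetical, and it assumes that the @hadoop@ launcher of your CDH4 install is on your @PATH@, which the prepack setup may or may not arrange for you.

```shell
#!/bin/sh
# Hypothetical helper: probe HDFS by listing its root directory.
# Assumes the `hadoop` launcher of the CDH4 install is on PATH.
hdfs_reachable() {
    if ! command -v hadoop >/dev/null 2>&1; then
        echo "hadoop command not found on PATH" >&2
        return 2
    fi
    if hadoop fs -ls / >/dev/null 2>&1; then
        echo "HDFS is reachable"
    else
        echo "HDFS did not answer; is Hadoop started?" >&2
        return 1
    fi
}
```

Run @hdfs_reachable@ in the shell from which you started Hadoop. An exit status of 2 only says that the launcher was not found, not that Hadoop is down.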