Hadoop Add-on » History » Version 13
Henning Blohm, 21.09.2012 11:49
h1. The Hadoop add-on

The Hadoop add-on contains the client parts of the Cloudera Hadoop and HBase distributions plus some integration features that are described in [[How to Hadoop]] and related samples.

It is provided via the repository "z2-addons.hadoop":http://redmine.z2-environment.net/projects/z2-addons/repository/z2-addons-hadoop.

As Hadoop and HBase do not guarantee client-server compatibility across versions, you may only use the Hadoop add-on with a matching server version.

h2. Version map

|_. add-on version |_. Hadoop/HBase version |
| 2.1 | CDH 4.0.1 |

For experimental use only, we provide an easy-to-install, pre-configured single-node CDH 4.0.1 via the Git repository "z2-samples.cdh4-base":http://redmine.z2-environment.net/projects/z2-samples/repository/z2-samples-cdh4-base. The samples [[Sample-hadoop-basic]] and [[Sample-hbase-full-stack-TBD]] make use of it. See [[Install prepacked CDH4]] for how to set it up.

Extensions that help working with Hadoop are implemented by the modules *com.zfabrik.hadoop* and *com.zfabrik.hbase*.

h2. Details on *com.zfabrik.hadoop*

Javadocs can be found here: "Javadocs":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/index.html

h3. Component type com.zfabrik.hadoop.configuration

Components of this type provide Hadoop or HBase connectivity configuration via a component resource file "core-site.xml". No further configuration applies.

h3. Component type com.zfabrik.hadoop.job

A Hadoop Map/Reduce job implementation. Using this component type, jobs may be programmatically scheduled and run within the Z2 runtime.

Properties:

|_. Name |_. Value or Description |
| com.zfabrik.component.type | com.zfabrik.hadoop.job |
| component.className | The name of a class provided by the module that implements "IMapReduceJob":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/com/zfabrik/hadoop/job/IMapReduceJob.html |

h2. Details on *com.zfabrik.hbase*

This module provides additional utilities and types on top of com.zfabrik.hadoop that simplify working with HBase.

Javadocs can be found here: "Javadocs":http://www.z2-environment.net/javadoc/com.zfabrik.hbase!2Fjava/api/index.html

This module does not provide component types. It does, however, provide a narrowing interface "IHBaseMapReduceJob":http://www.z2-environment.net/javadoc/com.zfabrik.hbase!2Fjava/api/com/zfabrik/hbase/IHBaseMapReduceJob.html that extends "IMapReduceJob":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/com/zfabrik/hadoop/job/IMapReduceJob.html for Map/Reduce jobs over HBase tables.

See also [[Sample-hbase-full-stack-TBD]].

h2. How Map/Reduce with Z2 on Hadoop works

When Hadoop prepares a job for execution, it stores one or more JAR libraries in HDFS. To run a map, reduce, or combine task, Hadoop downloads the libraries to the local node and runs the required task from the code they provide.

When running a job with Z2's Hadoop integration, this is no different. Instead of submitting the actual task implementations to Hadoop, however, a generic job library is provided. On the node executing a task, the generic task implementations (all "here":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/com/zfabrik/hadoop/impl/package-summary.html) start an embedded, in-process Z2 runtime (see "ProcessRunner":http://www.z2-environment.net/javadoc/com.zfabrik.core.api!2Fjava/api/com/zfabrik/launch/ProcessRunner.html), look up the job component that implements "IMapReduceJob":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/com/zfabrik/hadoop/job/IMapReduceJob.html, and delegate execution in context to the real implementation.

!z2_hadoop.png!

The one catch is that a Z2 home must be available on the node running the task, and the generic implementation must be able to find it.

In the samples this is achieved by having the environment variable Z2_HOME point to an installation next to the Hadoop installation. In cluster setups, a Z2 core is part of the installables next to Hadoop, HBase, and others.

Only the core is required, as job updates will be retrieved from repositories automatically.

A specialty of the sample setups is the use of the Dev Repo (see "Workspace development using the Dev Repository":http://www.z2-environment.eu/v21doc#Workspace%20development%20using%20the%20Dev%20Repository). Since the Dev Repo is controlled by system properties, and since the Hadoop integration is aware of this use case, the Hadoop client connection config (e.g. "here":http://redmine.z2-environment.net/projects/z2-samples/repository/z2-samples-hadoop-basic/revisions/master/entry/com.zfabrik.samples.hadoop-basic.wordcount/nosql/core-site.xml) can also convey a Dev Repo scan root, which allows running M/R jobs directly from the workspace.

h2. How to support other Hadoop versions

TBD
