h1. The Hadoop add-on

The Hadoop add-on contains the client parts of the Cloudera Hadoop and HBase distributions, plus some integration features that are described in [[How to Hadoop]] and related samples.

It is provided via the repository "z2-addons.hadoop":http://redmine.z2-environment.net/projects/z2-addons/repository/z2-addons-hadoop.

As Hadoop and HBase do not guarantee client-server compatibility across versions, you may only use the Hadoop add-on with a matching server version. Currently, in Z2 version 2.1, the contained distribution is CDH 4.0.1.

For experimental use only, we provide an easy-to-install, pre-configured single-node CDH 4.0.1 via the Git repository "z2-samples.cdh4-base":http://redmine.z2-environment.net/projects/z2-samples/repository/z2-samples-cdh4-base. The samples [[Sample-hadoop-basic]] and [[Sample-hbase-full-stack-TBD]] make use of it. See [[Install prepacked CDH4]] on how to set it up.

Extensions that help working with Hadoop are implemented by the modules *com.zfabrik.hadoop* and *com.zfabrik.hbase*.

h2. Details on *com.zfabrik.hadoop*

Javadocs can be found here: "Javadocs":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/index.html

h3. Component type com.zfabrik.hadoop.configuration

Components of this type provide Hadoop or HBase connectivity configuration via a component resource file "core-site.xml". There is no further configuration applicable.
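
For illustration, a minimal "core-site.xml" resource of such a component might look like the following sketch. Host names and ports are placeholders for your actual cluster, and the @hbase.zookeeper.quorum@ property is only needed for HBase connectivity:

<pre><code class="xml">
<?xml version="1.0"?>
<configuration>
  <!-- HDFS name node of the target cluster (placeholder address) -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
  <!-- only needed for HBase connectivity (placeholder quorum) -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
</configuration>
</code></pre>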

h3. Component type com.zfabrik.hadoop.job

A Hadoop Map/Reduce job implementation. Using this component type, jobs may be programmatically scheduled and run within the Z2 runtime.

Properties:

|_. Name |_. Value or Description|
|com.zfabrik.component.type|com.zfabrik.hadoop.job|
|component.className|The name of a class provided by the module that implements "IMapReduceJob":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/com/zfabrik/hadoop/job/IMapReduceJob.html|
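
A job component declaration could hence look like the following sketch (the module and implementation class names are invented for illustration):

<pre>
# hypothetical Hadoop job component declaration
# (the class name is made up for this example)
com.zfabrik.component.type=com.zfabrik.hadoop.job
component.className=com.acme.wordcount.WordCountJob
</pre>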

h2. Details on *com.zfabrik.hbase*

This module provides additional utilities and types on top of *com.zfabrik.hadoop* that simplify working with HBase.

Javadocs can be found here: "Javadocs":http://www.z2-environment.net/javadoc/com.zfabrik.hbase!2Fjava/api/index.html

This module does not provide component types. It does, however, provide a narrowing interface "IHBaseMapReduceJob":http://www.z2-environment.net/javadoc/com.zfabrik.hbase!2Fjava/api/com/zfabrik/hbase/IHBaseMapReduceJob.html that extends "IMapReduceJob":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/com/zfabrik/hadoop/job/IMapReduceJob.html for Map/Reduce jobs over HBase tables.

See also [[Sample-hbase-full-stack-TBD]].

h2. How Map/Reduce with Z2 on Hadoop works

When preparing a job for execution, Hadoop stores one or more jar libraries in HDFS. In order to run a map, reduce, or combine task, Hadoop downloads these libraries to the local node and runs the required task from the code they provide.
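
For comparison, plain Hadoop job submission (without Z2) looks roughly like the following sketch, here using Hadoop's built-in @TokenCounterMapper@ and @IntSumReducer@ with input and output paths taken from the command line. The call to @setJarByClass@ determines the jar library that gets stored in HDFS and shipped to the task nodes:

<pre><code class="java">
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class PlainWordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        // setJarByClass determines the jar that Hadoop uploads to HDFS and
        // downloads to the task nodes - the step that carries the job code
        job.setJarByClass(PlainWordCount.class);
        job.setMapperClass(TokenCounterMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
</code></pre>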

When running a job via Z2's Hadoop integration, this is no different. However, instead of submitting the actual task implementations to Hadoop, a generic job library is handed to Hadoop. On the node executing a task, the generic task implementations (all "here":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/com/zfabrik/hadoop/impl/package-summary.html) start an embedded Z2 runtime (see "ProcessRunner":http://www.z2-environment.net/javadoc/com.zfabrik.core.api!2Fjava/api/com/zfabrik/launch/ProcessRunner.html) in-process, look up the job component that implements "IMapReduceJob":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/com/zfabrik/hadoop/job/IMapReduceJob.html, and delegate execution, in context, to the real implementation.
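
In other words, the generic mapper (or reducer) only transports the name of the real job component and hands over execution once the embedded runtime is up. The following is a highly simplified, hypothetical sketch of that delegation pattern; the @Z2JobLookup@ type and the @"z2.component"@ configuration key are invented for illustration, while the real implementations live in the com.zfabrik.hadoop.impl package:

<pre><code class="java">
import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Mapper;

public class DelegatingMapper extends Mapper<Writable, Writable, Writable, Writable> {

    // Stand-in for the component lookup; in reality this would be backed by
    // the embedded Z2 runtime started via ProcessRunner
    interface Z2JobLookup {
        Mapper<Writable, Writable, Writable, Writable> resolveMapper(String componentName);
    }

    private Z2JobLookup lookup; // provided by the embedded Z2 runtime

    @Override
    public void run(Context context) throws IOException, InterruptedException {
        // the name of the real job component travels with the job configuration
        String component = context.getConfiguration().get("z2.component");
        // resolve the component implementing IMapReduceJob and delegate the
        // whole task run to it, now with Z2 class loading in place
        Mapper<Writable, Writable, Writable, Writable> real = lookup.resolveMapper(component);
        real.run(context);
    }
}
</code></pre>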

The one catch is that a Z2 home must be available on the node running the task, and the generic implementation must be able to find it.

In the samples this is achieved by having the environment variable Z2_HOME point to an installation next to the Hadoop installation. In cluster setups, a Z2 core is part of the installables next to Hadoop, HBase, and others.

Only the core is required, as job updates are retrieved from repositories automatically.

A special feature of the sample setups is the use of the Dev Repo (see "Workspace development using the Dev Repository":http://www.z2-environment.eu/v21doc#Workspace%20development%20using%20the%20Dev%20Repository). As the Dev Repo is controlled by system properties, and as the Hadoop integration is aware of this use case, the Hadoop client connection config (e.g. "here":http://redmine.z2-environment.net/projects/z2-samples/repository/z2-samples-hadoop-basic/revisions/master/entry/com.zfabrik.samples.hadoop-basic.wordcount/nosql/core-site.xml) can also convey a Dev Repo scan root, which allows running M/R jobs directly from the workspace.
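
Conceptually, the connection config would then carry an extra property along the lines of the following sketch. The property name below is invented for illustration; see the linked sample core-site.xml for the actual configuration:

<pre><code class="xml">
<!-- Sketch only: "z2.dev.repo.root" is a made-up property name.
     Consult the linked sample core-site.xml for the real one. -->
<property>
  <name>z2.dev.repo.root</name>
  <value>/home/developer/workspace</value>
</property>
</code></pre>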

h2. How to support other Hadoop versions

TBD