h1. The Hadoop add-on

The Hadoop add-on contains the client parts of the Cloudera Hadoop and HBase distribution, plus some integration features that are described in [[How to Hadoop]] and related samples.

It is provided via the repository "z2-addons.hadoop":http://redmine.z2-environment.net/projects/z2-addons/repository/z2-addons-hadoop.

As Hadoop and HBase do not have a clear client-server compatibility matrix, you may only use the Hadoop add-on with a matching server version.

h2. Version map

|_. add-on version |_. Hadoop/HBase version |
| 2.1 | CDH 4.0.1 |

For experimental use only, we provide an easy-to-install, pre-configured single-node CDH 4.0.1 via the Git repository "z2-samples.cdh4-base":http://redmine.z2-environment.net/projects/z2-samples/repository/z2-samples-cdh4-base. The samples [[Sample-hadoop-basic]] and [[Sample-hbase-mail-digester]] make use of it. See [[Install prepacked CDH4]] on how to set it up.

Extensions that support working with Hadoop are implemented by the modules *com.zfabrik.hadoop* and *com.zfabrik.hbase*.

h2. Details on *com.zfabrik.hadoop*

Javadocs can be found here: "Javadocs":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/index.html

h3. Component type com.zfabrik.hadoop.configuration

Components of this type provide Hadoop or HBase connectivity configuration via a component resource file "core-site.xml". No further configuration applies.
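
As a rough sketch, such a component is simply a folder holding a component descriptor with the configuration file next to it. All module, component, and host names below are made up for illustration:

<pre>
# Hypothetical layout of a configuration component "myConfig"
# in a module "com.acme.hadoop":
#
#   com.acme.hadoop/
#     myConfig/
#       z.properties    <- component descriptor
#       core-site.xml   <- the connectivity configuration
#
# Content of myConfig/z.properties:
com.zfabrik.component.type=com.zfabrik.hadoop.configuration
</pre>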

h3. Component type com.zfabrik.hadoop.job

A Hadoop Map/Reduce job implementation. Using this component type, jobs may be programmatically scheduled and run within the Z2 runtime (see the sketches after the table below).

Properties:

|_. Name |_. Value or description |
|com.zfabrik.component.type|com.zfabrik.hadoop.job|
|component.className|The name of a class provided by the module that implements "IMapReduceJob":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/com/zfabrik/hadoop/job/IMapReduceJob.html |
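
A hedged sketch of such a descriptor, using a made-up module and class name:

<pre>
# Hypothetical descriptor myJob/z.properties in a module com.acme.jobs:
com.zfabrik.component.type=com.zfabrik.hadoop.job
component.className=com.acme.jobs.WordCountJob
</pre>

To schedule and run such a job programmatically, the implementation can be retrieved via Z2's generic component lookup. This is only a sketch, assuming the made-up component name from above and the component lookup API of the Z2 core; see [[How to Hadoop]] for the actual submission flow:

<pre>
// Sketch: obtaining the job implementation via Z2's component lookup.
// "com.acme.jobs/myJob" is a made-up component name for illustration.
import com.zfabrik.components.IComponentsLookup;
import com.zfabrik.hadoop.job.IMapReduceJob;

public class JobAccessSketch {
    public static IMapReduceJob lookupJob() {
        // component names have the form "<module>/<component>"
        return IComponentsLookup.INSTANCE.lookup("com.acme.jobs/myJob", IMapReduceJob.class);
    }
}
</pre>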

h2. Details on *com.zfabrik.hbase*

This module provides additional utilities and types on top of *com.zfabrik.hadoop* that simplify working with HBase.

Javadocs can be found here: "Javadocs":http://www.z2-environment.net/javadoc/com.zfabrik.hbase!2Fjava/api/index.html

This module does not provide component types. It does, however, provide a narrowing interface "IHBaseMapReduceJob":http://www.z2-environment.net/javadoc/com.zfabrik.hbase!2Fjava/api/com/zfabrik/hbase/IHBaseMapReduceJob.html that extends "IMapReduceJob":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/com/zfabrik/hadoop/job/IMapReduceJob.html for Map/Reduce jobs over HBase tables.

See also [[Sample-hbase-mail-digester]].

h2. How Map/Reduce with Z2 on Hadoop works

When a job is prepared for execution, Hadoop stores one or more jar libraries in HDFS. In order to run a map, reduce, or combine task, Hadoop downloads these libraries to the local node and runs the required task from the code they provide.
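
For contrast, here is a minimal sketch of plain Hadoop job submission using the standard Map/Reduce API (identity mapper/reducer and made-up paths, for brevity). The jar registered via setJarByClass is exactly what Hadoop stores in HDFS and distributes to the task nodes:

<pre>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IdentityJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "identity example");
        // The jar containing this class is what gets stored in HDFS and
        // downloaded to the nodes that run the map and reduce tasks.
        job.setJarByClass(IdentityJobDriver.class);
        // Identity mapper/reducer for brevity; real jobs plug in their own.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path("/in"));
        FileOutputFormat.setOutputPath(job, new Path("/out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
</pre>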

When running a job with Z2's Hadoop integration, this is no different. But instead of submitting the actual task implementations to Hadoop, a generic job library is handed to Hadoop. On the node executing a task, the generic task implementations (all "here":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/com/zfabrik/hadoop/impl/package-summary.html) start an embedded Z2 runtime (see "ProcessRunner":http://www.z2-environment.net/javadoc/com.zfabrik.core.api!2Fjava/api/com/zfabrik/launch/ProcessRunner.html) in-process, look up the job component that implements "IMapReduceJob":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/com/zfabrik/hadoop/job/IMapReduceJob.html, and delegate execution, in context, to the real implementation.

p{margin-top:3em; margin-bottom:3em}. !z2_hadoop.png!

The one catch is that a Z2 home must be available on the node running the task, and the generic implementation must be able to find it.

In the samples this is achieved by having the environment variable Z2_HOME point to a Z2 installation that resides next to the Hadoop installation. In cluster setups, a Z2 core is part of the installables next to Hadoop, HBase, and others.

Only the core is required, as job updates will be retrieved from repositories automatically.

A true specialty of the sample setups is the use of the Dev Repo (see "Workspace development using the Dev Repository":http://www.z2-environment.eu/v21doc#Workspace%20development%20using%20the%20Dev%20Repository). As the Dev Repo is controlled by system properties, and as the Hadoop integration is aware of this use case, the Hadoop client connection config (e.g. "here":http://redmine.z2-environment.net/projects/z2-samples/repository/z2-samples-hadoop-basic/revisions/master/entry/com.zfabrik.samples.hadoop-basic.wordcount/nosql/core-site.xml) can also convey a Dev Repo scan root, which makes it possible to run M/R jobs directly from the workspace.
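
Schematically, that configuration is an ordinary Hadoop client config carrying one extra property. The property name and values below are placeholders; the actual key is visible in the linked core-site.xml:

<pre>
<?xml version="1.0"?>
<configuration>
  <!-- ordinary Hadoop client connectivity -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
  <!-- placeholder name: the extra property the Z2 Hadoop integration
       evaluates to determine a Dev Repo scan root -->
  <property>
    <name>dev.repo.scan.root.placeholder</name>
    <value>/home/user/workspace</value>
  </property>
</configuration>
</pre>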

h2. How to support other Hadoop versions

TBD