Hadoop Add-on » History » Version 14
Henning Blohm, 21.09.2012 11:55
h1. The Hadoop add-on

The Hadoop add-on contains the client parts of the Cloudera Hadoop and HBase distributions plus some integration features that are described in [[How to Hadoop]] and related samples.

It is provided via the repository "z2-addons.hadoop":http://redmine.z2-environment.net/projects/z2-addons/repository/z2-addons-hadoop.

As Hadoop and HBase do not guarantee client-server compatibility across versions, you may only use the Hadoop add-on with a matching server version.

h2. Version map

|_. add-on version |_. Hadoop/HBase version |
| 2.1 | CDH 4.0.1 |

For experimental use only, we provide an easy-to-install, pre-configured single-node CDH 4.0.1 via the Git repository "z2-samples.cdh4-base":http://redmine.z2-environment.net/projects/z2-samples/repository/z2-samples-cdh4-base. The samples [[Sample-hadoop-basic]] and [[Sample-hbase-full-stack-TBD]] make use of it. See [[Install prepacked CDH4]] for setup instructions.

Extensions that help with working with Hadoop are implemented by the modules *com.zfabrik.hadoop* and *com.zfabrik.hbase*.

h2. Details on *com.zfabrik.hadoop*

Javadocs can be found here: "Javadocs":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/index.html

h3. Component type com.zfabrik.hadoop.configuration

Components of this type provide Hadoop or HBase connectivity configuration via a component resource file "core-site.xml". No further configuration applies.
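A connectivity configuration component essentially wraps a standard Hadoop client configuration. As a sketch, a minimal "core-site.xml" resource for such a component might look like this (the file system URI below is an assumption; use the address of your actual name node):

```xml
<?xml version="1.0"?>
<!-- Minimal Hadoop client configuration sketch. Host and port
     are placeholders for your actual HDFS name node. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
```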

h3. Component type com.zfabrik.hadoop.job

A Hadoop Map/Reduce job implementation. Using this component type, jobs may be programmatically scheduled and run within the Z2 runtime.

Properties:

|_. Name |_. Value or Description |
| com.zfabrik.component.type | com.zfabrik.hadoop.job |
| component.className | The name of a class provided by the module that implements "IMapReduceJob":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/com/zfabrik/hadoop/job/IMapReduceJob.html |
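Putting the table above together, the component descriptor of a job component could look like the following sketch (module and class names are made up for illustration; only the two properties from the table are assumed):

```properties
# Hypothetical component descriptor of a Map/Reduce job component.
# com.acme.wordcount.WordCountJob is a placeholder class name for a
# class in the module that implements IMapReduceJob.
com.zfabrik.component.type=com.zfabrik.hadoop.job
component.className=com.acme.wordcount.WordCountJob
```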

h2. Details on *com.zfabrik.hbase*

This module provides additional utilities and types on top of *com.zfabrik.hadoop* that simplify working with HBase.

Javadocs can be found here: "Javadocs":http://www.z2-environment.net/javadoc/com.zfabrik.hbase!2Fjava/api/index.html

This module does not provide component types. It does, however, provide a narrowing interface "IHBaseMapReduceJob":http://www.z2-environment.net/javadoc/com.zfabrik.hbase!2Fjava/api/com/zfabrik/hbase/IHBaseMapReduceJob.html that extends "IMapReduceJob":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/com/zfabrik/hadoop/job/IMapReduceJob.html for Map/Reduce jobs over HBase tables.

See also [[Sample-hbase-full-stack-TBD]].

h2. How does Map/Reduce with Z2 on Hadoop work

When preparing a job for execution, Hadoop stores one or more jar libraries in HDFS. In order to run a map, reduce, or combine task, Hadoop downloads the libraries to the local node and runs the required task from code provided by those libraries.

When running a job with Z2's Hadoop integration, this is no different. But instead of submitting the actual task implementations to Hadoop, a generic job library is handed to Hadoop. On the node executing a task, the generic task implementations (all "here":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/com/zfabrik/hadoop/impl/package-summary.html) start an embedded Z2 runtime (see "ProcessRunner":http://www.z2-environment.net/javadoc/com.zfabrik.core.api!2Fjava/api/com/zfabrik/launch/ProcessRunner.html) in-process, look up the job component that implements "IMapReduceJob":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/com/zfabrik/hadoop/job/IMapReduceJob.html, and delegate execution, in context, to the real implementation.

p{margin-top:3em; margin-bottom:3em}. !z2_hadoop.png!

The one catch here is that a Z2 home must be available on the node running the task, and it must be found by the generic implementation.

In the samples this is achieved by having the environment variable Z2_HOME point to an installation next to the Hadoop installation. In cluster setups, a Z2 core is part of the installables next to Hadoop, HBase, and others.
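For example, assuming Z2 is installed under /opt/z2 next to the Hadoop installation (the path is an assumption for this sketch; adapt it to your node layout), the variable could be set like this on each task-running node:

```shell
# Make the Z2 home discoverable for the generic task implementations.
# /opt/z2/z2-base.core is a placeholder path for this sketch; Hadoop's
# task JVMs must inherit the variable, e.g. by setting it in hadoop-env.sh.
export Z2_HOME=/opt/z2/z2-base.core
echo "$Z2_HOME"
```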

Only the core is required, as job updates will be retrieved from repositories automatically.

A true specialty of the sample setups is the use of the Dev Repo (see "Workspace development using the Dev Repository":http://www.z2-environment.eu/v21doc#Workspace%20development%20using%20the%20Dev%20Repository). As the Dev Repo is controlled by system properties, and as the Hadoop integration is aware of this use case, the Hadoop client connection configuration (e.g. "here":http://redmine.z2-environment.net/projects/z2-samples/repository/z2-samples-hadoop-basic/revisions/master/entry/com.zfabrik.samples.hadoop-basic.wordcount/nosql/core-site.xml) can also convey a Dev Repo scan root, which makes it possible to run M/R jobs directly from the workspace.

h2. How to support other Hadoop versions

TBD