Revision 8 (Henning Blohm, 20.09.2012 18:37) → Revision 9/16 (Henning Blohm, 20.09.2012 23:03)
h1. The Hadoop add-on

The Hadoop add-on contains the client parts of the Cloudera Hadoop and HBase distribution plus some integration features that are described in [[How to Hadoop]] and related samples. It is provided via the repository "z2-addons.hadoop":http://redmine.z2-environment.net/projects/z2-addons/repository/z2-addons-hadoop.

As Hadoop and HBase do not offer clear client-server compatibility guarantees across versions, you may only use the Hadoop add-on with a matching server version. Currently, in Z2 version 2.1, the contained distribution is CDH 4.0.1.

For experimental use only, we provide an easy-to-install, pre-configured single-node CDH 4.0.1 via the Git repository "z2-samples.cdh4-base":http://redmine.z2-environment.net/projects/z2-samples/repository/z2-samples-cdh4-base. The samples [[Sample-hadoop-basic]] and [[Sample-hbase-full-stack-TBD]] make use of it. See [[Install prepacked CDH4]] on how to set it up.

Extensions that help working with Hadoop are implemented by the modules *com.zfabrik.hadoop* and *com.zfabrik.hbase*.

h2. Details on *com.zfabrik.hadoop*

Javadocs can be found here: "Javadocs":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/index.html

h3. Component type com.zfabrik.hadoop.configuration

Components of this type provide Hadoop or HBase connectivity configuration via a component resource file "core-site.xml". There is no further configuration applicable.

h3. Component type com.zfabrik.hadoop.job

A Hadoop Map/Reduce job implementation. Using this component type, jobs may be programmatically scheduled and run within the Z2 runtime.

Properties:

|_. Name |_. Value or Description|
|com.zfabrik.component.type|com.zfabrik.hadoop.job|
|component.className|The name of a class provided by the module that implements "IMapReduceJob":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/com/zfabrik/hadoop/job/IMapReduceJob.html |

h2. Details on *com.zfabrik.hbase*

This module provides additional utilities and types on top of com.zfabrik.hadoop that simplify working with HBase.

Javadocs can be found here: "Javadocs":http://www.z2-environment.net/javadoc/com.zfabrik.hbase!2Fjava/api/index.html

This module does not provide component types. It does, however, provide a narrowing interface "IHBaseMapReduceJob":http://www.z2-environment.net/javadoc/com.zfabrik.hbase!2Fjava/api/com/zfabrik/hbase/IHBaseMapReduceJob.html that extends "IMapReduceJob":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/com/zfabrik/hadoop/job/IMapReduceJob.html for Map/Reduce jobs over HBase tables. See also [[Sample-hbase-full-stack-TBD]].

h2. How Map/Reduce with Z2 on Hadoop works

When preparing a job for execution by Hadoop, Hadoop stores one or more jar libraries in HDFS. In order to run a map, reduce, or combine task, Hadoop downloads the libraries to the local node and runs the required task from code provided by the libraries.

When running a job with Z2's Hadoop integration, this is no different. But instead of submitting the actual task implementations to Hadoop, a generic job library is provided to Hadoop. On the node executing a task, the generic task implementations (all "here":http://www.z2-environment.net/javadoc/com.zfabrik.hadoop!2Fjava/api/com/zfabrik/hadoop/impl/package-summary.html) start an embedded Z2 runtime, in-process.

TBD

h2. How to support other Hadoop versions

TBD
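
h2. Example: declaring a job component

As an illustration of the properties listed for the component type *com.zfabrik.hadoop.job* above, a job component declaration might look roughly like the following sketch. The two property names are taken from the properties table; everything else is an assumption made for illustration: the class name @com.acme.wordcount.WordCountJob@ is hypothetical, and the sketch assumes the declaration lives in the component's properties file as for other Z2 component types.

```properties
# Sketch of a job component declaration (illustrative only).
# Property names are from the properties table of com.zfabrik.hadoop.job.
com.zfabrik.component.type=com.zfabrik.hadoop.job

# Hypothetical class name: must be a class of the same module
# that implements IMapReduceJob.
component.className=com.acme.wordcount.WordCountJob
```

For a job over HBase tables, the named class would presumably implement IHBaseMapReduceJob instead, which extends IMapReduceJob; see the Javadocs linked above for the actual interface contracts.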