Henning Blohm, 20.09.2012 12:29

h1. How to use Z2 with Hadoop

One of Z2's most intriguing capabilities is its seamless integration with Hadoop: Map-Reduce jobs can now be treated as ordinary application components.

This means that M/R job implementations can reuse other application modules, just like a Web app does. Job implementations may be built on Spring, use Hibernate, use the very same data source definitions, and so on. No more special assembly and wiring just because you want to run Map-Reduce with Hadoop.

Instead, when Hadoop executes a distributed job, Z2 runs embedded in the job's process and executes tasks within its normal component model. That is, your code now runs outside of its server context, but it still runs within exactly the same logical environment.
!z2_hadoop.png!

All required extensions are provided by the [[Hadoop Add-on]].

Note that this add-on is version-specific with respect to Hadoop and its companion components. While the Z2 extension modules should be largely version-independent, the Hadoop and HBase client access is not. Please consult the add-on page to find out what the add-on contains and how you may be able to tweak it.

Hadoop is not a trivial system. Before you try Z2 with Hadoop, you should have a basic understanding of what Hadoop does.

On the other hand, the samples provided on this site may well be one of the simplest and fastest ways of getting a running Hadoop setup in the first place.

To learn more, there are two instructive samples:

* [[Sample-hadoop-basic]]: An adaptation of the classic word count example on Z2. This is the simplest way to see something running and to get a feeling for how Z2 comes into the picture.
* [[Sample-hbase-full-stack-TBD]]: A more complex example featuring a richer application environment. It is based on HBase on Hadoop and is closer to real application scenarios than the basic sample.
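For readers who have not seen the word count example before, the following is a minimal sketch of the map and reduce steps in plain Java. It deliberately uses neither the Hadoop API nor any Z2 API; the class name @WordCountSketch@ and its methods are hypothetical and exist only for illustration. The actual sample may be structured quite differently.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a single-process imitation of the map/shuffle/reduce
// flow. Not Hadoop code and not taken from the Z2 samples.
public class WordCountSketch {

    // Map step: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Reduce step: sum all counts that were emitted for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: group the mapped pairs by word (the "shuffle"), then reduce.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {be=2, do=1, not=1, or=1, to=3}
        System.out.println(wordCount(List.of("to be or not to be", "to do")));
    }
}
```

In a real Hadoop job the map and reduce steps run as distributed tasks and the grouping is done by the framework; the sketch only shows the division of labour between the two steps.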