How to use Z2 with Hadoop

One of Z2's most intriguing capabilities is its seamless integration with Hadoop: Map-Reduce jobs can now be treated as ordinary application components.

This means that M/R job implementations can reuse other application modules just like a Web app can. Job implementations may be built on Spring, use Hibernate, share the very same data source definitions, and so on. No more special assembly and wiring just because you want to run Map-Reduce on Hadoop.

Instead, when Hadoop executes a distributed job, Z2 runs embedded in the job's process and executes tasks within its normal component model. That is, your code now runs outside of its server context, but within exactly the same logical environment.
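To make the programming model concrete, here is a minimal sketch of the map and reduce phases of the classic word-count job in plain Java. The class and method names are illustrative and deliberately avoid Hadoop and Z2 APIs; the point is only that the job logic is ordinary application code, which under Z2 could live in a regular application module that Hadoop's task processes then invoke.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch (not Z2's actual API): the two phases of
// word count as plain application code.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum up the counts emitted for each word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.addAll(map("to be or not to be"));
        pairs.addAll(map("be happy"));
        Map<String, Integer> counts = reduce(pairs);
        System.out.println(counts.get("be")); // prints 3
        System.out.println(counts.get("to")); // prints 2
    }
}
```

In a real deployment, Hadoop's `Mapper` and `Reducer` classes would merely delegate to such component code, which is what lets the same modules also serve a Web app or any other frontend.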

All required extensions are provided by the Hadoop Add-on.

Note that this add-on is version-specific with respect to Hadoop and companion components. While the Z2 extension modules should be rather version-independent, the Hadoop and HBase client access is not. Please consult the add-on page to find out more about what the add-on contains and how you may be able to tweak it.

Hadoop is not a completely trivial system. Before you try Z2 with Hadoop, you should have a basic understanding of what Hadoop does.

On the other hand, the samples provided on this site may well be one of the simplest and fastest ways to get a running Hadoop setup in the first place.

To learn more, check out these:

  • Sample-hadoop-basic: An adaptation of the classic word-count example on Z2. This is the simplest way to see something running and to get a feeling for how Z2 comes into the picture.
  • Sample-hbase-mail-digester: A more complex example featuring a richer application environment, based on HBase on Hadoop. This is closer to real application scenarios than the basic sample.
  • Hadoop add-on: Details on the Hadoop add-on and how things actually work.
  • Install prepacked CDH4: A guide on how to install a simple-to-use CDH4 Hadoop/HBase distribution.