Sample-hbase-mail-digester » History » Version 7
Henning Blohm, 02.08.2014 14:42
1 | 1 | Henning Blohm | h1. Sample that combines HBase with full-stack Spring and Hibernate usage |
---|---|---|---|
2 | 2 | Henning Blohm | |
3 | 5 | Henning Blohm | This sample consists of an application that loads large Mbox archive files and extracts email addresses using a map-reduce job. The extracted email addresses are then written to a relational database and offered for editing. |
4 | 3 | Henning Blohm | |
5 | 4 | Henning Blohm | As a full-stack sample, it shows how to design a multi-module application with a service tier that can be used seamlessly from a Web app as well as from an application-level map-reduce job. |
6 | |||
7 | 1 | Henning Blohm | *Note*: This sample still uses v2.2 of z2, so it is crucial to check out the versions specified below. |
8 | 5 | Henning Blohm | *Note*: Due to HBase, you will need to run this on Linux or Mac OS. |
9 | 4 | Henning Blohm | |
10 | h2. Install |
||
11 | 1 | Henning Blohm | |
12 | 5 | Henning Blohm | Here is a quick guide to getting things up and running. It closely follows [[How_to_run_a_sample]] and [[Install_prepacked_CDH4]]. |
13 | |||
14 | h3. Checkout |
||
15 | |||
16 | Create an installation folder and check out the z2 core, the HBase distribution, and the sample application. |
||
17 | |||
18 | 3 | Henning Blohm | <pre><code class="bash"> |
19 | git clone -b v2.2 http://git.z2-environment.net/z2-base.core |
20 | git clone -b v2.2 http://git.z2-environment.net/z2-samples.cdh4-base |
21 | git clone -b master http://git.z2-environment.net/z2-samples.hbase-mail-digester |
22 | 1 | Henning Blohm | </code></pre> |
23 | 5 | Henning Blohm | |
24 | (Note: Do not check out into a shared git folder, if you have one, because z2 may later inspect the projects located next to these checkouts.) |
||
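Before moving on, it can help to confirm that all three checkouts ended up side by side in the installation folder. The following is a minimal sketch, not part of z2: the function name @missing_checkouts@ is hypothetical, and the folder names are assumed to be the default ones that @git clone@ derives from the URLs above.

```shell
#!/bin/sh
# Hypothetical helper (not part of z2): print the name of every expected
# checkout folder that is missing from the given installation folder.
# Folder names are assumed to be the defaults created by the git clone
# commands in the Checkout section.
missing_checkouts() {
  install_dir="${1:-.}"
  for repo in z2-base.core z2-samples.cdh4-base z2-samples.hbase-mail-digester; do
    # Print the name only when the folder does not exist.
    [ -d "$install_dir/$repo" ] || echo "$repo"
  done
}
```

Calling @missing_checkouts .@ from the installation folder should print nothing when all three repositories are present.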
25 | |||
26 | h3. Prepare |
||
27 | |||
28 | We need to apply some minimal configuration for HBase. First, follow [[Install_prepacked_CDH4]] to configure your HBase checkout. A few of its steps need to be done only once, but they cannot be skipped. |
||
29 | |||
30 | Assuming HBase has started and all processes show as described, there is one last thing to get running before starting the actual application: |
||
31 | |||
32 | {{include(How to run Java db)}} |
||
33 | |||
34 | 6 | Henning Blohm | h2. Start |
35 | 5 | Henning Blohm | |
36 | 7 | Henning Blohm | Now that all databases are up, we can start the application simply by running (as always): |
37 | 5 | Henning Blohm | |
38 | <pre><code class="bash"> |
39 | # on Linux / Mac OS: |
40 | cd z2-base.core/run/bin |
41 | ./gui.sh |
42 | |||
43 | # on Windows: |
44 | cd z2-base.core\run\bin |
45 | gui.bat |
46 | </code></pre> |
47 | |||
48 | 7 | Henning Blohm | On first startup this will download a significant number of dependencies (Spring, Vaadin, etc.), so go and get yourself some coffee. |
49 | 5 | Henning Blohm | |
50 | When started, go to http://localhost:8080/digester-admin. You should see this: |
||
51 | 1 | Henning Blohm | |
52 | !start.png! |
||
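Because the first startup can take a while, you may prefer to poll the URL instead of reloading the browser. This is a rough sketch that assumes @curl@ is installed; the function name @wait_for_url@ and the default timeout are illustrative choices, not something z2 provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of z2): poll a URL until it answers with a
# successful HTTP status. Returns 0 on success and 1 once the given number
# of attempts is used up. Attempts are about one second apart, plus up to
# five seconds per request.
wait_for_url() {
  url="$1"
  attempts="${2:-300}"
  tried=0
  while [ "$tried" -lt "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors such as 404 or 500.
    if curl --silent --fail --location --max-time 5 --output /dev/null "$url"; then
      return 0
    fi
    sleep 1
    tried=$((tried + 1))
  done
  return 1
}
```

For example, @wait_for_url http://localhost:8080/digester-admin && echo "digester-admin is up"@ prints the message once the application responds.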
53 | 7 | Henning Blohm | |
54 | h2. Using the application |
||
55 | |||
56 | To feed data into the application, download a mail archive in mbox format, e.g. from http://tomcat.apache.org/mail/. Upload the file to the digester application as outlined on the first tab. |
||
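To get a rough idea of what a downloaded archive contains before uploading it, you can count email-like strings locally. The sketch below is not the sample's map-reduce job, and its results may differ from what the application extracts: it matches anything that looks like an address anywhere in the file, including message bodies and mbox separator lines. The function name @count_addresses@ and the regular expression are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not the sample's implementation): list email-like
# strings in an mbox file with their number of occurrences, most frequent
# first.
count_addresses() {
  # "Map": emit every address-like token on its own line, lower-cased.
  grep -oE '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sort \
    | uniq -c \
    | sort -rn
  # "Reduce": sort groups identical addresses and uniq -c counts each group.
}
```

Run it as @count_addresses path/to/archive.mbox@, where the path is a placeholder for the file you downloaded.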
57 | |||
58 | When imported, run the analysis job by clicking on |
||
59 | |||
60 | |||
61 | |||
62 | |||
63 | 4 | Henning Blohm | !job.png! |
64 | !counts.png! |