Install The Pentaho 5.0 Suite

Pentaho 5

This post is about installing the tools that were published on november 19th, 2013 on Sourceforge, that is version 5.0 and above. Installing a Database that can hold your data permanently and the Pentaho desktop tools and BI web server.

Future posts will be about creating a replica of the demo data on a permanent database that can hold your data, customize the apperance of web server, add notes on the new repository and articles about each Pentaho tool. They will be added through 2014 and the index of the series will be updated as work is done.

Open Source Software

You will download and install:

  • Java Development Kit
  • MySql Database and MySql Dashboard
  • Pentaho Tools:
    1. The ETL or Data Integration. A stand alone app that can be used to access data on diferent formats and systems, process and distribute it to the apropiate people. You can schedule its excecution with the web server or use it as a datasource it with the report designer.
    2. Report Designer, a desktop banded report builder that let you ask for parameters and present your reports on web, excel or pdf.
    3. Metadata Editor A desktop modeler tool that lets you build a meta-model of your data to make easier to your users to navigate it and you to control their access to it.
    4. BI Server, a Tomcat Web Server preconfigured with users, demo data and the pentaho integrated projects.

Java SDK Installation

Use the 32 or 64 bit java version that corresponds to your operating system. To find out which OS version is on your computer:

  • On linux: Open a terminal (launcher->Type Terminal, click on it), type on the command prompt:
    uname -m
    If the answer is i686 you have 32 bits.
  • On windows: Click on Start -> Right Click on Equipment. Select properties. Look up the OS information in the window.

ON Linux

If you know how about user administration, create and use a pentaho user and a pentaho group to install the software, if not, your current user will do.

  1. On Ubuntu:
    Open the ‘Ubuntu Software Center’ (type it on unity search). On the search box type:
    Executable java OpenJDK 7
    click on its install button.
  2. On any linux box:
    In a Terminal, type three commands:

    sudo add-apt-repository ppa:webupd8team/java
    sudo apt-get update
    sudo apt-get install oracle-java7-installer

In version 5.0 the scripts (bat & sh) looks for a PENTAHO_JAVA_HOME variable, so add it yo your profile. Please be carefull this is a configuration file.

  • You need to know the location of the java files.
    If you used the ubuntu center, it must be on
    /usr/lib/jvm/java-7-openjdk-i386
    Confirm this before continuing
  • In a command prompt edit your user .profile or the general profile:
    sudo gedit /etc/environment
    Add at the bottom the variable and the java (the one above the ‘bin’) folder:
    export PENTAHO_JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
    save and exit

You can check your java installation by typing on a terminal:

java -version

Windows

Open a web browser and go to:

http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html

Download either the 32 bit version: jdk-7u51-windows-i586.exe or the 64 bit file: jdk-7u51-windows-x64.exe and excecute it using administrator privileges (right click on it).

To configure the windows environment variable:

  1. Click Start Button, right-click equipment, select Properties.
  2. On your left, there should be an option: advanced system settings, click on it.
  3. Click on the environment variables button
  4. A dialog will open with two list, in the bottom one, type:
    PENTAHO_JAVA_HOME
    and then the java files path that should be something like:
    C:\Program Files\Java\jdk1.7.0_51

You can check java by typing on a console: java -version

MySQL Database

LINUX

To install MySql :

  1. On Ubuntu:
    Open the ‘Ubuntu Software Center‘ and then type ‘MySql’ to search for it, click on ‘MySQL Server‘ and then the Install button.
    Do the same with ‘MySQL Workbench‘ so you can interact with the database with a graphical tool.
  2. Or you can select other recomended ways for installing your software
    http://dev.mysql.com/doc/refman/5.6/en/linux-installation.html

WINDOWS

This is the main oracle/MySql page. You can select different methods to install it on windows, I prefer to use a MSI installer as it installs and configures the database server to the appropriate access permissions. On your system it can be as simple as running the msi with administration access or exhaustive and detailed port, process and file ownership configuration.

General instructions to download and Install are in this page. The download page is http://dev.mysql.com/downloads/ so install the MySQL Community Server and MySQL Workbench. Remember to select your Windows 32-bit OR 64-bit and MSI not zip download button. You will have standard users, configuration and lots of literature to chatch up, but you are ready to go now.

You can start the server doube clicking on <Your path>\bin\mysql.exe. With the Workbench (or the command window) you can run SQL commands or table editing.

ETL aka Pentaho data Integration (PDI)

Now the easy part: to install Pentaho.

Access the Pentaho Project on source forge. Click on the Data Integration folder. The latest version is 5.0.1-stable (as this writing). Download the .zip file pdi-ce-5.0.1.A-stable.zip.

On your computer, It is easier to put al versions of the Pentaho Suite on a folder, so create a Pentaho5 directory under your home directory and unzip the downloaded file in it (right click and select ‘unzip here’). You should now have a /Pentaho5/data-integration directory with the ETL files.

To test the application:

  • On linux:
    Open a terminal window, change to the the app folder, something like
    cd Pentaho5/data-integration
    and then type
    ./spoon.sh
  • On Windows:
    Use the File Navigator to get to your ETL files, like C:\Pentaho5\ Double click on spoon.bat.

There are a lot of ETL samples in the directory Pentaho5/data-integration/samples/transformations/ from reading text files, sort group, write to database tables. For example ‘Fixed Input…‘ reads a file ‘Textfile input – fixed length sample data.txt‘. To see it working, select the first icon (Fixed..) and hit the F10 key. You will see its contest if you click on launch.

You can also browse and install plugins if you click on help in the toolbar and select marketplace.

Exit the app.

Pentaho Report Designer

The process is similar for the Report Designer. The sourceforge folder is here. The latest version is 5.0.1-stable. You can download prd-ce-5.0.1-stable.zip directly.

Download the file and unzip it on your Pentaho5 folder (right click an unzip it). You should now have a /Pentaho5/report-designer directory.

To test the application;

  • On linux:
    Open a terminal window, change to the the app folder, something like
    cd Pentaho5/report-designer
    and then type
    ./report-designer.sh
  • On Windows:
    Double click on report-designer.bat.

You will notice the desinger elemnets tools on your left, and all cramped on your right the Structure Tab of your report and its elements in the bottom and Data Tab to add your data sources and parameters.

There are a lot of samples in the directory Pentaho5/report-designer/samples from invoice status, sales sumaries, charts or advanced html. The inventory.prpt on the operational reports folder is a nice example. You can execute them by clicking on the green ‘play’ icon in the toolbar.

Exit the app.

Pentaho Metadata

The process of installation is the same as in the previous examples. The sourceforge folder is here. The latest version is 5.0.1-stable. You can download pme-ce-5.0.1-stable.zip directly.

Download the file and unzip it on your Pentaho5 folder (right click an unzip it). You should now have a /Pentaho5/metadata-editor directory.

To test the application;

  • On linux:
    Open a terminal window, change to the the app folder, something like
    cd Pentaho5/metadata-editor
    and then type
    ./metadata-editor.sh
  • On Windows:
    Double click on metadata-editor.bat.

You can browse the steel wheels fisical (database) model, and how it is used then in the business model for browsing from tools like the WAQR -now deprecated- or report designer.

Exit the app.

Pentaho BI Server

The sourceforge folder is here. The latest version is 5.0.1-stable. You can download biserver-ce-5.0.1-stable.zip directly.

Download the file and unzip it on your Pentaho5 folder (right click an unzip it). You should now have a /Pentaho5/biserver-ce. The ce stands for community edition that is the open source or ‘basic’ version or without the much hyped capabilities on the new marketing and videos. Dont worry its pretty capable software.

You must run the Tomcat server to allow it to ‘deploy’ its web applications. It will uncompress folders and directories and set the server to a ‘localhost’ configuration.

  • On linux:
    Open a terminal window, change to the the app folder, something like
    cd Pentaho5/biserver
    and then type
    ./start-pentaho.sh
  • On Windows:
    Double click on start-pentaho.bat.

The startup will take a few minutes. Remember that the server has versions of all the sofware installed, the web server and a database created and running in memory to allow the demo run, so be patient.

To access the login page open your browser (any moder browser is reorted to work ok) and type the URL:

 http://localhost/pentaho:8080

Browse the files, in the public section you’ll find the standar report, olap report and dashboardof the Pentaho Suite.a

You can also add very valuable plugins like saiku o the ctools dashboard web tools (cdf, cde, cda) or waqr using the marketplace option.

To ‘close’ the web server you will need to execute the stop-pentaho script.

Ending comments

Thats it, those are the tools updated by pentaho on 2013. Open source tools like Saiku Reporting are not being ported. Aggregation o Mondrian Schema Editor are being updated. Tools like the Studio Designer are deprecated.
Remember: “Change is good. You go first.” 🙂

Installing the database and the server as services/startup deamons is a nice option it will be addressed on future post (you can access old articles to do that on this blog). The next post will be on customization of the bi server and then on a database (MySql) redirection, notes on the new repository and then a post on program migration, so stay tunned.

You can visit the main Index.

Complete Pentaho Installation on Ubuntu, Part 6

Install PDI/Kettle and Agile PDI in a Development Environment

This is where you’ll install and play with one of the most interesting, well crafted, user friendly and enjoyable applications I have ever used.

It is also the heart of the Pentaho BI suite as the tool to Extract, Transform & Load (ETL), process data and execute jobs. It’s name was Kettle, its former name, now is known as PDI for Pentaho Data Integration, and also Spoon, it’s executable file name.

We’ll install PDI as a desktop development tool.

1. Get and Install PDI.

Go to Pentaho files in sourceforge here and download the latest stable release (version 4.2 should be up by july 2011. Its a remarkable new version, check the improvements).

Double click on the file and extract it’s content into a new folder:

/Pentaho/data-integration

You can delete the .bat files and make all the .sh files, specially the spoon.sh as an executable file (right click on permisions tab). And in command terminal start it with:

./spoon.sh

Close the window dialog that offers to open a repository. You should be in the PDI, a development environment:

Pentaho Data Integration

PDI

Note: You don’t have to configure anything or add drivers for common databases.  I told you, this open source application is the result of a great community and is a very well crafted product, you’ll see.

[Edit]
Ups, There is a glitch on the interface and the new ubuntu 11.04 scrollbars as they don’t work or let you put steps into the canvas. The solution I took is disable them as shown on the PDI forums here.

2. Meet the Application

There are several resources you should browse and revisit them as you familiarize yourself with the concepts of ETL and this tool:

  1. 18 slides presentation by the project founder, manager and lead developer of PDI, explaining its capabilities. Slide #11 ‘use-cases’ list some of its uses. [video].
  2. Check a guide that comes with your download at
    /Pentaho/data-integration/docs/English/getting_started_with_pdi.pdf
  3. A detailed spoon user guide in the Pentaho wiki.
  4. Some videos on you-tube explain specific extractions that may seem too complicated at first and sometimes are about the process more than each component, but check those vblogs: mattcasters, BIOpenSource, DiddySteiner, ETLTools, , fechever75, LaboratoriosSIUCV, opensourcebi and more.
  5. The most important cookbook you have are the sample files in your disk at:
    /Pentaho/data-integration/samples/transformations/
  6. Also get this old guide, it’s no longer distributed with the PDI but it was very usefull to me as it lists the main ‘steps’.
  7. Continue with more articles from the Pentaho wiki.

3. Define a DB Connection

In the left panel, right click on Transformations. Click new. Now again on the left panel select the view tab.

Left Panel

Right click on the Database connections node, select new and fill the dialog with your data:

Here you can see the values for our MySql database and the result of the test.

Close the dialogs and right click on the left panel in the mysql connection, select share. Now when you save the transformation the connection data will be available for other extractions.

You have to save the transformation, a .ktr [xml] file. Create a folder for your transformations, that will make it easy to sync or backup them up:

/Pentaho/data-integration/development/

Note: An Oracle 10g connection using jndi (no net client) looks like this:

Conection to Oracle 10g

Conection to Oracle 10g

4. Execute a Transformation

A nice way to start learning about the PDI, ETL and datawarehousing is by opening the samples folder and check the components names and its notes, those are self explanatory. If you double click on them you will see the parameters that specify each behavior. If you right click on them you can select options to see the description, input or output fields, the text description -you should document the intention of the activity in here-, preview a sample run, etc.

Once you have reviewed some transformations I recommend one to start, that is create an object fundamental to multidimensional analisys: the time dimension. This is a table with a row for each day in the calendar, has columns showing special attributes like months, quarters, years, weekends So its easy to select dates based on those columns and then select the values in the fact table just with the indexed records which contain those needed dates.

A nice specification for a time dimension table is listed in this post of Nicholas Goodman. His blog has very interesting information too.

Check this pages, download the examples and run them in your environment.

  • Kettle Tip: Using java locales for a Date Dimension – Sept 2007 (link).
    In this post, Roland Bouman, shows a simplified extraction and then proceeds to show how to connect to a database, use a SQL to create the table and execute it.
    Then it explains three more steps to generate the data.
    In here you will see the more difficult part of using PDI, the javascript step.
  • HowTo: Create a date dimension with PDI (Kettle) – March 2010 (link)
    Geschrieben Von adds more characteristics for a day and uses more PDI steps to obtain them: calculators, filters (select), lookups. This will be version 2.0 of the last example.
  • Building a detailed Date Dimension with Pentaho Kettle – Sept 2010 (link)
    In here, Slawomir Chodnicki explains briefly the desing considerations in his design. One important thing here is how he introduces the concept of updating your data on dimensions jus by re-run the transformation, this is something we must get used to. It is important if your job crashes and you have to rebuid the process or being capable to continue from a given point.
    The file contains some erros on the java scrpts steps -some variables are not defined but referenced-, it is an oportunity to see the debugger messages of PDI.

5. Working without a Repository

If you are working with a developer team you shoud create a repository. Its simple, just click on new button and with a user with DB privileges on MySql, create the database.

Then you will get a single area for your programs and avoid versioning and syncing problems, your connection also get stored, etc. But if you are one or two people (normal for a pilot project) it is best to avoid using one. You can just synck and back up your program folders. Also you don’t need to change the normal way the BI server seeks programs.

The repository really needs its special post.

Ok, if you can’t wait, read this from

6. PDI Agile Plugin

[Edit July 2011:] On version 4.2 RC1, and on, the plug-in is already included.

Head over here and download de 1.0 version for the Modeling and Visualization Plugin, aka the Analyzer Plugin for LucidEra Cleariew before being bought by Pentaho.

It adds prototyping, datasource visualization and modeling creation on data snapshots, so you drag and drop and save your work.

Unzip the content of pmv-1.0.2-stable.zip into:

/Pentaho/data-integration/plugins/spoon

It will create a folder named agile-bi.

Start spoon and you will notice three butons on the top right. That’s normal view, model and visualize. Check this video from .

I was confuse about this note but the code is open source except the analyzer presentation layer and its available here.

7. Additional articles:

These are medium and complex topics:

  • An example of the ‘generate documentation’ step. So you can add descriptions to your extractions and make use of this new step: here
  • Error handling. Since version 3.8 error flow is available from every step here is how to use it properly.
  • Handling of configuration and variables: here
  • An introduction to regular expresions. A must read for the javascript step: here
  • An impresive plugin ‘excel output’ (more complex but more impresive than the default step). So you can generate formated reports: here.
    Note: on 4.2, the excel step is integrated in the PDI.
  • Connect PDI to SAP BI as a web service here.
  • Good chapter book sample “Pentaho Data Integration 4 Cookbook”: A transformation, A report from PDI data, PDI jobs from the BI Server process/ PUC, PUC-PDI-CDA, dashboard and data from PDI.
  • In the javascript step you can see very usefull sample code for each function. DEinspanjer explains with more detail than this: On left panel -> open Transform Functions -> open Date Functions -> right click on dateDiff -> select Sample.

8. Ruby Plugin

[Edit August 31, 2011]
Its news to me that there is another way to do the scripting than the javascript step. Now you can do your process in ‘Ruby’ with another plugin that you just need to unzip on the plugin directory: ruby-scripting-plugin.