Bullet Graph Dashboard: PDI-MDX-SQL DataSource + BI Server + CDE

To Do

The idea to build this dashboard comes from joining Stephen Few’s spec [pdf], an excelent implementation from Slawomir Chodnicki and finding out that the C*Tools (Webdetails dashboard project) has published a Bullet Graph widget from the Protovis Library that is easier to use and customize.

Dashboard PDI Final

Final Dashboard

Components

We’ll need:

  1. CDE Editor. Install or update to the latest version [Instructions]
    Used to create and edit the parameter, datasource and mappings to the graph.
  2. Pentaho PDI/Kettle ETL tool [Download].
    Used for data retrieval by selected parameters, normalization and PKI process.
  3. The Pentaho BI Sever
    To excecute the dashboard via its menu and see the html result.

Building The Layout

Lets start a new CDE dashboard by a click on the CDE icon in the top toolbar: CDE icon

Header Rows

The CDE Editor will open with ‘Layout’ selected. This is the asociated toolbar:
cde layout toolbar

  • Click on the row icon (=)
    Select Row
    Set:

    • Name=header
    • BackgroundColor=99a5de
      (in the bottom text field, close clicking on the color ball)
    • In the ‘Corners’ attribute. Use the down-arrow to scroll to Round
  • Click on the row icon (=) three more times to have a total of 4 rows.
    Be carefull, the toolbar is dynamic and the icons will change.
  • On the first and second rows add one ‘Html’ element (<>)
  • On the third row, set Name=ParamRow
  • Type in the first html type a title:
    <h2>Sales Dashboard By Product Line</h2>
  • On the second Html type:
    <div class="topRow1">Sales YTD</div>
    <div class="topRow2">Actual amount (horizontal bar) and Previuos Year (blue mark)</div>
    <hr class="topRowHR"/>

It should look like this.

Dash 00

Save in your development folder with a meaningfull ttitle (used by the BI menu system)  and click Preview to see the headers. Close the dialog.

Parameter Row

  • The third row will need four columns.
  • The first and the third column will have also an Html element.
  • Put in the first one the text Year, and in next Month.
  • Select the second column and look for the name attribute set it to nYear (node for year select).
    Name the forth one nMonth.

Result Row

  • Add two columns
  • Name the first one nResultGraph
    and the other nResultTable
  • Set for each column span size=12 (there’s a total of 24 columns in the template)

Save the final layout, that should look like this.

Dash n

Important: If the message “Dashboard saved successfully” doesn’t appear, check your mofifications for errors. If you can’t fix it, exit and resume from the latest point saved.
To edit a dashboard, select it on the browse panel in the BI Server. Right click on it and then select Edit in the context menu.

Building Datasources and its Widgets

Three types of datasources:

  1. Two SQL queries are needed for Year and Month.
  2. The table will be  a simple MDX.
  3. The Graph will need additional columns for ranges, we’ll use an ETL transformation for this one.

SQL

Click on Data Sources ‘button’ in your top-right. The left side of the CDE will list all the connections the CDE can use to get data.

  • Click on SQL Queries, then on sql over sqlJndi
    The group SQL Queries will be added
  • Set:
    name=dsYear
    Jndi=SampleData
    query=
SELECT DISTINCT YEAR_ID FROM DIM_TIME ORDER BY YEAR_ID DESC

Click on Components ‘button’ in your top-right.

  • We need a parameter to hold the Year selection
    • Click on the Generic button on the left list
    • Click now on Simple parameter
    • Set Name=pYear
      Set de default to Property value=2005
    • Save
  • Drop Down
    • Click on Selects button on the left list
    • Click now on Select Component
    • Set Name=csYear
      Parameter=pYear (the one defined above)
      HtmlObject=nYear (attach point, that is the column we named that way in the layout)
      Datasource=dsYear
    • Save
  • Click Preview. Check the select box. Close the preview dialog.

For the Month SQL, the set up is a little diferent as we need to set up a listener for changes on the Year select box and then pass that parameter to the Component and then the SQL datasource.

  • Click on Data Sources
    • Click SQL Queries, then sql over sqlJndi.
    • Set:
      name=dsMonth
      Jndi=SampleData
      Click on Parameters, click on the Add Button.
      in the dialog: Name=pYear, value=2005, Type=String
      query=

      SELECT DISTINCT MONTH_NAME
       FROM DIM_TIME
       WHERE YEAR_ID=${pYear}  ORDER BY MONTH_ID
  • Click on Components
    • Click on the Generic button
    • Click on Simple parameter
    • Set Name=pMonth
      Set de default to Property value=May
    • Click on Selects button on the left list
    • Click now on Select Component
    • Set Name=csMonth
      Parameter=pMonth
      HtmlObject=nMonth
      Click on Listeners, on the select arrow choose pYear
      Click on Parameters, click on the Add button,
      in the dialog set Arg=pYear, Value=pYear (you can select with the “…” button)
      Datasource=dsMonth
    • Save

If you click Preview you should see something like this.

Preview 01

MDX

Let’s define the query against the ‘Steel Wheels Sales’ cube as our Datasource. There are some links at the bottom of the article about MDX and sumary functions.

  • Click on
    • Data Sources
    • MDX Queries
    • mdx over mondrianJndi.
  • Set:
    • name=dsTable
    • Mondrian squema=SteelWheels
      (jndi will be set to SampleData)
    • Query=
      with
      member [Measures].[Sales_Curr] as 'Sum(Ytd([Time].CurrentMember), [Measures].[Sales])'
      member [Measures].[Sales_Prev] as 'Sum(Ytd(ParallelPeriod([Time].CurrentMember,1)), [Measures].[Sales])'
      select
      {[Measures].[Sales_Prev], [Measures].[Sales_Curr]} ON COLUMNS,
      {[Product].Children} ON ROWS
      from [SteelWheelsSales]
      where [Time].[2005].[QTR2].[May]

And now the grid element to display our data.

  • Click on:
    • Components
    • Others
    • Table Component
  • Set:
    • Name=ctTable
    • HtmlObject=nResultTable
    • Datasource=dsTable
  • Save & Preview
  • Now click on Advanced Properties (below the Component button)
  • Click on:
    • Column Headers
      add button for: Sales, Previous, Actual
    • Column Formats
      add button for: %s, $%.2f , $%.2f
      Those are for: string (%s) and currency with two decimals ($%.2f)
  • Set:
    • Paginate=False
    • Show Filter=False
    • Sort Data=False
  • Save & Preview

For testing and format purposes the used query doesn’t change when the month or year are modified. Now we need to add listeners, the parameters in the component and the datasource to calculate the ‘quarter’ depending on the month.

Click on Components (you may be already there)

  • Click on:
    • (left) Generic button
    • Simple parameter
  • Set:
    • Name=pQtr
    • Property value=QTR2

Click on Data Sources

  • Click on:
    • (group triangle) MDX Queries
      select dsTable
    • Parameters
  • Add three lines
    • Name=pYear, Value=2005, Type=String
    • Name=pQtr, Value=QTR2, Type=String
    • Name=pMonth, Value=May, Type=String
  • Click on Query
    change last line from:

    where [Time].[2005].[QTR2].[May]

    To:

    where [Time].[${pYear}].[${pQtr}].[${pMonth}]
  • Save & Preview.
    You should see the same image than before. Close the dialog.

Click on Components

  • Click on:
    • (group triangle) Others
      select ctTable
    • Listeners
      Select in the drop down: pYear, pMonth. Click OK
    • Parameters
  • Add three lines:
    • Arg=pYear, Value=pYear
      (you can use the three dots (…) button to select  value)
    • Arg=pQtr, Value=pQr
    • Arg=pMonth, Value=pMonth
  • Click on:
    • Advanced Properties (at the top link on your right)
    • preExecution
  • Paste this javascript function to set Quarter based on month selected:
    • function Q (){
      var vQtr='';
      switch (pMonth) {
      case 'Jan':
      case 'Feb':
      case 'Mar':
      vQtr='QTR1'; break;
      case 'Apr':
      case 'May':
      case 'Jun':
      vQtr='QTR2'; break;
      case 'Jul':
      case 'Ago':
      case 'Sep':
      vQtr='QTR3'; break;
      case 'Oct':
      case 'Nov':
      case 'Dec':
      vQtr='QTR4'; break;
      }
      pQtr=vQtr;
      }
  • Save & Preview.
    Try changing the month to ‘Jan’. You should see something like this.Table Preview

Close the preview dialog.

ETL

In he last part we’ll use an ETL process and the Bullet Graph widget.

Start the PDI. Create a new transformation and save it in the demo/development folder with the name SalesDashboard.ktr.

This is the PDI transformation

Dashboard PDI

Dashboard PDI Transformation

  1. Prepare the default parameters for testing.
    On the menu click Edit ->Parameters
    Select Parameters Tab, Define pYear, pQtr, pMonth.
    Dash PDI 00
  2. Set the MDX
    Add Input->Mondrian Input
    Dashboard PDI 01
    Check the ‘Replace variable’  checkbox so the parameters are used.
    The query is:
with
member [Measures].[Sales_Curr] as 'Sum(Ytd([Time].CurrentMember), [Measures].[Sales])/1000'
member [Measures].[Sales_Prev] as 'Sum(Ytd(ParallelPeriod([Time].CurrentMember,1)), [Measures].[Sales])/1000'
select
{[Measures].[Sales_Prev], [Measures].[Sales_Curr]} ON COLUMNS,
{[Product].Children} ON ROWS
from [SteelWheelsSales]
where [Time].[${pYear}].[${pQtr}].[${pMonth}]
  • Rename columns
    Here we copy columns to new ones but set the new names. This is necesary if you want to remove the dynamic nature of column names in MDX results, for example when you use time in columns then you may get [Time].[2005].[QTR2].[May] or [Time].[2005].[QTR1].[Jan] then you better use ‘CurrentMonth’ to use in PDI steps.
    Add Scrpting->Modified Java Script
    Dashboard PDI 02
    The script is:

    var rowOut = getOutputRowMeta();
    newRow = createRowCopy(rowOut.size());
    var ProductLine = rowMeta.getString(newRow, 0);
    var Actual   = rowMeta.getNumber(newRow, 1);
    var Previous = rowMeta.getNumber(newRow, 2);
  • Get the maximum value for each column.
    Here max value for Actual is renamed to Max1 and Previous o Max2.
    Add Statistics->Group by
    Dashboard PDI 03
  • Calculate max values and two ranges.
    Add Scrpting->Modified Java Script
    This is an oversimplified ETL, with ranges set on maximum values for all rows. In any case you could use KPIs from tables and a lookup step.
    – maximum value from both columns
    – round it up
    – I set two marks one at 70% other at 90%, for grade of color in the graph bar.
    Dashboard PDI 04
    – The sample script use is:

    var Maximum=0.0;
    var Mark1=0.0;
    var Mark2=0.0;
    if ( (Max1!=null) || (Max2!=null) )
    if ( Max1 > Max2 )
    Maximum = Max1;
    else
    Maximum = Max2;
    if (Maximum!=0.0) {
    if (Maximum>=1.0) {
    Maximum = Maximum / 10.0;
    Maximum = ceil(Maximum) * 10.0;
    } else {
    Maximum = Maximum / 1.0;
    Maximum = ceil(Maximum) * 1.0;
    }
    Mark1=Maximum*0.7;
    Mark2=Maximum*0.9;
    }
  • Process measures
    Each row may need processing, if KPI is set for each measure then it should be calculated on this thread. Here we just add a subtitle string (k=thousand).
    Add Transform->Add constants
    Dashboard PDI 05
  • Remove excesive text from product line description
    The first colum has text like “[Product].[Ships]”, we can remove the fixed part with a regular expression to replace the unwanted and fixed text
    Add Transform->.Replace in String Step.
    Type in Search:  ^\[Product\]\.\[(.*)\]$
    Notice that no output field is specified so ProductLine will be used, that the dot and squared brackets are ‘escaped’ with backslash and that the replace string is (.*)=$1
    Dash PDI Regex
    For using this step: Reference1, Reference2
  • Join the data and its ranges.
    Add Joins->Join Rows (cartesian product)
    Dashboard PDI 06
    The important thing is to state the step from where is the data coming from, if you don’t do this you end up missing data or even complete rows: Add constants.
  • Sort as you like.
    Add Transform->Sort rows
    Dashboard PDI 07
  • Output
    Make available only the relevant columns:
    – Measure Title
    – Measure Subtitle
    – Value of Bar
    – Value of Mark
    – Min Range (darker shade in bar)
    – Max Range (lighter shade in bar)
    – Max value
    Add Transform->Select values
    Dashboard PDI 08

Selecting this last step and previewing the output you get.

Dashboard PDI Result

You can close the PDI.

Bullet Graph

Back to the CDE Editor.

Click on Data Sources

  • Click on:
    • MDXKettle Queries button
    • ketle over kettleTransFormFile
    • Select the ketle over kettleTransFormFile Component
  • Set
    • Name=dsBulletGraph
    • Kettle Transformation File=SalesDashboard.ktr
    • Click on Variables, a dialog will open
      add three lines for:
      – Arg=pYear
      – Arg=pQtr
      – Arg=pMonth
    • Click on Parameters, in the dialog add three lines:
      – Name=pYear, Value=2005, Type=String
      – Name=pQtr, Value=QTR2, Type=String
      – Name=pMonth, Value=May, Type=String
    • Click on Output Options, in the dialog add six lines:
      – Index=0, Index=1, Index=2, Index=3, Index=4
      Index=5, Index=6
    • Click on Columns, in the dialog add six lines:
      – Index=0, Name=Product Line
      – Index=1, Name=Subtitle
      – Index=2, Name=Actual
      – Index=3, Name=Previous
      – Index=4, Name=Mark1
      – Index=5, Name=Mark2
      – Index=6, Name=Maximum
    • Click on Query=OutputBullet
      (last activity)
    • Save CDE

Click on Components

  • Click on:
    • Charts
    • CCC Bullet Chart
      a group will be added
    • Select the new CCC Bullet Chart Component.
  • Set
    • Name=cgBulletGraph
    • Width=420
    • Height=220
    • Datasource=dsBulletGraph
    • Clear Title and Subtitle
    • Click Bullet Ranges
      click on the minus button (-) to delete each line
      Close the dialog
    • Click on Parameters, in the dialog add three lines:
      – Arg=pYear, value=pYear
      – Arg=pQtr, value=pQtr
      – Arg=pMonth, value=pMonth
    • Set HtmlObject=nResultGraph
    • Click on Listeners
      Select pYear, pMonth

Save and preview. We are almost finished

Dashboard PDI Almost

Fine Formatting

There are three formating options in the CDE

  1. Format options in CDE, either in the component advanced properties or html or column layout
  2. CSS class from the web page
  3. Some properties of the components

To work with the CSS or DOM elements we add a resource file:

  • Create a text file and type:
    That is mainly header fonts and colors, parameter style and table formatting.

    #header { margin-top:1em; }
    h2 { font-size:1.4em; padding-top:7px; padding-left:1em; height:1.6em; }
    .topRow1 { font-size:1.2em; font-weight:700; margin-left:1em; }
    .topRow2 { font-size:1em; margin-left:1em; }
    hr.topRowHR { height:2px; width:900px; color:red; }
    #ParamRow { font-size:0.9em; vertical-align:bottom; margin-bottom:2em; }
    #ParamRow select { font-size:0.9em; vertical-align:top; }
    #ParamRow input { font-size:0.9em; vertical-align:top; }
    #nResultTable { margin-top:2em; }
    table tr.even, table tr.odd{ background-color: white; }
    table thead th { border-bottom: 1px solid #CCC; background-color: #fff; }
    table tbody td.number, th.number{ text-align: right; }
    table tbody td.string, th.string{ text-align: left; }
    table tfooter td { background-color: white; }
    .ui-widget-header { background: white; color: white; border-bottom:0px none; border-top:1px grey solid; border-left:0px none; border-right:0px none;}

    Save it in the demo/development folder as Dashboard.css

  • On the CDE, click on Layout option
  • Add a css resource file, select from the options:
    css
    external
  • Set:
    name=cssDashboard
    Resource file=Dasboard.css

This will be the result

Dash Final 1

We can modify the Graph if we access its properties, in this case the ‘Extension Points’. Lets change the Marker shape and color.

Click on Components

  • Click on the triangle of the Group Charts
  • CCC Bullet Chart
  • click on Advanced Properties.
  • Click on Extension points a dialog will open.
    Add Three lines. Fill with:

    • Arg:bulletMarker_shape  Value:bar
    • Arg:bulletMarker_lineWidth  Value:3
    • Arg:bulletMarker_strokeStyle  Value:#2e559c

Note:
Check posible values for Extension Points at CCC (Bullet Chart) and Protovis sites.
In the CCC (Bullet Chart) page check the keyword to access: base, mark, measure, title, rule, range. In the rest of the document and examples look for the second part: font, shape. Join them with an undescore: bulletMarker_lineWidth.
Or if you want to change the color of the inner bar, you could try:
Arg: bulletMeasure_fillStyle, Color: green or #0F0

You can use other properties to change the appearence, for example the tooltip to show the value of a value can ve modified.

Click in the Advanced Properties of the Graph, in Value format:

function f(value){return ('$'+sprintf("%.1f", value)+'m')}

The final product should be the initial dashboard image.

.

Additional Links

  • Bullet Chart on webdetails: here.
  • Class, Field, Method and Extension Points on webdetails: here.
  • MDX function YTD youtube video by phiintegration.
  • MDX Time Series Functions Part I. Series by William Pearson.
  • Another fine example about building CDE dasboards OSBI.
    Its in French, but the code and diagrams are easy to understand.
Advertisements

Complete Pentaho Installation on Ubuntu, Part 6

Install PDI/Kettle and Agile PDI in a Development Environment

This is where you’ll install and play with one of the most interesting, well crafted, user friendly and enjoyable applications I have ever used.

It is also the heart of the Pentaho BI suite as the tool to Extract, Transform & Load (ETL), process data and execute jobs. It’s name was Kettle, its former name, now is known as PDI for Pentaho Data Integration, and also Spoon, it’s executable file name.

We’ll install PDI as a desktop development tool.

1. Get and Install PDI.

Go to Pentaho files in sourceforge here and download the latest stable release (version 4.2 should be up by july 2011. Its a remarkable new version, check the improvements).

Double click on the file and extract it’s content into a new folder:

/Pentaho/data-integration

You can delete the .bat files and make all the .sh files, specially the spoon.sh as an executable file (right click on permisions tab). And in command terminal start it with:

./spoon.sh

Close the window dialog that offers to open a repository. You should be in the PDI, a development environment:

Pentaho Data Integration

PDI

Note: You don’t have to configure anything or add drivers for common databases.  I told you, this open source application is the result of a great community and is a very well crafted product, you’ll see.

[Edit]
Ups, There is a glitch on the interface and the new ubuntu 11.04 scrollbars as they don’t work or let you put steps into the canvas. The solution I took is disable them as shown on the PDI forums here.

2. Meet the Application

There are several resources you should browse and revisit them as you familiarize yourself with the concepts of ETL and this tool:

  1. 18 slides presentation by the project founder, manager and lead developer of PDI, explaining its capabilities. Slide #11 ‘use-cases’ list some of its uses. [video].
  2. Check a guide that comes with your download at
    /Pentaho/data-integration/docs/English/getting_started_with_pdi.pdf
  3. A detailed spoon user guide in the Pentaho wiki.
  4. Some videos on you-tube explain specific extractions that may seem too complicated at first and sometimes are about the process more than each component, but check those vblogs: mattcasters, BIOpenSource, DiddySteiner, ETLTools, , fechever75, LaboratoriosSIUCV, opensourcebi and more.
  5. The most important cookbook you have are the sample files in your disk at:
    /Pentaho/data-integration/samples/transformations/
  6. Also get this old guide, it’s no longer distributed with the PDI but it was very usefull to me as it lists the main ‘steps’.
  7. Continue with more articles from the Pentaho wiki.

3. Define a DB Connection

In the left panel, right click on Transformations. Click new. Now again on the left panel select the view tab.

Left Panel

Right click on the Database connections node, select new and fill the dialog with your data:

Here you can see the values for our MySql database and the result of the test.

Close the dialogs and right click on the left panel in the mysql connection, select share. Now when you save the transformation the connection data will be available for other extractions.

You have to save the transformation, a .ktr [xml] file. Create a folder for your transformations, that will make it easy to sync or backup them up:

/Pentaho/data-integration/development/

Note: An Oracle 10g connection using jndi (no net client) looks like this:

Conection to Oracle 10g

Conection to Oracle 10g

4. Execute a Transformation

A nice way to start learning about the PDI, ETL and datawarehousing is by opening the samples folder and check the components names and its notes, those are self explanatory. If you double click on them you will see the parameters that specify each behavior. If you right click on them you can select options to see the description, input or output fields, the text description -you should document the intention of the activity in here-, preview a sample run, etc.

Once you have reviewed some transformations I recommend one to start, that is create an object fundamental to multidimensional analisys: the time dimension. This is a table with a row for each day in the calendar, has columns showing special attributes like months, quarters, years, weekends So its easy to select dates based on those columns and then select the values in the fact table just with the indexed records which contain those needed dates.

A nice specification for a time dimension table is listed in this post of Nicholas Goodman. His blog has very interesting information too.

Check this pages, download the examples and run them in your environment.

  • Kettle Tip: Using java locales for a Date Dimension – Sept 2007 (link).
    In this post, Roland Bouman, shows a simplified extraction and then proceeds to show how to connect to a database, use a SQL to create the table and execute it.
    Then it explains three more steps to generate the data.
    In here you will see the more difficult part of using PDI, the javascript step.
  • HowTo: Create a date dimension with PDI (Kettle) – March 2010 (link)
    Geschrieben Von adds more characteristics for a day and uses more PDI steps to obtain them: calculators, filters (select), lookups. This will be version 2.0 of the last example.
  • Building a detailed Date Dimension with Pentaho Kettle – Sept 2010 (link)
    In here, Slawomir Chodnicki explains briefly the desing considerations in his design. One important thing here is how he introduces the concept of updating your data on dimensions jus by re-run the transformation, this is something we must get used to. It is important if your job crashes and you have to rebuid the process or being capable to continue from a given point.
    The file contains some erros on the java scrpts steps -some variables are not defined but referenced-, it is an oportunity to see the debugger messages of PDI.

5. Working without a Repository

If you are working with a developer team you shoud create a repository. Its simple, just click on new button and with a user with DB privileges on MySql, create the database.

Then you will get a single area for your programs and avoid versioning and syncing problems, your connection also get stored, etc. But if you are one or two people (normal for a pilot project) it is best to avoid using one. You can just synck and back up your program folders. Also you don’t need to change the normal way the BI server seeks programs.

The repository really needs its special post.

Ok, if you can’t wait, read this from

6. PDI Agile Plugin

[Edit July 2011:] On version 4.2 RC1, and on, the plug-in is already included.

Head over here and download de 1.0 version for the Modeling and Visualization Plugin, aka the Analyzer Plugin for LucidEra Cleariew before being bought by Pentaho.

It adds prototyping, datasource visualization and modeling creation on data snapshots, so you drag and drop and save your work.

Unzip the content of pmv-1.0.2-stable.zip into:

/Pentaho/data-integration/plugins/spoon

It will create a folder named agile-bi.

Start spoon and you will notice three butons on the top right. That’s normal view, model and visualize. Check this video from .

I was confuse about this note but the code is open source except the analyzer presentation layer and its available here.

7. Additional articles:

These are medium and complex topics:

  • An example of the ‘generate documentation’ step. So you can add descriptions to your extractions and make use of this new step: here
  • Error handling. Since version 3.8 error flow is available from every step here is how to use it properly.
  • Handling of configuration and variables: here
  • An introduction to regular expresions. A must read for the javascript step: here
  • An impresive plugin ‘excel output’ (more complex but more impresive than the default step). So you can generate formated reports: here.
    Note: on 4.2, the excel step is integrated in the PDI.
  • Connect PDI to SAP BI as a web service here.
  • Good chapter book sample “Pentaho Data Integration 4 Cookbook”: A transformation, A report from PDI data, PDI jobs from the BI Server process/ PUC, PUC-PDI-CDA, dashboard and data from PDI.
  • In the javascript step you can see very usefull sample code for each function. DEinspanjer explains with more detail than this: On left panel -> open Transform Functions -> open Date Functions -> right click on dateDiff -> select Sample.

8. Ruby Plugin

[Edit August 31, 2011]
Its news to me that there is another way to do the scripting than the javascript step. Now you can do your process in ‘Ruby’ with another plugin that you just need to unzip on the plugin directory: ruby-scripting-plugin.