Complete Pentaho Installation on Ubuntu, Part 12

The Pentaho Aggregation Designer (PAD) is a tool that comes from the Mondrian project. Its purpose is to generate SQL queries to create summarization tables, the instructions for data aggregation and the modification of the mondrian schema so the engine can use them.

The engine can check that precalculated data is available based on the names of fields used in keys and and report them speeding the display of results. The cost is that the cube construction will take considerably longer.

Install

  1. Download it from the mondrian sourceforge page (1.2.2-stable).
  2. Unzip it’s content to /pentaho/aggregation-designer
  3. Open a terminal, navigate to that folder and make the .sh files excecutables:
    $ chmod +x *.sh

To start using it

In that folder and type:

$ ./startaggregationdesigner.sh
  1. Click Configure and fill the parameters as we have done before: SampleData name, localhost, sampledata, pentaho_user, password. Click on test and ok.
  2. Click on Mondrian schema and locate steelwheels.mondrian.xml in /Pentaho/biserver-ce/pentaho-solutions/steel-wheels/analysis/, click Apply.
  3. Select Steelwheels model, and click connect.
    PAD will check for primary keys, nulls on foreing keys –which should not exist in a correctly done ETL process-. Click Ok.
  4. Now click advisor and then recommend in the dialog window.
  5. If you click on on the right bottom panel, the columns used will be displayed on the left bottom panel.
    PAD Screen

    PAD Screen

    You can select wich one to generate  as it shows a graph with cost to build (number of records) against benefit (time to get the result).

  6. Click export to open a dialog window:
    Click on Create Aggregates: export to Save it to a file like create.sql
    Click on Populate Aggregates: export to a file like populate.sql
    Click on Publish Mondrian Schema: export to a file like schema.xml
  7. Save your work to a file like PAD_workspace and exit
  8. Now you can check and modify the scripts. Maybe adding the TRUNCATE SQL instructions to the tables in the populate file.
    You can automate running them them with an .xaction sequence and a PDI job.

Reading Material

  • Julyan Hyde blog post about the 2.0 BI release –and PAD here
  • Mondrian documentation here
  • User Guide 1.0 PDF y 1.2 scribd
  • If you use Lucid DB: heres an article to use with a plugin LucidDbAggregateDesigner
  • In 2008, Chris Webb made an an accurate prediction about mondrian column databases (LucidDB) and PAD here
Advertisements