The Pentaho Aggregation Designer (PAD) is a tool that comes from the Mondrian project. Its purpose is to generate SQL queries to create summarization tables, the instructions for data aggregation and the modification of the mondrian schema so the engine can use them.
The engine can check that precalculated data is available based on the names of fields used in keys and and report them speeding the display of results. The cost is that the cube construction will take considerably longer.
- Download it from the mondrian sourceforge page (1.2.2-stable).
- Unzip it’s content to /pentaho/aggregation-designer
- Open a terminal, navigate to that folder and make the .sh files excecutables:
$ chmod +x *.sh
To start using it
In that folder and type:
- Click Configure and fill the parameters as we have done before: SampleData name, localhost, sampledata, pentaho_user, password. Click on test and ok.
- Click on Mondrian schema and locate steelwheels.mondrian.xml in /Pentaho/biserver-ce/pentaho-solutions/steel-wheels/analysis/, click Apply.
- Select Steelwheels model, and click connect.
PAD will check for primary keys, nulls on foreing keys –which should not exist in a correctly done ETL process-. Click Ok.
- Now click advisor and then recommend in the dialog window.
- If you click on on the right bottom panel, the columns used will be displayed on the left bottom panel.
You can select wich one to generate as it shows a graph with cost to build (number of records) against benefit (time to get the result).
- Click export to open a dialog window:
Click on Create Aggregates: export to Save it to a file like create.sql
Click on Populate Aggregates: export to a file like populate.sql
Click on Publish Mondrian Schema: export to a file like schema.xml
- Save your work to a file like PAD_workspace and exit
- Now you can check and modify the scripts. Maybe adding the TRUNCATE SQL instructions to the tables in the populate file.
You can automate running them them with an .xaction sequence and a PDI job.
- Julyan Hyde blog post about the 2.0 BI release –and PAD here
- Mondrian documentation here
- User Guide 1.0 PDF y 1.2 scribd
- If you use Lucid DB: heres an article to use with a plugin LucidDbAggregateDesigner
- In 2008, Chris Webb made an an accurate prediction about mondrian column databases (LucidDB) and PAD here
firstly, thanks for your posts, very useful
i’ve started creating cubes and using aggregation designer relatively recently,
and it is not clear to me some aspects, can u plz help with this questions?
how does pentaho knows wheter to use published pre-aggregation xml or not?
do we have to set it somewhere?
we can suppose that it is using this pre-aggregation, but not clear at what moment it calculate and refreshes tables?
how can we force pentaho to recalculate aggregated data?