Pentaho’s BI Server or BA platform allows you to access business data in the form of dashboards, reports or OLAP cubes via a convient web interface. Additionally it provides an interface to administer your BI setup and schedule processes. The aim of this tutorial is to illustrate how to get started with the BI Server and for that purpose I am going to use a small and artifical data set – as otherwise I would have to get deeper into further technologies – so I can keep this text lean. I am going to elaborate on Mondrian schemas, data warehouse design, MDX and further related concepts in separate articles. When you are through with this text and still hungry, make sure you check out “Mondrian Schema for OLAP Cube Definition ft. Google Analytics and Saiku” for a more advanced use case.
Setting Up the BI Server
The great thing about Penthao BI Server is that it is very very easy to setup. All you have to do is to …
- download the most recent release
- extract it
- set PENTAHO_JAVA_HOME to the root of your Java installation
- make start-pentaho.sh and stop-pentaho.sh executable, if you are working on Linux
That’s about it. Now you just start the server by executing start-pentaho.(sh|bat) and access it via your browser at http://localhost:8080/ where you log in with the preconfigured admin profile using admin:password. What you are looking at now is the Home section of the Pentaho User Console.
The Game Plan
Okay what we are now going to do is access data stored in a MySQL (fact) table and display it in the browser as a chart. For this purpose we are going to make use of a number of technologies.
- MySQL: stores the data
- Mondrian: makes MySQL data accessible as an OLAP cube via MDX
- Pentaho BI Server: hosts Mondrian and Saiku
- Saiku-plugin: makes cubed data accessible via web-based GUI
MySQL Data Set and Driver
This is the little table I used for my example and its data:
CREATE TABLE `testtable` ( `date` int(11) NOT NULL, `key1` char(1) NOT NULL, `key2` char(2) NOT NULL, `value` int(11) DEFAULT NULL, PRIMARY KEY (`date`,`key1`,`key2`) );
date, key1, key2, value 20130501, 'a', 'x', 9 20130501, 'a', 'y', 3 20130501, 'b', 'x', 5 20130501, 'b', 'y', 4 20130502, 'a', 'x', 7 20130502, 'a', 'y', 3 20130502, 'b', 'x', 5 20130502, 'b', 'y', 7 20130503, 'a', 'x', 8 20130503, 'a', 'y', 3 20130503, 'b', 'x', 2 20130503, 'b', 'y', 1
Connect MySQL with BI Server
The BI Server needs a MySQL JDBC driver to enable it to talk to MySQL. The Server is shipped with a driver but feel free to replace it with the most recent version. Its supposed location is [BI-Server Home]/tomcat/lib.
Now we have to hook MySQL up with the Server. For that purpose we choose “Manage Data Sources …” from the “File” menu. There we click on “+” and choose “JDBC”. Within the “Database Connection” dialog you enter the respective MySQL database connection parameters and when you’re done “Test” the connection.
Installing the Saiku Plugin
The graphical interface for accessing the data is done through the Saiku Analytics plugin. To install it enter the “Marketplace” by choosing the respective section from the nameless section menu located at the top left corner of the User Console.
Could it be more awesome? No copying around jar-files in shady sub-folders and editing obscure configuration files. Just click “install” and restart the BI Server – execute the stop and the start script.
Registering our New Data Cube
Finally we can tell the BI Server about our new data source which we would like to access as an OLAP cube. For this purpose we need to create an XML file keeping the schema definiton of the cube. This Mondrian schema specifies the structure of our data source. Whether a column is supposed to be used as a dimension or a measure, what type it is and how we would like to aggregate the measure – all that stuff is taken care of in that file. So, let’s create – anywhere you want – a file named testcube.xml with the following content:
<Schema name="testdb"> <Cube name="Test Cube"> <Table name="testtable"> </Table> <Dimension name="Date"> <Hierarchy hasAll="true"> <Level name="Date" column="date" type="Integer"/> </Hierarchy> </Dimension> <Dimension name="Key 1"> <Hierarchy hasAll="true"> <Level name="Key 1" column="key1" type="String"/> </Hierarchy> </Dimension> <Dimension name="Key 2"> <Hierarchy hasAll="true"> <Level name="Key 2" column="key2" type="String"/> </Hierarchy> </Dimension> <Measure name="Value" column="value" aggregator="sum"/> </Cube> </Schema>
The schema above is a Mondrian schema of version 3. Mondrian is already available as version 4 and offering substantial improvements and also ships with a new version of its schema defintion. But the current release of Pentaho BI Server 5 is not yet compatible to Mondrian 4 and shipped with Mondrian 3.6.
Let’s have a look!
And that’s about it already. Now we open “Saiku Analytics” via “File”/”New” and refresh the available cubes by clicking this icon featuring two green circular arrows. From the menu next to it we should now be able to select our “Test Cube” and start playing around with it.
(original article published on www.joyofdata.de)