Install and set up the Pentaho server
Applications like Product information management (PIM) in censhare exchange data with external systems. The data exchange requires the aggregation and transformation of data. For this purpose, the censhare Data integration module must be configured with a third-party ETL (Extract, Transform, Load) tool. This documentation describes the setup of the ETL solution Pentaho PDI CE for this purpose.
Legal note
Pentaho PDI (Community Edition) is a service tool provided by Hitachi Vantara that needs to be installed by Customer itself as required for data exchange with the censhare software. Any installation and configuration of this tool is therefore at Customer’s own risk and subject to separate license terms. The use of the tool may result in additional costs. censhare does not have any influence on incurring costs or then applicable license terms and shall therefore not be held responsible.
Target groups
Solution developers
Purpose
Aggregating data in censhare and exporting it to exchange formats like CSV requires the installation of a Pentaho server. Pentaho is data integration and transformation software. The censhare application server connects to it through REST API calls. This article describes the setup and configuration of the server.
Context
Applications like Product information management (PIM) in censhare exchange data with external systems. The data exchange requires the aggregation and transformation of data. For this purpose, the censhare Data integration module must be configured with a third-party ETL (Extract, Transform, Load) tool. This documentation describes the setup of the ETL solution Pentaho PDI CE for this purpose.
Prerequisites
Configuring the transformations and jobs for the data export and import requires the respective ETL and development skills.
Introduction
Pentaho PDI is business intelligence software from Hitachi Vantara for data integration, reporting and analytics. It offers visual tools for ETL (Extract, Transform, Load) processes. censhare integrates with the community edition of Pentaho PDI in order to aggregate, transform and output product data from censhare PIM into exchange formats like Excel or CSV.
The Pentaho PDI CE package includes a web server that allows you to run transformations and jobs remotely. The package consists of the Carte web server and the Kettle ETL engine. It must be installed on the same host as your censhare application server. The censhare application server and the Pentaho server communicate through a REST API over HTTP. In this version, censhare only supports the method executeTrans provided by Pentaho. The data output requires a transformation file and, optionally, transformation parameters. For more information on these, read the article Data mapping and transformations with the PIM connector.
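As a rough illustration of what a call to executeTrans looks like, the following sketch builds a request URL in the form documented for Carte, using the rep, user, pass, trans and level query parameters. The transformation path /export/products is a hypothetical placeholder, the host, port, repository name and credentials are assumed default values, and the exact request that censhare sends internally is not specified in this article.

```shell
#!/bin/sh
# Build a Carte executeTrans URL.
# Arguments: $1 host, $2 port, $3 repository name,
#            $4 repository user, $5 repository password, $6 transformation
# Note: values are inserted as-is; URL-encode them if they contain
# spaces or reserved characters.
carte_execute_trans_url() {
    printf 'http://%s:%s/kettle/executeTrans/?rep=%s&user=%s&pass=%s&trans=%s&level=INFO\n' \
        "$1" "$2" "$3" "$4" "$5" "$6"
}

# Illustrative values only; /export/products is a made-up transformation path.
url=$(carte_execute_trans_url localhost 9090 css-pentaho-repo cluster cluster /export/products)
echo "$url"

# To send the request to a running Carte server (assumed default login):
# curl --user cluster:cluster "$url"
```

Additional query parameters appended to such a URL are passed to the transformation as parameters, which is how optional transformation parameters can be supplied.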
About Pentaho data integration (PDI)
censhare supports PDI CE version 8.0.0.0-28 or newer. You can download the latest version of PDI CE from https://sourceforge.net/projects/pentaho. For details, refer to the Pentaho documentation for version 8.0.
Pentaho Data Integration is software that offers a set of tools for the visual design of ETL (Extract, Transform, Load) transformations and jobs. The ETL engine of Pentaho is called Kettle. The PDI CE package also includes the Carte web server, which allows you to execute the transformations and jobs remotely. censhare connects to the Carte web server through a REST API over HTTP. The package (Carte web server and Kettle ETL engine) must be installed on the same host as the censhare application server, so that both applications (censhare and Pentaho) can access the input, output and transformation files.
Installing the Pentaho server
The Pentaho server can be installed as described below on all supported Linux platforms. The procedure does not cover Oracle Solaris. If you want to install Pentaho on an Oracle Solaris platform, please contact our Service Desk.
To install Pentaho, download the ZIP archive from https://sourceforge.net/projects/pentaho/ and proceed as follows:
Extract the ZIP archive into the /opt/ directory.
Add the following carte.service to the /usr/lib/systemd/system/ directory:
BASH
[Unit]
Description=Pentaho Data Integration

[Service]
Type=simple
User=corpus
Group=corpus
Environment=PDICONFIG='./pwd/carte-config-master-9090.xml'
WorkingDirectory=/opt/data-integration/
ExecStart=/opt/data-integration/carte.sh $PDICONFIG

[Install]
WantedBy=multi-user.target
Open a terminal window and set the required permissions for the carte.service:
BASH
chmod 644 /usr/lib/systemd/system/carte.service
Enable the service with the command:
BASH
systemctl enable /usr/lib/systemd/system/carte.service
Add the following carte-config-master-9090.xml configuration file to the /opt/data-integration/pwd/ directory:
XML
<slave_config>
  <slaveserver>
    <name>master</name>
    <hostname>localhost</hostname>
    <port>9090</port>
    <master>Y</master>
  </slaveserver>
</slave_config>
Start a local Carte (web server) instance on port 9090 or another free port number. Do this by executing the following command from the installation directory (this starts the Pentaho application, too):
BASH
systemctl start carte.service
The command shown above applies to Linux machines that use systemd. On other operating systems, use the respective commands.
Check that the Carte web server is running. To do this, enter the URL http://localhost:9090/kettle/status/?xml=Y in your web browser and log in with the user "cluster" and the password "cluster".
The server should respond with the following status message (values inside the tags may be different):
XML
<serverstatus>
  <statusdesc>Online</statusdesc>
  <memory_free>888385208</memory_free>
  <memory_total>1257242624</memory_total>
  <cpu_cores>8</cpu_cores>
  <cpu_process_time>534702631000</cpu_process_time>
  <uptime>174627855</uptime>
  <thread_count>99</thread_count>
  <load_avg>1.9580078125</load_avg>
  <os_name>Mac OS X</os_name>
  <os_version>10.13.3</os_version>
  <os_arch>x86_64</os_arch>
  <transstatuslist></transstatuslist>
  <jobstatuslist></jobstatuslist>
</serverstatus>
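If no web browser is available on the server, the same status check can be done from a terminal. The following is a minimal sketch, assuming that curl and sed are installed and that the default user "cluster" with the password "cluster" is in use. The helper name carte_status_desc is made up for this example. It extracts the value of the statusdesc element from a status response and is shown here against an abbreviated sample response.

```shell
#!/bin/sh
# Read a Carte status document from stdin and print the <statusdesc> value.
# Newlines are replaced with spaces first so that multi-line XML is handled.
carte_status_desc() {
    tr '\n' ' ' | sed -n 's|.*<statusdesc>\([^<]*\)</statusdesc>.*|\1|p'
}

# Abbreviated sample response, used here instead of a live server.
sample='<serverstatus> <statusdesc>Online</statusdesc> <cpu_cores>8</cpu_cores> </serverstatus>'
status=$(printf '%s' "$sample" | carte_status_desc)
echo "$status"

# Against a running Carte server (assumed default port and login):
# curl --silent --user cluster:cluster 'http://localhost:9090/kettle/status/?xml=Y' | carte_status_desc
```

If the helper prints nothing for a live request, the server did not return a status document, for example because the service is not running or the login was rejected.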
Configuring the Pentaho interface in censhare
In the censhare Admin Client, you have to configure the Pentaho interface. It enables the communication between the censhare application server and Pentaho. Proceed as follows:
Go to the directory Configuration/Modules/Data integration/Pentaho interface and open the entry Pentaho (Preferences).
In the dialog window, open the configuration file by clicking the Edit XML file button. If you follow the default configuration as described above, nothing needs to be changed here:
XML
<settings>
  <setting id="default" host="localhost" port="9090"
           pentahoUser="cluster" pentahoPassword="cluster"
           isHttps="false" logLevel="DEBUG">
    <repository repositoryName="css-pentaho-repo"
                repositoryUser="cluster" repositoryPassword="cluster" />
  </setting>
</settings>
Check the port number in the default configuration. It must be the same port that the Carte web server is using.
If you use a port other than the default, enter the correct value in the port attribute. Clicking OK saves the changes to a custom configuration file. The custom configuration is indicated in the Pentaho interface directory by a red flag.
Update the server configuration and, if necessary, synchronize the remote servers in your system. The custom configuration is now enabled.
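A port mismatch between the two configurations is easy to overlook. The following sketch compares the port in the Carte configuration with the port attribute of the censhare Pentaho interface settings. Both helper names are made up for this example, and the XML is passed in as inline sample text. In practice, the Carte side could be read from /opt/data-integration/pwd/carte-config-master-9090.xml, while the censhare side would need to be copied from the Admin Client.

```shell
#!/bin/sh
# Print the value of the <port> element of a Carte configuration read from stdin.
carte_config_port() {
    tr '\n' ' ' | sed -n 's|.*<port>\([0-9][0-9]*\)</port>.*|\1|p'
}

# Print the value of the port attribute of a censhare Pentaho setting read from stdin.
censhare_setting_port() {
    tr '\n' ' ' | sed -n 's|.* port="\([0-9][0-9]*\)".*|\1|p'
}

# Inline samples that mirror the default configurations in this article.
carte_xml='<slave_config> <slaveserver> <name>master</name> <hostname>localhost</hostname> <port>9090</port> <master>Y</master> </slaveserver> </slave_config>'
censhare_xml='<settings> <setting id="default" host="localhost" port="9090" isHttps="false"> </setting> </settings>'

carte_port=$(printf '%s' "$carte_xml" | carte_config_port)
censhare_port=$(printf '%s' "$censhare_xml" | censhare_setting_port)

if [ "$carte_port" = "$censhare_port" ]; then
    result="ports match ($carte_port)"
else
    result="port mismatch: Carte=$carte_port censhare=$censhare_port"
fi
echo "$result"
```

The sed patterns are deliberately simple and assume one port element and one port attribute per document. A configuration with several setting entries would need a real XML parser.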
Result
The Pentaho server and the censhare Pentaho interface are now installed and running on your system.
Next steps
Configure the Server actions for censhare Client and censhare Web. These actions allow users to execute a product data export with Pentaho.