Locally Maven-ize Pentaho Kettle and develop a Data Integration webapp with Eclipse (also integrated with container managed datasource)

24/09/2010

Install Kettle’s JAR into the local repository

Run the following install:install-file commands, one per Kettle JAR:

  • mvn install:install-file -DgroupId=pentaho.kettle -DartifactId=kettle-core -Dversion=4.0.0 -Dpackaging=jar -Dfile=C:\Pdi-ce-4.0.0-stable\data-integration\lib\kettle-core.jar -DgeneratePom=true
  • mvn install:install-file -DgroupId=pentaho.kettle -DartifactId=kettle-db -Dversion=4.0.0 -Dpackaging=jar -Dfile=C:\Pdi-ce-4.0.0-stable\data-integration\lib\kettle-db.jar  -DgeneratePom=true
  • mvn install:install-file -DgroupId=pentaho.kettle -DartifactId=kettle-engine -Dversion=4.0.0 -Dpackaging=jar -Dfile=C:\Pdi-ce-4.0.0-stable\data-integration\lib\kettle-engine.jar  -DgeneratePom=true
  • mvn install:install-file -DgroupId=pentaho.kettle -DartifactId=kettle-ui-swt -Dversion=4.0.0 -Dpackaging=jar -Dfile=C:\Pdi-ce-4.0.0-stable\data-integration\libext\pentaho\kettle-ui-swt.jar  -DgeneratePom=true
  • mvn install:install-file -DgroupId=pentaho.kettle -DartifactId=kettle-vfs -Dversion=4.0.0 -Dpackaging=jar -Dfile=C:\Pdi-ce-4.0.0-stable\data-integration\libext\pentaho\kettle-vfs-20091118.jar  -DgeneratePom=true
  • mvn install:install-file -DgroupId=pentaho -DartifactId=pentaho-libbase -Dversion=1.1.6 -Dpackaging=jar -Dfile=C:\Pdi-ce-4.0.0-stable\data-integration\libext\pentaho\libbase-1.1.6.jar  -DgeneratePom=true
  • mvn install:install-file -DgroupId=pentaho -DartifactId=pentaho-libformula -Dversion=1.1.7 -Dpackaging=jar -Dfile=C:\Pdi-ce-4.0.0-stable\data-integration\libext\pentaho\libformula-1.1.7.jar  -DgeneratePom=true
  • Other JARs may be needed, depending on the libraries and transformation steps you use
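Since every command in the list above differs only in its coordinates and file path, the command lines can be generated rather than typed. The following is a minimal sketch, not part of the original procedure: the class name KettleInstallCommands and the method installCommand are hypothetical, and it only assembles the flags already shown above.

```java
class KettleInstallCommands {

    /**
     * Builds one "mvn install:install-file" command line.
     * Only the flags used in the list above are emitted.
     * Assumption: the JAR path contains no spaces, so it is not quoted.
     */
    static String installCommand(String groupId, String artifactId,
                                 String version, String jarPath) {
        return "mvn install:install-file"
                + " -DgroupId=" + groupId
                + " -DartifactId=" + artifactId
                + " -Dversion=" + version
                + " -Dpackaging=jar"
                + " -Dfile=" + jarPath
                + " -DgeneratePom=true";
    }

    public static void main(String[] args) {
        // Directory and artifact names taken from the first three commands above.
        String lib = "C:\\Pdi-ce-4.0.0-stable\\data-integration\\lib\\";
        String[] artifacts = {"kettle-core", "kettle-db", "kettle-engine"};
        for (String artifact : artifacts) {
            System.out.println(
                    installCommand("pentaho.kettle", artifact, "4.0.0", lib + artifact + ".jar"));
        }
    }
}
```

The printed lines can be pasted into a command prompt or saved as a batch file. JARs whose file name differs from the artifactId (such as kettle-vfs-20091118.jar) need their path passed explicitly.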

Modify the Kettle dependencies POM

Edit the %MAVEN_REPOSITORY%\pentaho\kettle\kettle-core\4.0.0\kettle-core-4.0.0.pom file and add a dependencies element:

<?xml version="1.0" encoding="utf-8"?>

<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"

xmlns="http://maven.apache.org/POM/4.0.0"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<modelVersion>4.0.0</modelVersion>

<groupId>pentaho.kettle</groupId>

<artifactId>kettle-core</artifactId>

<version>4.0.0</version>

<description>POM was created from install:install-file</description>

<dependencies>

<dependency>

<groupId>pentaho.kettle</groupId>

<artifactId>kettle-db</artifactId>

<version>4.0.0</version>

</dependency>

<dependency>

<groupId>pentaho.kettle</groupId>

<artifactId>kettle-vfs</artifactId>

<version>4.0.0</version>

</dependency>

<dependency>

<groupId>pentaho.kettle</groupId>

<artifactId>kettle-engine</artifactId>

<version>4.0.0</version>

</dependency>

<dependency>

<groupId>pentaho.kettle</groupId>

<artifactId>kettle-ui-swt</artifactId>

<version>4.0.0</version>

</dependency>

<dependency>

<groupId>commons-logging</groupId>

<artifactId>commons-logging</artifactId>

<version>1.1</version>

</dependency>

<dependency>

<groupId>javassist</groupId>

<artifactId>javassist</artifactId>

<version>3.4.GA</version>

</dependency>

<dependency>

<groupId>rhino</groupId>

<artifactId>js</artifactId>

<version>1.7R2</version>

</dependency>

<dependency>

<groupId>net.sourceforge.jexcelapi</groupId>

<artifactId>jxl</artifactId>

<version>2.6.10</version>

</dependency>

<dependency>

<groupId>pentaho</groupId>

<artifactId>pentaho-libbase</artifactId>

<version>1.1.6</version>

</dependency>

<dependency>

<groupId>pentaho</groupId>

<artifactId>pentaho-libformula</artifactId>

<version>1.1.7</version>

</dependency>

<dependency>

<groupId>net.sf.scannotation</groupId>

<artifactId>scannotation</artifactId>

<version>1.0.2</version>

</dependency>

<dependency>

<groupId>simple-jndi</groupId>

<artifactId>simple-jndi</artifactId>

<version>0.11.1</version>

<exclusions>

<exclusion>

<groupId>javax.sql</groupId>

<artifactId>jdbc-stdext</artifactId>

</exclusion>

</exclusions>

</dependency>

<dependency>

<groupId>javax.mail</groupId>

<artifactId>mail</artifactId>

<version>1.4</version>

</dependency>

</dependencies>

</project>

Other dependencies may be needed, depending on the libraries and transformation steps you use. You can discover them by running the Transformations or Jobs as described at the end of this guide.

Note: the JDBC-Stdext exclusion will prevent the “Unable to resolve artifact: required artifacts missing: javax.sql:jdbc-stdext:jar:2.0” error according to http://www.osjava.org/issues/browse/SJN-74.html.

Use JNDI reference within Spoon

To simulate the JNDI datasource availability in Spoon, as it happens in an application server, we need to write a proper jdbc.properties file within the pdi-ce-4.0.0-stable\data-integration\simple-jndi directory:

java:/comp/env/jdbc/NAME/type=javax.sql.DataSource

java:/comp/env/jdbc/NAME/driver=com.microsoft.sqlserver.jdbc.SQLServerDriver

java:/comp/env/jdbc/NAME/url=jdbc:sqlserver://host:1433;databaseName=name

java:/comp/env/jdbc/NAME/user=username

java:/comp/env/jdbc/NAME/password=password

After that you can use a JNDI database connection with JNDI name java:/comp/env/jdbc/NAME.

Note: remember to put the JAR with database drivers into the directory pdi-ce-4.0.0-stable\data-integration\libext\JDBC.

To set the jdbc.properties path explicitly, you can add the following VM arguments to the command that starts Spoon in spoon.bat or spoon.sh:

-Djava.naming.factory.initial="org.osjava.sj.SimpleContextFactory" -Dorg.osjava.sj.root="C:/directory/simple-jndi" -Dorg.osjava.sj.delimiter="/"

I suggest testing the connection from within the Spoon interface.

Create a Data Integration webapp project with Eclipse

Launch Eclipse and create a Maven webapp project.

When you run Tomcat or another JEE container from Eclipse, define the KETTLE_PLUGIN_BASE_FOLDERS variable in the “VM arguments” text area of the Run Configuration dialog box:

-DKETTLE_PLUGIN_BASE_FOLDERS=C:/pdi-ce-4.0.0-stable/data-integration/plugins

If you don’t, you will run into a problem with the plugin loader: when searching for JARs, the KettleEnvironment initialization scans the Eclipse “plugins” directory instead of the Kettle “plugins” directory. This wastes time because the Eclipse directory contains a very large number of JARs.

Multiple plugin directories can be given as comma-separated values: “C:/pdi-ce-4.0.0-stable/data-integration/plugins, C:/dir/plugins”.

The -Dorg.osjava.sj.root="C:/directory/simple-jndi" argument is not necessary if you call KettleEnvironment.init(false) in the initialization phase (see below).
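Because a missing or mistyped KETTLE_PLUGIN_BASE_FOLDERS value only shows up as a slow startup, it can help to check the property before Kettle is initialized. The following is a minimal sketch under stated assumptions: the class PluginFolderCheck and its method are hypothetical and not part of Kettle, and trimming whitespace around each comma-separated entry is this sketch's own choice, not a statement about how Kettle parses the value.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class PluginFolderCheck {

    static final String PROPERTY = "KETTLE_PLUGIN_BASE_FOLDERS";

    /**
     * Returns the configured entries that are not existing directories.
     * Throws IllegalStateException if the system property is not set at all.
     */
    static List<String> missingFolders() {
        String value = System.getProperty(PROPERTY);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("-D" + PROPERTY + " is not set");
        }
        List<String> missing = new ArrayList<>();
        for (String entry : value.split(",")) {
            // Assumption of this sketch: surrounding whitespace is ignored.
            String path = entry.trim();
            if (!new File(path).isDirectory()) {
                missing.add(path);
            }
        }
        return missing;
    }
}
```

Calling PluginFolderCheck.missingFolders() at the start of the webapp's initialization and logging a non-empty result makes a wrong Run Configuration visible immediately.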

Import Kettle libraries

Importing the Kettle JARs into your webapp project is now easy: just add the “kettle-core” dependency. If “kettle-core” does not appear in your Maven artifact list, use Reindex Local Repository in the Maven Preferences dialog box.

Define datasource in context.xml (for Tomcat)

Create META-INF\context.xml as you would for any other standard webapp project:

<?xml version="1.0" encoding="UTF-8"?>

<Context>

<Resource name="jdbc/NAME" auth="Container" type="javax.sql.DataSource"

maxActive="100" maxIdle="30" maxWait="10000"

username="username" password="password"

driverClassName="com.microsoft.sqlserver.jdbc.SQLServerDriver"

url="jdbc:sqlserver://host:1433;databaseName=name"/>

</Context>

Note: remember to add the database driver to the classpath or declare it as a Maven dependency.
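Before running a transformation, it can be useful to confirm that the container really binds the datasource under the JNDI name the transformation expects. The following is a minimal sketch using only the standard javax.naming and javax.sql APIs. The class name DataSourceProbe is hypothetical, and the JNDI name is passed in by the caller, so nothing about the container's naming is assumed here.

```java
import java.util.Optional;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

class DataSourceProbe {

    /**
     * Looks up the given JNDI name and returns the DataSource if one is bound.
     * Returns an empty Optional when the name is not bound, when no JNDI
     * provider is available, or when the bound object is not a DataSource.
     */
    static Optional<DataSource> find(String jndiName) {
        try {
            Object bound = new InitialContext().lookup(jndiName);
            if (bound instanceof DataSource) {
                return Optional.of((DataSource) bound);
            }
            return Optional.empty();
        } catch (NamingException e) {
            // Report the reason instead of failing, so the caller can decide what to do.
            System.err.println("JNDI lookup failed for " + jndiName + ": " + e);
            return Optional.empty();
        }
    }
}
```

Inside the webapp, a call such as DataSourceProbe.find("java:/comp/env/jdbc/NAME").isPresent() should return true once context.xml and the driver are in place. Outside a container the lookup simply yields an empty result.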

Invoke the Kettle transformation from a JSP

You can test the success of this procedure using a JSP with this scriptlet:

<%@ page language="java" contentType="text/html; charset=ISO-8859-1"

pageEncoding="ISO-8859-1"%>

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"

"http://www.w3.org/TR/html4/loose.dtd">

<%@page import="org.pentaho.di.core.KettleEnvironment"%>

<%@page import="org.pentaho.di.core.util.EnvUtil"%>

<%@page import="org.pentaho.di.trans.TransMeta"%>

<%@page import="org.pentaho.di.trans.Trans"%>

<%@page import="org.pentaho.di.core.Result"%>

<%@page import="java.util.List"%>

<%@page import="org.pentaho.di.core.RowMetaAndData"%>

<%@page import="org.pentaho.di.core.exception.KettleException"%>

<html>

<head>

<title>PDI</title>

</head>

<body>

<%

try {

KettleEnvironment.init(false);

EnvUtil.environmentInit();

TransMeta transMeta = new TransMeta("C:\\PentahoTestIntegration\\test.ktr");

Trans trans = new Trans(transMeta);

trans.execute(null); // You can pass arguments instead of null.

trans.waitUntilFinished();

Result r = trans.getResult();

List<RowMetaAndData> rowsResult = r.getRows();

if (trans.getErrors() > 0) {

throw new RuntimeException("There were errors during transformation execution.");

}

} catch (KettleException e) {

System.out.println(e);

}

%>

</body>

</html>

If you get a ClassNotFoundException, add the missing JAR to the Kettle dependencies POM. If that JAR is not available in the remote Maven repository, use the Maven install:install-file command to add it to your local repository.

Once the Transformation succeeds, you can move the KettleEnvironment.init(false) and EnvUtil.environmentInit() calls into a ContextListener so that the Kettle components are initialized once at startup.
