Spring Batch is a batch processing framework built on the Spring Framework. I usually use it to develop simple ETL (Extraction, Transformation and Loading) programs.
In this post, I’ll show you how to write a simple ETL program. (This sample was tested on Spring Batch 3.0.10.)
Prerequisites
- Database (MySQL or Oracle)
- Spring batch context database
- Spring batch libraries (spring-batch-core, spring framework, spring-jdbc)
- Database JDBC driver
- DBCP (Optional)
Spring batch context database
The Spring Batch context database must be created before a Spring Batch job can run. The table-creation DDL can be found inside spring-batch-core-version.jar (the org.springframework.batch.core package contains DDL scripts for several databases, e.g. schema-oracle10g.sql and schema-mysql.sql).
It contains the following tables.
- BATCH_JOB_INSTANCE
- BATCH_JOB_EXECUTION
- BATCH_JOB_EXECUTION_PARAMS
- BATCH_STEP_EXECUTION
- BATCH_STEP_EXECUTION_CONTEXT
- BATCH_JOB_EXECUTION_CONTEXT
Test scenario
In this post, I use two tables: TB_SOURCE and TB_TARGET. The sample program reads every row from TB_SOURCE and writes it into TB_TARGET.
CREATE TABLE TB_SOURCE(
    col1 VARCHAR2(10),
    col2 VARCHAR2(20),
    col3 VARCHAR2(20)
);
CREATE TABLE TB_TARGET(
    new_col1 VARCHAR2(10),
    new_col2 VARCHAR2(20),
    new_col3 VARCHAR2(20)
);
Writing base Spring context
To run a Spring batch program, some beans need to be declared.
- Spring Batch Context DataSource
- DataSource for source and target
- TransactionManager for Spring Batch Context
- TransactionManager for source and target
- Job Repository bean
- Job Launcher (which is the starting point of a job)
The following is a snippet of context.xml.
<bean id="repositoryDataSource"
      class="org.apache.commons.dbcp.BasicDataSource"
      destroy-method="close">
    <property name="driverClassName" value="oracle.jdbc.OracleDriver" />
    <property name="url" value="jdbc:oracle:thin:@guest01:1521:myora" />
    <property name="username" value="scott" />
    <property name="password" value="tiger" />
    <property name="initialSize" value="2" />
    <property name="minIdle" value="2" />
    <property name="maxActive" value="10" />
    <property name="maxIdle" value="5" />
</bean>

<bean id="testDataSource"
      class="org.apache.commons.dbcp.BasicDataSource"
      destroy-method="close">
    <property name="driverClassName" value="oracle.jdbc.OracleDriver" />
    <property name="url" value="jdbc:oracle:thin:@guest01:1521:myora" />
    <property name="username" value="scott" />
    <property name="password" value="tiger" />
    <property name="initialSize" value="2" />
    <property name="minIdle" value="2" />
    <property name="maxActive" value="10" />
    <property name="maxIdle" value="5" />
</bean>

<bean id="repositoryTransactionManager" lazy-init="true"
      class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
    <property name="dataSource" ref="repositoryDataSource" />
</bean>

<bean id="testTransactionManager" lazy-init="true"
      class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
    <property name="dataSource" ref="testDataSource" />
</bean>

<batch:job-repository id="jobRepository"
    data-source="repositoryDataSource"
    transaction-manager="repositoryTransactionManager"
    isolation-level-for-create="READ_COMMITTED"
    table-prefix="BATCH_"
/>

<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
    <property name="jobRepository" ref="jobRepository" />
</bean>
Writing Job flow
Now we are ready to write an ETL program. A Job is also declared in the Spring context.xml. The following is the basic structure of a Job.
- A Job can have several Steps (for simplicity, I use one Step in this sample)
- Each Step has a Reader, a Processor and a Writer
- The Reader reads data from a database, a file or some other data store
- The Reader invokes a RowMapper to convert raw data into a Source VO
- The Processor reads a Source VO and converts it into a Target VO
- The Writer writes the Target VO to a database, a file or some other data store
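The chunk-oriented flow above can be sketched in plain Java. This is a minimal sketch with hypothetical Reader/Processor/Writer interfaces standing in for Spring Batch's ItemReader, ItemProcessor and ItemWriter; the real framework adds transactions, restartability and skip handling on top of this loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java sketch of Spring Batch's chunk-oriented processing loop.
// The interfaces are hypothetical stand-ins, not the real Spring types.
public class ChunkLoopSketch {

    interface Reader<I> { I read(); }              // returns null when input is exhausted
    interface Processor<I, O> { O process(I item); }
    interface Writer<O> { void write(List<O> items); }

    static <I, O> int run(Reader<I> reader, Processor<I, O> processor,
                          Writer<O> writer, int commitInterval) {
        int written = 0;
        List<O> chunk = new ArrayList<>();
        I item;
        while ((item = reader.read()) != null) {
            chunk.add(processor.process(item));     // one item at a time
            if (chunk.size() == commitInterval) {   // commit point: write the chunk
                writer.write(chunk);
                written += chunk.size();
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {                     // flush the final partial chunk
            writer.write(chunk);
            written += chunk.size();
        }
        return written;
    }

    public static void main(String[] args) {
        Iterator<String> source = Arrays.asList("a", "b", "c").iterator();
        List<String> sink = new ArrayList<>();
        int n = run(() -> source.hasNext() ? source.next() : null,
                    String::toUpperCase, sink::addAll, 2);
        System.out.println(n + " " + sink);         // prints "3 [A, B, C]"
    }
}
```

The commit-interval attribute of the chunk element maps to commitInterval here: the Writer receives items in groups of that size, and each group is committed as one transaction.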
The following is a sample ETL job definition.
<batch:job id="TestJob01" job-repository="jobRepository">
    <batch:step id="SimpleStep">
        <batch:tasklet transaction-manager="testTransactionManager" allow-start-if-complete="true">
            <batch:chunk
                reader="TestReader"
                processor="TestProcessor"
                writer="TestWriter"
                commit-interval="1"
                reader-transactional-queue="false"
            />
        </batch:tasklet>
    </batch:step>
</batch:job>

<bean id="SourceMapper"
      class="test.rowmapper.TbSourceMapper"/>

<bean id="TestReader"
      class="org.springframework.batch.item.database.JdbcCursorItemReader"
      scope="step">
    <property name="dataSource" ref="testDataSource"/>
    <property name="sql"
              value="SELECT COL1, COL2, COL3
                       FROM TB_SOURCE"/>
    <property name="rowMapper"
              ref="SourceMapper"/>
    <property name="fetchSize" value="100" />
    <property name="maxRows" value="0" />
</bean>

<bean id="TestProcessor"
      class="test.processor.TestProcessor"
      scope="step">
</bean>

<bean id="TestWriter"
      class="org.springframework.batch.item.database.JdbcBatchItemWriter"
      scope="step">
    <property name="assertUpdates" value="true" />
    <property name="itemSqlParameterSourceProvider">
        <bean class="org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider" />
    </property>
    <property name="sql"
              value="INSERT INTO TB_TARGET(NEW_COL1, NEW_COL2, NEW_COL3)
                     VALUES(:newCol1, :newCol2, :newCol3)"
    />
    <property name="dataSource" ref="testDataSource" />
</bean>
As the sample above shows, if a Reader's or a Writer's target is a database, SQL can be used directly in the configuration, which is one of the strengths of Spring Batch.
Writing RowMapper
A RowMapper is a component that converts raw source data into a Source VO. It must implement org.springframework.jdbc.core.RowMapper. A sample RowMapper follows.
import java.sql.ResultSet;
import java.sql.SQLException;

import org.springframework.jdbc.core.RowMapper;

import test.vo.SourceVO;

public class TbSourceMapper implements RowMapper<SourceVO> {
    @Override
    public SourceVO mapRow(ResultSet rs, int rowNum) throws SQLException {
        SourceVO vo = new SourceVO();
        vo.col1 = rs.getString(1);
        vo.col2 = rs.getString(2);
        vo.col3 = rs.getString(3);
        return vo;
    }
}
Writing Processor
The Processor is the core of a Spring Batch job. It can transform the source data, validate it or execute any additional logic. In this sample, it simply maps the source columns to new column names. The sample Processor is as follows.
import org.springframework.batch.item.ItemProcessor;

import test.vo.SourceVO;
import test.vo.TargetVO;

public class TestProcessor implements ItemProcessor<SourceVO, TargetVO> {
    @Override
    public TargetVO process(SourceVO item) throws Exception {
        TargetVO targetVO = new TargetVO();
        targetVO.new_col1 = item.col1;
        targetVO.new_col2 = item.col2;
        targetVO.new_col3 = item.col3;
        return targetVO;
    }
}
Writing Source VO
public class SourceVO {
    public String col1;
    public String col2;
    public String col3;
}
Writing Target VO
In this sample, the Target VO must have getter methods, because the writer binds its named SQL parameters through JavaBean properties rather than public fields.
public class TargetVO {
    public String new_col1;
    public String new_col2;
    public String new_col3;

    public String getNewCol1() {
        return this.new_col1;
    }

    public String getNewCol2() {
        return this.new_col2;
    }

    public String getNewCol3() {
        return this.new_col3;
    }
}
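BeanPropertyItemSqlParameterSourceProvider resolves each named parameter (:newCol1) via the matching JavaBean getter (getNewCol1). The following is a rough, hypothetical sketch of that lookup, not the real Spring implementation:

```java
import java.lang.reflect.Method;

// Hypothetical sketch: resolve a named SQL parameter such as "newCol1"
// to a bean property by calling the matching getter via reflection.
public class ParamResolver {
    public static Object resolve(Object bean, String paramName) throws Exception {
        // "newCol1" -> "getNewCol1"
        String getter = "get" + Character.toUpperCase(paramName.charAt(0))
                + paramName.substring(1);
        Method m = bean.getClass().getMethod(getter); // fails if no getter exists
        return m.invoke(bean);
    }
}
```

Without getNewCol1() and friends on TargetVO, this kind of lookup fails at runtime, which is why the getters are required even though the fields are public.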
There are no more components to write except a small program that invokes this sample Job.
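A minimal launcher might look like the following sketch. It assumes the context file shown above is named context.xml and sits on the classpath; the bean names match the earlier declarations. It needs the Spring Batch jars and a live database, so treat it as a starting point rather than a drop-in program.

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class JobRunner {
    public static void main(String[] args) throws Exception {
        // Load the Spring context containing the job and infrastructure beans.
        // "context.xml" is an assumed file name.
        ClassPathXmlApplicationContext context =
                new ClassPathXmlApplicationContext("context.xml");
        try {
            JobLauncher jobLauncher = context.getBean("jobLauncher", JobLauncher.class);
            Job job = context.getBean("TestJob01", Job.class);

            // A fresh parameter value creates a new JobInstance on each run.
            JobExecution execution = jobLauncher.run(job,
                    new JobParametersBuilder()
                            .addLong("run.ts", System.currentTimeMillis())
                            .toJobParameters());
            System.out.println("Exit status: " + execution.getStatus());
        } finally {
            context.close();
        }
    }
}
```

Alternatively, Spring Batch ships org.springframework.batch.core.launch.support.CommandLineJobRunner, which can launch the job from the command line given the context file and the job name (e.g. context.xml TestJob01).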
You can download the full sources from github.