Spring Batch Tutorial
Spring Batch is a lightweight framework that provides a solid foundation on which to build robust and scalable batch applications. It gives developers a set of tried and tested patterns that solve common batch problems, allowing them to focus on business requirements rather than complex batch infrastructure. Spring Batch contains a variety of out-of-the-box configurable components that can be used to satisfy many of the most common batch use cases. Extensive XML configuration and an extensible programming model mean that these components can be customised and used as building blocks to quickly deliver common batch functionality.
This tutorial will show you how to build a simple batch application that reads comma-delimited data from a flat file and writes it to a database table. This is a common batch use case and should be sufficient to demonstrate some of the fundamental concepts of Spring Batch, while providing a foundation on which to build more complex batch applications.
Sample Application
The sample batch application described in this tutorial uses an H2 in-memory database, so you can download the sample code and run it without having to set up a database server. The sample job is run as an integration test, so once you grab the code you can have a working batch job up and running in a matter of minutes. The rest of this post is a step-by-step guide describing every component in the sample batch job.
Project Structure
The diagram below shows the project structure of our sample batch application. Each component is described in detail below.
Batch Job Definition
The import-accounts-job-context.xml file contains the XML definition of our batch job and the components it uses. Each part of the job definition is described in detail below.
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns:batch="http://www.springframework.org/schema/batch"
	xsi:schemaLocation="
		http://www.springframework.org/schema/beans
		http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
		http://www.springframework.org/schema/batch
		http://www.springframework.org/schema/batch/spring-batch-2.1.xsd">

	<job id="importAccountData" xmlns="http://www.springframework.org/schema/batch">
		<step id="parseAndLoadAccountData">
			<tasklet>
				<chunk reader="reader" writer="writer" commit-interval="3" skip-limit="2">
					<skippable-exception-classes>
						<include class="org.springframework.batch.item.file.FlatFileParseException" />
					</skippable-exception-classes>
				</chunk>
			</tasklet>
		</step>
	</job>

	<bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">
		<property name="resource" value="file:#{jobParameters['inputResource']}" />
		<property name="linesToSkip" value="1" />
		<property name="lineMapper">
			<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
				<property name="lineTokenizer">
					<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
						<property name="names" value="ACCOUNT_ID,ACCOUNT_HOLDER_NAME,ACCOUNT_CURRENCY,BALANCE" />
						<property name="delimiter" value="," />
					</bean>
				</property>
				<property name="fieldSetMapper">
					<bean class="com.blog.samples.batch.AccountFieldSetMapper" />
				</property>
			</bean>
		</property>
	</bean>

	<bean id="writer" class="com.blog.samples.batch.AccountItemWriter">
		<constructor-arg ref="dataSource" />
	</bean>

</beans>
Line 11 – The job element defines a batch job, the top-level configurable batch component, and acts as a container for one or more batch steps. The id attribute uniquely identifies the batch job and is used later by the JobLauncher to invoke it.
Line 12 – A batch step is a component that represents a specific, independent phase of a batch job. In our sample application we define a single step that parses data from a flat file and loads that data into the database. Our step is given the unique identifier parseAndLoadAccountData.
Line 13 – Spring Batch provides tasklets as an extension point that allows developers to handle processing inside a batch step. A tasklet is a Java class that implements the Tasklet interface and is written to implement custom logic within a step. The tasklet is then invoked by Spring Batch at runtime.
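Our sample step delegates to the chunk element rather than a custom tasklet, but for illustration, a minimal hand-rolled tasklet might look like the sketch below (the class name and its archiving purpose are hypothetical, not part of the sample code):

package com.blog.samples.batch;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

/**
 * Hypothetical tasklet that could be plugged into a step via
 * <tasklet ref="archiveFileTasklet" />. Not part of the sample job.
 */
public class ArchiveFileTasklet implements Tasklet
{
	public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception
	{
		// perform any one-off step logic here, e.g. moving a processed input file
		return RepeatStatus.FINISHED;
	}
}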
Line 14 – For read/write use cases Spring Batch uses chunk-oriented processing. Items are read one at a time by an item reader and are aggregated into a collection, or 'chunk', of a specified size. When the number of items read reaches that size, the contents of the chunk are passed to the item writer and written to the target data source. The chunk size is configured via the commit-interval attribute on the chunk definition. The diagram below describes the sequence of events and components used for chunk processing, and a simplified sketch of the loop follows the configuration list below.
A chunk is configured by specifying the following:
- Item Reader – component that reads data from a specified data source. Common data sources include flat files, XML files, database tables, JMS queues etc.
- Item Writer – component that writes data to a target data source in chunks. Common data sources are the same as those described for the item reader above.
- Commit Interval – specifies the chunk size for the batch step, in other words the number of items that are aggregated and written by the item writer in a single commit.
- Skip Limit – the number of erroneous records that can be skipped before a batch job fails. In our sample application we set the skip limit to 2, meaning that if two erroneous records are encountered the batch process will continue. If a third erroneous record is found, the batch job will terminate.
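To make the read/aggregate/write cycle concrete, the sketch below shows roughly what the framework does for each chunk. This is a simplification for illustration only (the class and method names are invented), not the actual Spring Batch internals:

import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;

import com.blog.samples.batch.model.Account;

/** Simplified illustration of chunk-oriented processing; not the real framework code. */
public class ChunkLoopSketch
{
	public void process(ItemReader<Account> reader, ItemWriter<Account> writer, int commitInterval) throws Exception
	{
		List<Account> chunk = new ArrayList<Account>();
		Account item;
		while ((item = reader.read()) != null)      // items are read one at a time
		{
			chunk.add(item);
			if (chunk.size() == commitInterval)     // chunk has reached the commit interval
			{
				writer.write(chunk);                // whole chunk written and committed together
				chunk.clear();
			}
		}
		if (!chunk.isEmpty())
		{
			writer.write(chunk);                    // flush the final, partially filled chunk
		}
	}
}

Each write of a chunk is wrapped in its own transaction by the framework, which is why the commit interval directly controls transaction size.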
Lines 15 to 17 – The skippable-exception-classes element, like the skip-limit attribute above, provides a means of making the batch application more robust. You can define a list of exceptions that, if encountered during processing, will be skipped by Spring Batch. In our sample application we have chosen to skip FlatFileParseExceptions.
Line 23 – The FlatFileItemReader is one of the reader components that Spring Batch provides out of the box. Reading flat files is a common batch use case, so Spring Batch provides a convenience class that can be easily configured to satisfy this requirement. I've described this configuration in detail below.
Line 24 – The resource attribute refers to the input file to be processed. In this instance we set the input file as a job parameter using the notation #{jobParameters['inputResource']}. In order to set component attributes as job parameters, the bean must support late binding, which is enabled by setting the scope attribute to step (line 23).
Line 25 – The linesToSkip attribute indicates the number of lines that should be ignored by the reader before actual processing begins. In our example we skip the first line of the file, as it is a header row.
Line 26 – The lineMapper property configures the component that will map each line of the file to a domain object. In this instance we use a Spring Batch implementation called DefaultLineMapper, which requires a line tokenizer component to split the line contents into individual fields.
Line 29 – Spring Batch provides an out-of-the-box tokenizer implementation called DelimitedLineTokenizer, which is configured with a list of field names.
Line 30 – The DelimitedLineTokenizer splits each line into tokens that are later referenced by the field names defined here.
Line 31 – The delimiter attribute specifies the delimiter used to tokenize each line of the input file. In this instance our input file is comma delimited.
Lines 34 and 35 – The fieldSetMapper property refers to the custom class AccountFieldSetMapper, which takes a FieldSet and maps its fields to instance variables on the Account POJO (described later).
Line 41 – The writer component is responsible for writing data items, in this case Account POJOs, to the database. When the number of items read reaches the commit interval, the chunk of read items is passed to the writer component so that they can be written to the database in a single transaction.
Field Set Mapper
package com.blog.samples.batch;

import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.validation.BindException;

import com.blog.samples.batch.model.Account;

/**
 * Account field set mapper takes the FieldSet object for each row in the
 * input file and maps it to an Account model object.
 */
public class AccountFieldSetMapper implements FieldSetMapper<Account>
{
	/**
	 * Map the provided field set to an Account POJO using the keys defined
	 * in the names attribute of the DelimitedLineTokenizer object.
	 */
	public Account mapFieldSet(FieldSet fieldSet_p) throws BindException
	{
		Account account = new Account();
		account.setId(fieldSet_p.readString("ACCOUNT_ID"));
		account.setAccountHolderName(fieldSet_p.readString("ACCOUNT_HOLDER_NAME"));
		account.setAccountCurrency(fieldSet_p.readString("ACCOUNT_CURRENCY"));
		account.setBalance(fieldSet_p.readBigDecimal("BALANCE"));

		return account;
	}
}
As you can see, this class implements the FieldSetMapper interface and provides an implementation of the mapFieldSet method, which maps fields from the FieldSet to our Account model object. Individual fields are referenced using the keys defined in the names property of the DelimitedLineTokenizer we defined earlier.
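Because the mapper has no external dependencies it is easy to exercise in isolation. The following is a hypothetical standalone test, not part of the sample code; it assumes the Account POJO exposes getters matching the setters used above:

package com.blog.samples.batch;

import java.math.BigDecimal;

import org.junit.Assert;
import org.junit.Test;
import org.springframework.batch.item.file.transform.DefaultFieldSet;
import org.springframework.batch.item.file.transform.FieldSet;

import com.blog.samples.batch.model.Account;

/** Hypothetical standalone test for AccountFieldSetMapper; not part of the sample code. */
public class AccountFieldSetMapperTest
{
	@Test
	public void mapsTokensToAccount() throws Exception
	{
		// build a FieldSet directly, just as the DelimitedLineTokenizer would
		String[] tokens = { "1234567", "Riain McAtamney", "STG", "3233.43" };
		String[] names = { "ACCOUNT_ID", "ACCOUNT_HOLDER_NAME", "ACCOUNT_CURRENCY", "BALANCE" };
		FieldSet fieldSet = new DefaultFieldSet(tokens, names);

		Account account = new AccountFieldSetMapper().mapFieldSet(fieldSet);

		// assumes getId/getBalance getters exist on the Account POJO
		Assert.assertEquals("1234567", account.getId());
		Assert.assertEquals(new BigDecimal("3233.43"), account.getBalance());
	}
}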
Item Writer
On line 41 of the job definition above we referenced an AccountItemWriter object for writing data to the database. The AccountItemWriter is defined below.
package com.blog.samples.batch;

import java.util.List;

import javax.sql.DataSource;

import org.springframework.batch.item.ItemWriter;
import org.springframework.jdbc.core.JdbcTemplate;

import com.blog.samples.batch.model.Account;

/**
 * Class takes Account model objects created by the item reader and
 * persists them in the database.
 */
public class AccountItemWriter implements ItemWriter<Account>
{
	private static final String INSERT_ACCOUNT =
			"insert into account (id,accountHolderName,accountCurrency,balance) values(?,?,?,?)";

	private static final String UPDATE_ACCOUNT =
			"update account set accountHolderName=?, accountCurrency=?, balance=? where id = ?";

	private JdbcTemplate jdbcTemplate;

	public AccountItemWriter(DataSource dataSource_p)
	{
		this.jdbcTemplate = new JdbcTemplate(dataSource_p);
	}

	/**
	 * Method takes a list of Account model objects and uses the JDBC template
	 * to either insert or update each one in the database.
	 */
	public void write(List<? extends Account> accounts_p) throws Exception
	{
		for (Account account : accounts_p)
		{
			// try an update first; if no rows were affected, insert the account
			int updated = jdbcTemplate.update(UPDATE_ACCOUNT,
					account.getAccountHolderName(),
					account.getAccountCurrency(),
					account.getBalance(),
					account.getId());

			if (updated == 0)
			{
				jdbcTemplate.update(INSERT_ACCOUNT,
						account.getId(),
						account.getAccountHolderName(),
						account.getAccountCurrency(),
						account.getBalance());
			}
		}
	}
}
The AccountItemWriter class implements the ItemWriter interface and provides an implementation of the write method. The write method is invoked by Spring Batch with a list of objects read by the item reader component. The number of items in the list, or chunk size, is dictated by the commit-interval attribute on the chunk definition. As you can see above, we use a JdbcTemplate to persist the list of account objects one at a time. Note that Spring Batch performs a single commit once all items in the chunk have been written, as this is substantially more performant than one commit per object. This is particularly significant when writing large datasets.
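As an aside, if a chunk only ever contained new records, the per-item update-then-insert logic above could be collapsed into a single JDBC batch call. A possible variation of the write method (an alternative sketch under that assumption, not what the sample code does):

// Alternative sketch: inserts the whole chunk with one JDBC batch statement.
// Assumes every item is new, dropping the sample's update-then-insert logic,
// and requires an additional import of java.util.ArrayList.
public void write(List<? extends Account> accounts_p) throws Exception
{
	List<Object[]> batchArgs = new ArrayList<Object[]>();
	for (Account account : accounts_p)
	{
		batchArgs.add(new Object[] { account.getId(), account.getAccountHolderName(),
				account.getAccountCurrency(), account.getBalance() });
	}
	jdbcTemplate.batchUpdate(INSERT_ACCOUNT, batchArgs);
}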
Framework Component Configuration
The test-context.xml file below configures the framework components needed to run the job: a data source, a JdbcTemplate, a transaction manager, a job repository and a job launcher.
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns:jdbc="http://www.springframework.org/schema/jdbc"
	xsi:schemaLocation="http://www.springframework.org/schema/jdbc
		http://www.springframework.org/schema/jdbc/spring-jdbc-3.0.xsd
		http://www.springframework.org/schema/beans
		http://www.springframework.org/schema/beans/spring-beans.xsd">
	<jdbc:embedded-database id="dataSource" type="H2">
		<jdbc:script location="/create-account-table.sql"/>
	</jdbc:embedded-database>

	<bean class="org.springframework.jdbc.core.JdbcTemplate">
		<constructor-arg ref="dataSource" />
	</bean>

	<bean id="transactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
		<property name="dataSource" ref="dataSource" />
	</bean>

	<bean id="jobRepository" class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
		<property name="transactionManager" ref="transactionManager" />
	</bean>

	<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
		<property name="jobRepository" ref="jobRepository" />
	</bean>

</beans>
Lines 9 to 11 – We define an in-memory database for persisting the job metadata and the Account 'business data'. In a real-world application we'd use a proper RDBMS like MySQL or Oracle.
Lines 13 to 15 – A JdbcTemplate is required by the item writer component to write Account data to the database. We also use the JdbcTemplate in our integration test to check that the job ran as expected.
Lines 17 to 19 – A transaction manager is defined and takes the data source we defined above.
Lines 21 to 23 – A job repository is defined to store metadata about job runs. The Map-based factory bean used here keeps this metadata in memory, which is sufficient for testing.
Lines 25 to 27 – A JobLauncher is required so that we can invoke our batch job from the integration test.
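Although the sample drives the job from an integration test, the same components can be used to launch it from a plain main method. The following is a hypothetical launcher, not included in the sample project; it assumes the framework components above live in test-context.xml, as suggested by the test's @ContextConfiguration later in this post:

package com.blog.samples.batch;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

/** Hypothetical standalone launcher; the sample project drives the job from a test instead. */
public class JobRunner
{
	public static void main(String[] args) throws Exception
	{
		// load the job definition and the framework components
		ApplicationContext context = new ClassPathXmlApplicationContext(
				"/import-accounts-job-context.xml", "/test-context.xml");

		JobLauncher jobLauncher = context.getBean(JobLauncher.class);
		Job job = context.getBean(Job.class);

		JobExecution execution = jobLauncher.run(job, new JobParametersBuilder()
				.addString("inputResource", args[0])              // path to the input file
				.addLong("timestamp", System.currentTimeMillis()) // makes each run's parameters unique
				.toJobParameters());

		System.out.println("Exit status: " + execution.getStatus());
	}
}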
Batch Test Data
We have two input files to run as part of our integration test. The accounts.txt file below contains 10 valid records.
ACCOUNT_ID,ACCOUNT_HOLDER_NAME,ACCOUNT_CURRENCY,BALANCE
1234567,Riain McAtamney,STG,3233.43
5494032,Gary Jonston,STG,32329.45
4324324,Colm Toale,STG,5435.80
2436513,Gary Gallagher,STG,43234.54
6242345,Connor Smith,EUR,5342.32
5435432,Ruairi Digby,EUR,4322.13
6543523,Steve Jones,EUR,5643.54
5431245,Peter Murray,STG,4324.13
6546556,John Collins,STG,54354.43
7654654,Sean Molloy,STG,32133.22
The accountsError.txt file below contains 8 valid and 2 invalid records (note the corrupted BALANCE values) and will allow us to test the skip-limit attribute on the chunk definition.
ACCOUNT_ID,ACCOUNT_HOLDER_NAME,ACCOUNT_CURRENCY,BALANCE
1234567,Riain McAtamney,STG,3233.43
5494032,Gary Jonston,STG,32329.45
4324324,Colm Toale,STG,5435.80
2436513,Gary Gallagher,STG,43234.54
6243345,Connor Smith,EUR,5xxx342.32
5435432,Ruairi Digby,EUR,4322.13
6543523,Steve Jones,EUR,5643.54
5431245,Peter Murray,STG,432XX4.13
6546556,John Collins,STG,54354.43
7654654,Sean Molloy,STG,32133.22
Batch Integration Test
The final step is to write an integration test to run our batch job. The test is defined as follows.
package com.blog.samples.batch.test;

import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = { "/import-accounts-job-context.xml", "/test-context.xml" })
public class ImportAccountsIntegrationTest
{
	@Autowired
	private JobLauncher jobLauncher_i;
	@Autowired
	private Job job_i;
	@Autowired
	private JdbcTemplate jdbcTemplate_i;

	@Value("file:src/test/resources/input/accounts.txt")
	private Resource accountsResource;
	@Value("file:src/test/resources/input/accountsError.txt")
	private Resource accountsErrorResource;

	@Before
	public void setUp() throws Exception
	{
		jdbcTemplate_i.update("delete from account");
	}

	@Test
	public void importAccountDataTest() throws Exception
	{
		int startingCount = jdbcTemplate_i.queryForInt("select count(*) from account");
		jobLauncher_i.run(job_i, new JobParametersBuilder()
				.addString("inputResource", accountsResource.getFile().getAbsolutePath())
				.addLong("timestamp", System.currentTimeMillis())
				.toJobParameters());
		int accountsAdded = 10;
		Assert.assertEquals(startingCount + accountsAdded, jdbcTemplate_i.queryForInt("select count(*) from account"));
	}

	@Test
	public void importAccountDataErrorTest() throws Exception
	{
		int startingCount = jdbcTemplate_i.queryForInt("select count(*) from account");
		jobLauncher_i.run(job_i, new JobParametersBuilder()
				.addString("inputResource", accountsErrorResource.getFile().getAbsolutePath())
				.addLong("timestamp", System.currentTimeMillis())
				.toJobParameters());
		int accountsAdded = 8;
		Assert.assertEquals(startingCount + accountsAdded, jdbcTemplate_i.queryForInt("select count(*) from account"));
	}
}
Line 18 – Imports the job definition and the framework component definitions required to run the job.
Lines 22 to 31 – Injected infrastructure dependencies and file resources required to run the batch job.
Lines 33 to 37 – Set-up method that runs before each test and clears down the account table.
Line 43 – Here we use the JobLauncher to run our import account data job against the accounts.txt file. A JobParametersBuilder is used to pass in the input file and a timestamp as job parameters; the timestamp makes each set of job parameters unique so that the job can be launched repeatedly.
Line 48 – Query the database to get the account table row count and ensure that all 10 rows have been successfully persisted.
Line 55 – Use the JobLauncher to run our import account data job against the accountsError.txt file.
Line 60 – Query the database to get the account table row count and ensure that only 8 rows have been persisted, as we would expect given that the two invalid rows were skipped.
Sample Code
You can get the sample code for this post on github at https://github.com/briansjavablog/spring-batch-tutorial. Feel free to experiment with the code and as usual comments/questions are welcome.