CHAPTER 9


Spring Batch

So far, our focus has been on enterprise technologies that process requests or messages from upstream applications. Such abstractions are crucial for real-time processing of small amounts of data, mostly representing one action against our enterprise system. But real-world enterprises also need to perform background tasks, in which a large amount of data needs to be processed, migrated, copied, or converted from one system to another on a regular basis.

Such processing is often referred to as batch processing, because data is processed in large amounts that usually can’t fit into memory or a single transaction. This amount of data needs to be divided and processed in manageable chunks. This processing is often executed on a regular basis (for example, daily, weekly, or monthly) and may be long running. Batch processing typically doesn’t involve user interaction.

Use cases for batch processing include the following:

  • Sending subscription e-mails
  • Sending monthly invoices
  • Synchronizing a data warehouse
  • Performing business reporting
  • Processing orders

The first chapter covered scheduling support, which can be used for scheduling background tasks. But on its own it often falls short in real life, because many problems can occur during background processing. Low-level scheduling support based on a fixed rate, a fixed delay, or CRON expressions simply isn’t advanced enough to handle errors, restarting, distributing work to other machines, chunking of data processing, or controlling the flow of batch actions without a lot of custom code.

Fortunately, the Spring portfolio contains the Spring Batch (SB) project, which fills the gap of advanced abstractions for enterprise batch-processing needs. SB was a unique project within the Java platform until Java Enterprise Edition version 7 (JEE 7) was introduced. This major revision of Java Enterprise standards introduced APIs for batch processing that were significantly influenced by Spring Batch constructs. Michael Minella (SB project lead) was heavily involved in the expert group defining JSR-352, which covers batch processing for the Java platform (https://jcp.org/en/jsr/detail?id=352). SB also remains the flagship implementation project of this Java standard.

Spring Batch Domain

The SB project philosophy tries to achieve the Spring family’s overall goal of minimizing necessary boilerplate code, so that developers can concentrate on business logic. Therefore, it introduces various abstractions that are unique to the batch-processing domain, shown in Figure 9-1.


Figure 9-1. Spring Batch domain

The first and major abstraction is the Job. It represents a unit of work that needs to be processed. A Job is composed of one or more Steps; a Step represents a portion of the work performed by the Job. Every time the Job is executed, the SB framework creates a new JobInstance, so one Job can have one or more JobInstances.

One JobInstance can be executed multiple times (for example, when the JobInstance needs to be restarted), so it can have multiple JobExecutions. Because each Job needs to have at least one Step, each JobExecution also creates a StepExecution for each of its Steps. A Step can be executed multiple times within one JobExecution (for example, when we want to restart or retry the Step), so there is a one-to-many relationship between JobExecution and StepExecution.

Developers need to define the Job and its Steps for batch processing. All the other objects are created by the SB framework. SB takes care of execution when the Job is triggered and handles errors that occur during execution according to the Job and Step definitions. This way, the developer can divide the work into multiple Steps, logically separating batch processing into smaller pieces, and leave the hard work of coordinating execution to SB. We will dive into these definitions later in this chapter.

Chunk-Oriented Processing

The most common use case for the SB framework is processing large amounts of data in chunks. This is called chunk-oriented processing, and it happens in one Step. Step is composed of one mandatory ItemReader<T>, one optional ItemProcessor<T, S>, and one mandatory ItemWriter<S>. These Step parts are Java interfaces, which can be implemented by our custom logic. Notice that in the following text, we don’t use generic types when these interfaces are discussed.

SB also provides some commonly used implementations out of the box for reading/writing from/to the following:

  • JDBC/Hibernate/stored procedure
    • Cursor-based item readers—the cursor is a DB construct in which rows are streamed from a database
    • Paging-based item readers—uses a distinct where clause for each chunk of data (page)
    • Item writers
  • Flat files
  • XML
  • JMS

Figure 9-2 shows a sequence diagram of this mechanism.


Figure 9-2. Chunk-oriented processing sequence diagram

When a Step is executed, SB starts the loop of chunk processing. The Step definition has to specify a chunk size; one chunk consists of repeated ItemReader and ItemProcessor calls, reading and processing one item at a time until the chunk size is reached. When the chunk is read and processed, the Step calls ItemWriter to write the whole chunk and then continues with the next chunk. The looping ends when ItemReader returns null, which means that there’s no more data to process and this partial chunk is the last one in the current StepExecution.
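The loop just described can be sketched in plain Java. The interfaces below are simplified stand-ins for Spring Batch’s ItemReader, ItemProcessor, and ItemWriter (the real interfaces declare checked exceptions and richer semantics), so this illustrates the algorithm rather than the framework’s actual implementation:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified simulation of Spring Batch's chunk-oriented loop.
public class ChunkLoopSketch {
    interface Reader<T> { T read(); }                 // returns null when exhausted
    interface Processor<T, S> { S process(T item); }
    interface Writer<S> { void write(List<? extends S> chunk); }

    static <T, S> void processInChunks(int chunkSize, Reader<T> reader,
                                       Processor<T, S> processor, Writer<S> writer) {
        while (true) {
            List<S> chunk = new ArrayList<>();
            // read and process one item at a time until the chunk is full
            for (int i = 0; i < chunkSize; i++) {
                T item = reader.read();
                if (item == null) break;              // no more data
                chunk.add(processor.process(item));
            }
            if (chunk.isEmpty()) break;
            writer.write(chunk);                      // write the whole chunk at once
            if (chunk.size() < chunkSize) break;      // last, partial chunk
        }
    }

    public static void main(String[] args) {
        Iterator<String> source = List.of("a", "b", "c", "d", "e").iterator();
        List<List<String>> written = new ArrayList<>();
        processInChunks(2,
                () -> source.hasNext() ? source.next() : null,
                item -> item + " processed",
                chunk -> written.add(new ArrayList<>(chunk)));
        System.out.println(written.size());  // three chunks were written
        System.out.println(written.get(2));
    }
}
```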

Common Classes for Chunk-Oriented Processing Examples

Before we dive into chunk-oriented examples, we’ll introduce a few classes that are shared across those examples in this chapter. As in the previous chapter, we won’t demonstrate SB features with real-life examples, because SB is a highly configurable framework; the simplicity of the examples will better illustrate SB features. Listing 9-1 shows a simple repository for reading records of the String type.

This Spring bean is annotated with @Repository. When this bean is instantiated by Spring, it generates simple records by using the Java 8 Stream API. The IntStream.range method generates a stream of 15 integers, which are converted into String records and collected into a List. When this collection of records is generated, we store its iterator in the constant ITERATOR, which is used for reading records via the readNext method. This method is synchronized, because reading can occur in various threads, which is handy in some of the later examples. Listing 9-2 shows the custom ItemReader.
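A stand-alone sketch of such a read repository could look like the following; the @Repository annotation is omitted so the class compiles on its own, and the record text is an assumption rather than the book’s exact listing:

```java
import java.util.Iterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Sketch of the read repository: 15 generated String records read one by
// one through a shared iterator. The record text is an assumption.
public class ReadRepository {
    // generate 15 records with the Java 8 Stream API and keep the iterator
    private static final Iterator<String> ITERATOR = IntStream.range(0, 15)
            .mapToObj(idx -> "simple record " + idx)
            .collect(Collectors.toList())
            .iterator();

    // synchronized, because reading can occur in various threads
    public synchronized String readNext() {
        return ITERATOR.hasNext() ? ITERATOR.next() : null;
    }
}
```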

This ItemReader implementation is also a Spring bean, which injects the ReadRepository bean to read from. The generic String defines the type of item we are reading. The read method needs to be implemented to read one item for processing. Listing 9-3 shows the custom ItemProcessor used in chunk-oriented examples.

This is also a Spring bean, which appends the string processed to the record being processed. In the real world, this would be the place where we convert items from the read format/POJO into the write format/POJO. The generic types of the ItemProcessor interface define the type of item before and after processing. In this simplistic example, both are of type String. Listing 9-4 shows the simple write repository.
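Stripped of the ItemProcessor plumbing, the processing logic amounts to appending a word to each record; a minimal stand-alone version (Spring interfaces and annotations omitted) might be:

```java
// Stand-alone sketch of the processing logic: append " processed" to each
// record. The ItemProcessor<String, String> interface and Spring
// annotations are omitted so the class compiles on its own.
public class SimpleRecordProcessor {
    public String process(String record) {
        return record + " processed";
    }
}
```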

This Spring bean logs a given collection of records and pretends some type of processing. For logging, we use the Lombok annotation @Slf4j. For looping through records, we again use the Java 8 Stream API. Listing 9-5 shows the custom ItemWriter.

This Spring bean implements the ItemWriter interface; the item type is String. WriteRepository is injected, so we can write records via the write() method.
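The write side can be sketched stand-alone as well; here System.out stands in for the @Slf4j logger, and the message text is an assumption:

```java
import java.util.List;

// Stand-alone sketch of the write repository: loop through the chunk with
// the Stream API and log each record. System.out stands in for the @Slf4j
// logger; the message text is an assumption.
public class WriteRepository {
    public void writeRecords(List<? extends String> records) {
        records.stream()
               .forEach(record -> System.out.println("Writing record: " + record));
    }
}
```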

Note  You might ask why we have ReadRepository and WriteRepository in place at all when we want to keep the examples as simple as possible. The reason is testability. I want to make sure that the examples work as described, so the SB configuration is covered by simple integration tests. ReadRepository and WriteRepository are often faked in these tests and used for verification. But this testing is beyond the scope of this book. Tests alongside the book examples can be found in the GitHub repository (https://github.com/lkrnac/book-eiws-code-samples).

Chunk-Oriented Processing Example with XML Configuration

The chunk-oriented processing examples use Spring Boot, but you’ll see that configuring SB applications with the plain Spring Framework is similar. Our first SB example, shown in Listing 9-6, uses XML configuration.

We make the batch namespace the default instead of the beans namespace (so we don’t need the <batch: prefix for batch XML tags), because it makes SB configurations much more readable. We use this approach for SB configurations throughout this chapter.

The configuration itself contains the definition of one Job with one Step, which uses our common reader, processor, and writer. The <job> and <step> XML tags create Spring beans with names defined by the id attribute. The last attribute in the Step definition is commit-interval, which specifies the chunk size for this step. Listing 9-7 shows how the XML configuration is loaded into the Spring context.
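A configuration in this style might look roughly as follows; the bean ids and the commit interval here are illustrative assumptions, not the book’s exact listing:

```xml
<!-- batch is the default namespace, so <job> and <step> need no prefix -->
<beans:beans xmlns="http://www.springframework.org/schema/batch"
    xmlns:beans="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/batch
        http://www.springframework.org/schema/batch/spring-batch.xsd">

  <job id="simpleRecordsJob">
    <step id="simpleRecordsStep">
      <tasklet>
        <!-- commit-interval is the chunk size -->
        <chunk reader="simpleRecordReader" processor="simpleRecordProcessor"
               writer="simpleRecordWriter" commit-interval="5"/>
      </tasklet>
    </step>
  </job>
</beans:beans>
```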

This is a standard Spring configuration class, importing the XML configuration from Listing 9-6. The last annotation is @EnableBatchProcessing, which enables SB features and creates default beans needed for batch processing. We will dive into these beans later in the chapter. For now, we just need to know it’s necessary for SB configuration. Notice that this configuration class is often used in examples involving XML configuration of SB, so we will be referring to this listing often.

Listing 9-8 presents the last part of this example.

This is the standard Spring Boot main application class. The @SpringBootApplication annotation executes Spring Boot’s autoconfiguration and component scan in the current package and subpackages. This configuration is then executed in the main method via a call to SpringApplication.run().

This main Spring Boot class is used often in examples based on Spring Boot, so we will be referring to this listing later. It is important to remember that if the Spring Boot application uses @EnableBatchProcessing, all the jobs are executed by default at the application’s start.

Running this main configuration class results in the output in Listing 9-9.

Chunk-Oriented Processing Example with Java Configuration

Now let’s configure the same behavior with a Java configuration. Listing 9-10 shows the Java batch configuration.

The @Configuration annotation is typically used for Spring Java configuration, and @EnableBatchProcessing was discussed in the previous example. The Step bean is created via the simpleRecordsStep() method, which is annotated with @Bean. Spring injects instances of the reader, writer, and processor alongside an instance of StepBuilderFactory. As its name suggests, it is used for building Steps. Its creation was initiated by the @EnableBatchProcessing annotation.

When we have all the necessary beans injected, we can create the Step. SB provides fluent APIs for defining batch flows. The stepBuilderFactory.get() method takes the name of the Step as a parameter; notice that the Spring bean name can be different from the Step name, so we must not confuse the Step name with the Spring bean name of the Step instance.

The next call in the chain is the specification of the chunk size via the chunk() method. In this call, we also define the generic types of the items handled by the reader, processor, and writer. The Step creation chain then continues by defining the reader, processor, and writer for this step, and is finalized by the build() call, which applies the recorded configuration and creates the Step instance.

The second @Bean definition creates the Job instance. We need the Step instance and a JobBuilderFactory, which is used for Job instance creation. Similar to the Step creation chain, the Job creation chain needs to call JobBuilderFactory.get() to name the Job. Again, notice that the Job name can be different from the bean name given by the method name. This Job starts only the Step we created, so we call start() with that Step and then build() to create the Job. In this case, we use only one step; jobs with multiple steps are shown later in the chapter.
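Put together, a Java configuration of this shape might look like the following sketch. The class name, bean method names, and the chunk size of 5 are assumptions; the builder-factory API shown is the pre-Spring-Batch-5 style that this book uses:

```java
@Configuration
@EnableBatchProcessing
public class SimpleRecordsBatchConfiguration {

    @Bean
    public Step simpleRecordsStep(StepBuilderFactory stepBuilderFactory,
            ItemReader<String> reader, ItemProcessor<String, String> processor,
            ItemWriter<String> writer) {
        return stepBuilderFactory.get("simpleRecordsStep")
                .<String, String>chunk(5)   // chunk size (commit interval)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }

    @Bean
    public Job simpleRecordsJob(JobBuilderFactory jobBuilderFactory,
            Step simpleRecordsStep) {
        return jobBuilderFactory.get("simpleRecordsJob")
                .start(simpleRecordsStep)
                .build();
    }
}
```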

The main class of this application is the standard Spring Boot main class (shown previously in Listing 9-8). When we run it, we can see output similar to that in Listing 9-9.

Example with File Reader and JDBC Writer

First, we define the model class used for this example in Listing 9-11.

We’ve already seen this class in previous chapters. This POJO has two parameters, email and name, and uses Lombok’s annotation @Data to generate getters and setters. Listing 9-12 shows the database initialization bean.

Spring Boot initializes the JdbcTemplate and DataSource by default if autoconfiguration is turned on and the relevant database libraries are on the classpath. In this case, we have an H2 in-memory database configured on the classpath. This Spring bean injects the JdbcTemplate instance to initialize the schema of the in-memory database. The initDbTable() method is annotated with @PostConstruct, so it will be executed right after the Spring context is initialized.

Next, Listing 9-13 shows the input file we are using as a data source for reading.

The delimiter for this file is a comma. Listing 9-14 shows the configuration of the file reader and JDBC writer.

This Spring configuration class defines two beans. The first one is Spring’s implementation of ItemReader for reading from flat files: FlatFileItemReader<User>. First, we need to configure the source file location. In this case, we expect it on the classpath with the name users.txt (the file in Listing 9-13). Next, we configure a BeanWrapperFieldSetMapper<User> to map each parsed line into the target type; in our case, the User class. A DelimitedLineTokenizer is used for parsing each line based on the comma delimiter, and the field set mapper maps the parsed values into the User object fields. Finally, we configure the created line tokenizer and field set mapper into the flatFileItemReader object, and the item reader is configured.

The second bean creates Spring’s item writer implementation JdbcBatchItemWriter<User>, used for writing into databases. The generic type specifies the type of the items to be written. In order to use the User field names as parameters in the JDBC query, we need to configure the BeanPropertyItemSqlParameterSourceProvider<User> implementation as a SQL parameter provider. Apart from that, we need to also configure the JDBC data source instance and the SQL query to execute into the writer.

Note  The SB framework provides many out-of-the-box implementations of ItemReader and ItemWriter. Full coverage of these APIs and implementations is beyond the scope of this book. Refer to the SB reference documentation for further details.

Listing 9-15 shows the batch configuration of Step and Job.

This configuration is similar to the previous example. We create the Step and Job instance based on the file item reader and the JDBC item writer in Listing 9-14. In this case, we don’t use a processor. It is an optional component of chunk-oriented processing. The main class of this example is also a standard Spring Boot main class. When we execute it as a Java application, we see the output in Listing 9-16.

The COMPLETED status means that all the records were successfully copied from the file into the in-memory DB.

Tasklet Step

Chunk-oriented processing is not the only type of step we need to cover for enterprise application use cases. Sometimes we need to perform a single action as part of a bigger flow. For example, we might need to send a notification at the end of a job or perform a single stored procedure call. SB provides the Tasklet interface. It has only one method, execute, where we can place our custom logic. This instance can then be wrapped into the TaskletStep class and used as a Step in the SB flow.

Common Classes for Tasklet Examples

Various examples in this chapter use simple Tasklet steps to highlight Spring Batch features that control job execution flow. Listing 9-17 shows this class, which is used often in this chapter.

This simple Spring bean uses Lombok’s logger to log a given message in its execute() method.

Note  SimpleExecutablePoint is useless for real-world applications, but it is handy for the examples in this chapter, because we can fake this bean and verify that the SB configuration works as expected. Tests alongside the book examples can be found in the GitHub repository (https://github.com/lkrnac/book-eiws-code-samples).

Our domain for the Tasklet type of batch flows used in our examples is preparing tea. When we want to prepare tea, we need to perform a few steps to get a tasty result: for example, Boil Water, Add Tea to cup, and Add Water to cup. Listing 9-18 shows the class simulating the Boil Water step.

This simple Spring component autowires SimpleExecutablePoint. In the execute method, we call simpleExecutableStep.execute() to simulate the Boil Water step. SB injects the StepContribution and ChunkContext instances into this method and expects a return value of type RepeatStatus. Use of these types is explained in later sections. For this example, we just need to know that RepeatStatus.FINISHED indicates to SB the successful completion of the Step.

We won’t list the AddTea and AddWater classes, because they are similar to BoilWater; we’re only simulating the tea preparation step with a different text message.
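The structure of such a Tasklet can be sketched stand-alone. The enum and interface below are simplified stand-ins for Spring Batch’s RepeatStatus and Tasklet (the real execute() receives StepContribution and ChunkContext and may throw exceptions), so this shows the shape of the code, not the framework types:

```java
// Simplified stand-ins for Spring Batch types, so the sketch compiles alone
public class TeaTaskletSketch {
    enum RepeatStatus { CONTINUABLE, FINISHED }

    interface Tasklet {
        RepeatStatus execute();
    }

    // Sketch of the BoilWater tasklet: simulate the action, then report
    // successful completion of the step
    static class BoilWater implements Tasklet {
        @Override
        public RepeatStatus execute() {
            System.out.println("Boil Water");  // stands in for simpleExecutableStep.execute()
            return RepeatStatus.FINISHED;
        }
    }

    public static void main(String[] args) {
        System.out.println(new BoilWater().execute());
    }
}
```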

Tasklet Step Example with XML Configuration

Now that we’ve sketched the domain, we can take a look at the SB configuration in Listing 9-19.

The Job in this case consists of three steps; each one refers to the Tasklet instance of BoilWater, AddTea, or AddWater. Each step has to have a unique name and needs to specify which step should follow after its completion. The main class BatchApplication and the configuration class BatchConfiguration are exactly the same as in the XML chunk-oriented examples (Listings 9-7 and 9-8). When we run BatchApplication as a Java application, we see the output in Listing 9-20.

Figure 9-3 depicts the SB job.


Figure 9-3. Batch graph of tasklet example batch job

With SB, we can also visualize batching flows in Spring Tools Suite (STS), similarly to Spring Integration. We can also use the STS Batch Graph Editor to create SB flows. Figure 9-4 shows the location of the editor in STS.


Figure 9-4. Batch Graph Editor tab in STS

By double-clicking SB components in the graph, we can change attributes of those components via the Properties tab. This is shown in Figure 9-5.


Figure 9-5. Editing SB component attributes

Tasklet Step Example with Java Configuration

Preparing tea with a Java configuration is shown in Listing 9-21.

Each custom Tasklet (BoilWater, AddTea, AddWater) needs to be wrapped into TaskletStep so that we can use it as part of the Job. We create a Step instance for each Tasklet by using StepBuilderFactory. The get() method creates an instance of StepBuilder based on the given name of the Step, the tasklet() method creates a TaskletStepBuilder instance based on the given Tasklet, and finally, the build() call creates the Step instance. We register this instance as a Spring bean.

When we create the Job in the prepareTeaJob() method, we need to autowire each Step via the bean name instead of the type, because we have three beans of type Step in the Spring context. This is a bit of annoying boilerplate when we define an SB configuration in Java. After we inject all the Steps and the JobBuilderFactory, we create the Job instance and also register it as a Spring bean.

The main class for this example is the standard Spring Boot main class. When we run it, we see output similar to that of the previous example (Listing 9-20).

JobLauncher

JobLauncher is the interface used to execute SB jobs registered in the Spring context. Spring provides a SimpleJobLauncher implementation of this interface out of the box.

So far, we haven’t needed to use it, because every Job was executed immediately after the Spring context was initialized by Spring Boot. We also didn’t need to create the JobLauncher instance, as the @EnableBatchProcessing annotation created it for us. But automatic execution at the application’s start is not always suitable, and we may need to explicitly execute a particular job.

Listing 9-22 shows the signature of the JobLauncher interface.

JobLauncher declares only one method, run(), which returns an instance of JobExecution. When the client executes this method, the execution is by default synchronous; the caller is blocked until the job finishes. In such a case, the resulting status of the job is indicated by JobExecution.getExitStatus(). When we use synchronous execution, the exit status is typically ExitStatus.COMPLETED or ExitStatus.FAILED.

But in some cases, we don’t want to block the caller. For these cases, the SimpleJobLauncher implementation can be configured with a TaskExecutor. With this configuration, the SimpleJobLauncher.run() call doesn’t block, and the exit status may be ExitStatus.UNKNOWN, ExitStatus.STARTING, or ExitStatus.STARTED until the Job is finished.

JobLauncher Example with XML Configuration

In this case, we don’t use Spring Boot. Listing 9-23 shows the XML configuration.

We configure the Spring component scan in the net.lkrnac.book.eiws.chapter09 package, which will configure the common Tasklet “tea” steps into our context. Next, we need to configure two mandatory beans. For transactionManager, we use ResourcelessTransactionManager. This implementation of a transaction manager doesn’t start a transaction; it is supposed to be used for testing purposes only. In our case, we don’t use any transactions, but we need to configure a transaction manager for the JobRepository bean.

JobRepository is the second mandatory bean for SB configuration. We will dive into its function later in the chapter. We just quickly mention now that it is used to store SB processing metadata. In this case, we aren’t using a real data store, but instead will store metadata into an in-memory map. This implementation of JobRepository is provided by MapJobRepositoryFactoryBean. Next we configure the JobLauncher bean. The SimpleJobLauncher implementation requires configuring the mandatory parameter jobRepository.
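The three beans just described might be declared like this; the bean ids are assumptions, while the class names are Spring Batch’s actual implementations mentioned above:

```xml
<bean id="transactionManager"
      class="org.springframework.batch.support.transaction.ResourcelessTransactionManager"/>

<!-- stores SB metadata in an in-memory map; for testing/demo use only -->
<bean id="jobRepository"
      class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
  <property name="transactionManager" ref="transactionManager"/>
</bean>

<bean id="jobLauncher"
      class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
  <property name="jobRepository" ref="jobRepository"/>
</bean>
```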

This example uses the same SB flow as the TaskletStep XML example in Listing 9-19. The main class of this example is shown in Listing 9-24.

@Slf4j is Lombok’s convenience annotation to define an SLF4J logger instance. Notice that this main class is not a Spring configuration in this case, because we create the Spring context based on the two XML configurations already mentioned in the previous listings.

When the Spring context instance is created, the JobLauncher and Job instances can be retrieved via the getBean() method. The next call is execution of Job itself. JobLauncher.run() also needs to take a parameter of type JobParameters. In this case, we create an empty instance of it, which means our Job doesn’t take any parameters. We show the use of this feature later in the chapter.

The last statements in the main method print out the exit status and close the Spring context. Listing 9-25 shows the output when we run this main class.

JobLauncher Example with Java Configuration

Listing 9-26 shows the Java configuration of the mandatory SB beans.

Readers familiar with Spring Core features shouldn’t be surprised. This Java configuration is the exact mirror of the XML configuration from the previous example (Listing 9-23). The @Bean annotation and the <bean> XML tag can be interchanged one to one in most cases in Spring configurations. The only difference is when a FactoryBean is used, because XML configuration calls getObject() automatically, whereas the Java configuration needs to call this method explicitly. In this case, we need to create the JobRepository via the factory method MapJobRepositoryFactoryBean.getObject().

The next class of this example defines the SB Job with Steps. We use the same tea preparation configuration as in Listing 9-21 from the TaskletStep example. The main class of this example is shown in Listing 9-27.

This code is similar to the previous example. The only difference is that this class is also a Spring configuration doing a component scan in the current package with subpackages, and we use it to create the Spring context. The output after running this main class is also similar to the previous example in Listing 9-25.

Asynchronous JobLauncher Example

Let’s dive into a case where the caller of the SB job can’t be blocked. Listing 9-28 shows the SB beans configured to execute Job asynchronously.

In this configuration, we introduce the new Spring bean TaskExecutor, which is used for JobLauncher bean creation. This setup ensures that Job is executed asynchronously by JobLauncher. This example also uses the tea preparation Job configuration from Listing 9-19. The main class of this example is in Listing 9-29.

After we execute prepareTeaJob, we log the exit status immediately, after 10 ms, and again after 500 ms. As the Job runs, we should observe the exit status changing in these log entries. Listing 9-30 shows the output after running this main class.

The exit status immediately after we kick off the Job is STARTING, and after 10 ms it changes to STARTED. Notice that the Job execution log entries come from the customTaskExecutor thread pool. After 500 ms, the caller thread observes that the job is COMPLETED.

JobParameters

There is often a requirement to pass parameters into a batch Job. Imagine we need to process data that has a creation-date attribute. We may want to specify a date parameter for batch processing, so that only data created on the specified date will be processed.

SB provides this support via the JobParameters class, which can be passed into the JobLauncher.run() method. This class encapsulates parameters into Map<String, JobParameter>. So each parameter has its name (key), and the value can be any Java type. Notice that the value in the map is the type JobParameter (singular), which represents a single batch Job parameter. JobParameters (plural) represents all parameters for one JobInstance.

In the JobParameters example, we slightly change the tea preparation process. Let’s add sugar into our cup of tea. Listing 9-31 shows Tasklet using JobParameter.

In this tea preparation Tasklet, we use the chunkContext instance to access the JobParameters instance and retrieve the parameter with the name sugarAmount. This specifies how much sugar the caller wants to add to a cup of tea. Again, we simulate the action by executing the simpleExecutableStep instance. After we are finished, we mark Step as FINISHED. Listing 9-32 shows the batch configuration.

In this batch flow, we use AddTeaWithParameter instead of AddTea. Otherwise, the batch flow is exactly the same as in previous examples. Listing 9-33 shows the main class of this example.

When we create the Spring context instance and retrieve the JobLauncher and Job, we execute the prepareTeaJob job twice. The createJobParameters method helps us create the JobParameters instance with the parameter called sugarAmount. The first execution is done without sugar, and the second cup of tea is prepared with two spoonfuls of sugar. Each execution status is printed, and the Spring context is closed at the end of the program. Listing 9-34 shows the Spring Boot configuration file.

By default, Spring Boot with autoconfiguration enables executing all the SB Jobs it finds. But in this case, we don’t want to run a default job without parameters, because we want to execute them explicitly with parameters. Therefore, this configuration is needed to disable automatic execution of SB jobs by Spring Boot autoconfiguration. Listing 9-35 shows the output when we run this example.

CommandLineJobRunner

A lot of enterprises use some kind of scheduling and monitoring system for batch processing. These systems often require executing batch-processing logic from the command line. Therefore, SB provides easy execution of batch jobs via the command-line interface (CLI). This support is provided via the CommandLineJobRunner class, with which we can explicitly specify which Job should be executed as well as which parameters should be passed into it.

Of course, the SB application needs to indicate the exit status of the job execution when it is run from the CLI. This support uses a standard numeric exit code mechanism for processes within the operating system. When the operating system process returns a 0 exit code, it indicates success. If this exit code is nonzero, an error occurred. Therefore, SB assigns to each ExitStatus a numeric value, where COMPLETED has value 0. Every other ExitStatus is erroneous from the operating system point of view. FAILED status, for example, has value 5.
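The idea of the mapping can be illustrated with a tiny stand-alone sketch. This is not Spring Batch’s actual mapper class; the numeric values follow the text above, and the fallback value for unknown statuses is an assumption:

```java
import java.util.Map;

// Illustrative sketch of mapping a batch exit status to an OS exit code:
// COMPLETED means success (0); everything else is nonzero. Not Spring
// Batch's real mapper; values follow the text above.
public class ExitCodeSketch {
    private static final Map<String, Integer> EXIT_CODES = Map.of(
            "COMPLETED", 0,
            "FAILED", 5);

    public static int toExitCode(String exitStatus) {
        // unknown statuses fall back to a generic nonzero error (assumption)
        return EXIT_CODES.getOrDefault(exitStatus, 1);
    }
}
```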

Execute Thin JAR from the Command Line

A Java application can be packaged in various ways. For example, Maven will build a thin JAR by default. This means that our application will be packaged into a single JAR without all the dependencies this JAR uses. This JAR packaging expects to have all the dependency locations provided in the CLASSPATH system variable.

This application structure is covered by the 0907-job-launcher-javaconfig example we already explained. This project needs to be built with Maven first. In the root directory of this project is the Maven configuration file pom.xml. If we have Maven installed, we can run the command mvn clean install, and the project will be built and packaged as a thin JAR. After the project is packaged, we can run the SB Job from the command line by using the command in Listing 9-36. In this case, this command is executed on the Linux operating system, but the Windows or Mac command line would look similar.

This command is one line. The java command launches the JRE installed in the operating system. The -cp parameter defines where to find CLASSPATH dependencies. Next, we need to specify the Java class to run; in this case, we want to run CommandLineJobRunner. Java expects the main method in this class, which is provided by SB. The last parameter is the name of the job we intend to run. After running this command, we can observe the same output as in Listing 9-30.
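Such a command might look roughly like the following sketch; the classpath entries, configuration class, and job name are placeholders, not the exact values from Listing 9-36:

```shell
# one line in practice; backslashes added for readability
java -cp "target/0907-job-launcher-javaconfig.jar:lib/*" \
  org.springframework.batch.core.launch.support.CommandLineJobRunner \
  net.lkrnac.book.eiws.chapter09.BatchConfiguration \
  prepareTeaJob
```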

Execute Fat Spring Boot JAR from the Command Line

Another Java packaging type is often referred to as a fat JAR: all the JAR dependencies are packaged into the JAR file itself. This packaging can be easily achieved with Maven by using spring-boot-maven-plugin. The 0902-chunk-processing-generic-javaconfig example is configured this way, so when we build this project via the Maven command mvn clean install, we can execute the batch job via the command in Listing 9-37.

Again, the java command launches the JRE installed on the local machine. The -jar parameter defines which JAR to execute; in this case, the Maven artifact created for this example. Spring Boot automatically runs all Jobs covered by this SB application, because we use autoconfiguration here. Running this command results in the same output as in Listing 9-9.

Execute XML Job from the Command Line

When we use XML configuration, execution from the command line looks like Listing 9-38.

Instead of the main configuration class, we use the XML configuration file as a parameter for CommandLineJobRunner. The output after running this command again looks like Listing 9-9.

Execute Job with Parameters from the Command Line

Of course, it is possible to pass parameters into Job executed from the command line. This example is Spring Boot based. We use our common tea Tasklets with AddTeaWithParameter from the previous example in Listing 9-31, so that JobParameters can be applied. Listing 9-39 shows the batch configuration.

We create two Jobs for tea preparation. The first one, prepareTeaJob, uses only one AddTeaWithParameter Tasklet. The second one adds tea twice (has two steps with AddTeaWithParameter), so that we can have stronger tea. Two jobs are defined to highlight Spring Boot features to specify which Job should be run from the command line. Listing 9-40 shows the main class of this application.
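A hedged sketch of the two-job configuration described above; the job and step names follow the text, but the builder-factory style and the no-argument construction of AddTeaWithParameter are assumptions:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

  @Autowired
  private JobBuilderFactory jobs;

  @Autowired
  private StepBuilderFactory steps;

  private Step addTeaStep() {
    // AddTeaWithParameter is the Tasklet from Listing 9-31.
    return steps.get("addTeaStep").tasklet(new AddTeaWithParameter()).build();
  }

  @Bean
  public Job prepareTeaJob() {
    return jobs.get("prepareTeaJob").start(addTeaStep()).build();
  }

  @Bean
  public Job prepareStrongTeaJob() {
    // Reusing addTeaStep twice triggers the duplicate-step log entry
    // discussed in the text.
    Step addTea = addTeaStep();
    return jobs.get("prepareStrongTeaJob").start(addTea).next(addTea).build();
  }
}
```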

Notice that this main class is different from the standard Spring Boot main class in Listing 9-8, because it passes command-line arguments from the main method to the SpringApplication.run() method as a second parameter. After we build the artifact of the fat JAR, we can execute the job or various jobs from the CLI. Listing 9-41 shows the CLI command whereby all the jobs will run with parameters passed from the command line.
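A hedged sketch of such a main class (the class name is an illustrative placeholder):

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class BatchApplication {

  public static void main(String[] args) {
    // Passing args as the second parameter makes the command-line
    // arguments available to Spring Boot, so they can become JobParameters.
    SpringApplication.run(BatchApplication.class, args);
  }
}
```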

In this command, we don’t specify a job name to execute. Therefore, Spring Boot by default executes all jobs. Running this command provides the output in Listing 9-42.
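A hedged sketch of this command; the parameter name addSugar follows the text, while the artifact name and parameter value are illustrative:

```shell
# No job name is specified, so Spring Boot runs all jobs, applying
# addSugar as a JobParameter in key=value form.
java -jar target/0903-job-parameters.jar addSugar=true
```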

As you can see, both jobs were executed, and the addSugar parameter was applied to both of them. During execution of prepareStrongTeaJob, there is a log entry about duplicate step execution, as the SB framework detected duplicate use of addTeaStep. By default, each Step is meant to be executed only once per Job. In this case, we used the same Step twice in one Job. The Step was executed anyway.

There are ways to highlight for the SB framework that we want to execute the same Step twice. We focus on them later in the chapter. This log message also mentions a restart; the restart mechanism is covered in upcoming sections.

Listing 9-43 shows the command indicating that only the specified job will run.

With the system property definition -Dspring.batch.job.names=prepareTeaJob, we specify that only prepareTeaJob should be executed by Spring Boot. Properties for the Spring Boot framework can be specified via the command line as well as via the application.properties configuration file shown earlier. We can also select several jobs by entering their names comma delimited. Listing 9-44 shows the output, which confirms that only one job was executed with parameters.
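A hedged sketch of the restricted command (the artifact name is an illustrative placeholder):

```shell
# Only prepareTeaJob runs; a comma-delimited list such as
# prepareTeaJob,prepareStrongTeaJob would select several jobs.
java -Dspring.batch.job.names=prepareTeaJob -jar target/0903-job-parameters.jar addSugar=true
```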

JobRepository

We already mentioned that SB needs to have the JobRepository bean configured. This bean ensures that the state of every JobExecution and StepExecution (and other state SB uses) is persisted into a data store. This data store can be any type of SQL database Spring can use. Access is based on the DataSource bean defined in the Spring context.

SB uses a defined database schema to store its metadata and provides SQL scripts to create or drop the schema for the most commonly used relational databases. These schemas are located in the spring-batch-core library in the org/springframework/batch/core folder. So, for example, if we want to create a schema for the PostgreSQL database, we can use the SQL script classpath:/org/springframework/batch/core/schema-postgresql.sql to create the SB schema and the script classpath:/org/springframework/batch/core/schema-drop-postgresql.sql to erase it.
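As a hedged illustration of applying one of these bundled scripts programmatically (the helper class is hypothetical, and the DataSource is assumed to be configured elsewhere):

```java
import javax.sql.DataSource;
import org.springframework.core.io.ClassPathResource;
import org.springframework.jdbc.datasource.init.DatabasePopulatorUtils;
import org.springframework.jdbc.datasource.init.ResourceDatabasePopulator;

public class SchemaInitializer {

  // Creates the Spring Batch metadata schema on the given DataSource by
  // running the PostgreSQL script bundled in spring-batch-core.
  public static void createBatchSchema(DataSource dataSource) {
    ResourceDatabasePopulator populator = new ResourceDatabasePopulator(
        new ClassPathResource("org/springframework/batch/core/schema-postgresql.sql"));
    DatabasePopulatorUtils.execute(populator, dataSource);
  }
}
```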

This mechanism is powerful, because the developer doesn’t need to care about persisting the state of batch execution. SB persists execution states automatically, so it is easy to handle scenarios in which, for example, a Job is killed in the middle of execution and we want to continue from the point where execution left off.

Configuring JobRepository with XML Configuration

This example does not use Spring Boot. Listing 9-45 shows the SB bean configuration needed for this example.

We use four namespaces that the Spring and SB frameworks provide: beans, jdbc, context, and batch. The default namespace here is beans, as we don’t have a lot of batch XML tags. <context:component-scan>, transactionManager, and jobLauncher were already covered in previous examples. The component scan registers all the beans from the defined package into the Spring context. The transaction manager is needed to handle transactions, and JobLauncher is used for execution of batch jobs.

The <jdbc:embedded-database> XML tag configures the in-memory database we use in this example. This data source will be handling only storage of SB states, so we create a schema for it via the SQL script classpath:/org/springframework/batch/core/schema-h2.sql.

JdbcTemplate enables us to easily query the SB JobRepository data store. Finally, we create the JobRepository bean via the SB XML tag <batch:job-repository>. We need to specify the mandatory attributes transaction-manager and data-source, where we use the bean instances created earlier in this XML configuration. We can also define a prefix for all the SB tables in the database, which is handy for avoiding possible DB table name conflicts. The last attribute we use is the length of VARCHAR types in the DB. This can be handy if our Job or Step names are too long.
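A hedged sketch of the configuration just described; the bean ids follow the text, while attribute values such as the table prefix and VARCHAR length are illustrative:

```xml
<jdbc:embedded-database id="dataSource" type="H2">
    <jdbc:script location="classpath:/org/springframework/batch/core/schema-h2.sql"/>
</jdbc:embedded-database>

<batch:job-repository id="jobRepository"
    data-source="dataSource"
    transaction-manager="transactionManager"
    table-prefix="BATCH_"
    max-varchar-length="1000"/>
```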

For SB Job configuration, we reuse the same prepareTeaJob flow from Listing 9-19. Listing 9-46 shows the main class of this example.

This main class imports both XML configurations we mentioned. The Spring context is created via plain Spring constructs, in this case the AnnotationConfigApplicationContext constructor, with this class itself serving as the context configuration. After the context instance is created, we retrieve the JobLauncher and Job instances to execute the job.

When the job is finished and the exit status is logged, we retrieve the JdbcTemplate instance from the Spring context, so that we can query the H2 in-memory database. In this case, we read how many job executions happened so far against this JobRepository.
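A hedged sketch of this main-class flow; the job bean name follows the text, while the class name, the XML file locations, and the query details are illustrative assumptions:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.ImportResource;
import org.springframework.jdbc.core.JdbcTemplate;

@Configuration
@ImportResource({"classpath:batch-beans-config.xml", "classpath:batch-flow-config.xml"})
public class BatchJobRepositoryDemo {

  public static void main(String[] args) throws Exception {
    AnnotationConfigApplicationContext context =
        new AnnotationConfigApplicationContext(BatchJobRepositoryDemo.class);

    JobLauncher jobLauncher = context.getBean(JobLauncher.class);
    Job job = context.getBean("prepareTeaJob", Job.class);
    JobExecution execution = jobLauncher.run(job, new JobParameters());
    System.out.println("Exit status: " + execution.getExitStatus());

    // Query the SB metadata schema in the embedded H2 database.
    JdbcTemplate jdbcTemplate = context.getBean(JdbcTemplate.class);
    Integer count = jdbcTemplate.queryForObject(
        "select count(*) from BATCH_JOB_EXECUTION", Integer.class);
    System.out.println("Job executions so far: " + count);
    context.close();
  }
}
```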

Image Note  Structure of the SB metadata schema is beyond the scope of this book. Curious readers can find the full schema metadata information in the SB reference documentation at http://docs.spring.io/spring-batch/trunk/reference/htmlsingle/#metaDataSchema.

After we run this main class, we can observe the output in Listing 9-47.

Configuring JobRepository with Java Configuration

The Java configuration of JobRepository looks like Listing 9-48.

This Java configuration is an exact mirror of the XML configuration from the previous example (Listing 9-45). We just need to replace the <jdbc:embedded-database> and <batch:job-repository> XML tags with explicit construction of these beans and register them with the @Bean annotation. For creation of the H2 in-memory database, we use EmbeddedDatabaseBuilder. For the JobRepository instance creation, we use the JobRepositoryFactoryBean.getObject() call. The other beans are exact Java counterparts of the corresponding <bean> XML tags.
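A hedged sketch of this Java configuration; the bean construction mirrors the XML version described above, while details such as the table prefix and VARCHAR length are illustrative:

```java
import javax.sql.DataSource;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.launch.support.SimpleJobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.repository.support.JobRepositoryFactoryBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class BatchBeansConfiguration {

  @Bean
  public DataSource dataSource() {
    // H2 in-memory database initialized with the SB metadata schema.
    return new EmbeddedDatabaseBuilder()
        .setType(EmbeddedDatabaseType.H2)
        .addScript("classpath:org/springframework/batch/core/schema-h2.sql")
        .build();
  }

  @Bean
  public PlatformTransactionManager transactionManager() {
    return new DataSourceTransactionManager(dataSource());
  }

  @Bean
  public JobRepository jobRepository() throws Exception {
    JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
    factory.setDataSource(dataSource());
    factory.setTransactionManager(transactionManager());
    factory.setTablePrefix("BATCH_");
    factory.setMaxVarCharLength(1000);
    factory.afterPropertiesSet();
    return factory.getObject();
  }

  @Bean
  public JobLauncher jobLauncher() throws Exception {
    SimpleJobLauncher launcher = new SimpleJobLauncher();
    launcher.setJobRepository(jobRepository());
    launcher.afterPropertiesSet();
    return launcher;
  }

  @Bean
  public JdbcTemplate jdbcTemplate() {
    return new JdbcTemplate(dataSource());
  }
}
```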

For this example, we also use BatchConfiguration from Listing 9-21, which creates prepareTeaJob with three common tea Steps. For the main class, we use a similar class as for the previous example in Listing 9-46. The only difference is replacement of the @ImportResource annotation with the @ComponentScan annotation, because we don’t have XML configuration files on the classpath. Instead, we have the Java configuration classes BatchConfiguration and BatchBeansConfiguration, which are component scanned by this annotation. As this is a pretty straightforward main class, we won’t list it. After running it, we can see the same output as for the previous example (Listing 9-47).

Stateful Job and Step Execution

There is often a need to pass state between Jobs or Steps, which can be any Java object. For example, one Step or Job might be processing records, and a subsequent Job or Step needs to send notification of how many records were processed.

Normally, Java developers would create some kind of in-memory cache, which needs to be thread safe. But fortunately, SB provides various mechanisms for passing state in a clean and thread-safe manner.

ExecutionContext

The first option for passing state is the ExecutionContext class. This class encapsulates a map of type Map<String, Object>, so we are able to store any type of state that needs to be transferred between steps or chunk-oriented item handlers. This can be handy if we need to store partial results during batch processing. But, of course, this instance needs to be somehow injected into the place where we need to use it. Luckily, SB provides many ways to access ExecutionContext from any part of the SB application.

As we mentioned, when Step is executed, a StepExecution instance is created, and when Job is executed, an instance of JobExecution is created. Via StepExecution.getJobExecution(), we can access the JobExecution instance, and via JobExecution.getExecutionContext(), we can access the ExecutionContext instance.

The StepExecution instance can be injected into any chunk-oriented ItemProcessor, ItemWriter, and ItemReader via the initialization method annotated with @BeforeStep. For Tasklet, there is an even easier mechanism, whereby SB injects the ChunkContext instance as a second parameter of Tasklet.execute(). The StepExecution instance can then be accessed via ChunkContext.getStepContext().getStepExecution().

So via all these context instances, we can access the ExecutionContext instance in any part of the SB flow.

Accessing ExecutionContext from Tasklet Example

Listing 9-49 shows a Tasklet that stores state in the ExecutionContext instance. In this example, we count how many times we prepared tea in previous Job runs. Based on this information, we amend the behavior of one Step. Therefore, our tea Tasklets have the suffix WithCounter.

Every time this Tasklet is executed, it increments the teaCount value stored in the ExecutionContext instance that is retrieved via the chunkContext.getStepContext().getStepExecution().getJobExecution().getExecutionContext() call. Listing 9-50 shows how this state is used.
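A hedged sketch of such a counting Tasklet; the key teaCount and the context-access chain follow the text, while the class name and message wording are illustrative:

```java
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.repeat.RepeatStatus;

public class AddTeaWithCounter implements Tasklet {

  @Override
  public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
    // Reach the Job's ExecutionContext through the chunk and step contexts.
    ExecutionContext executionContext = chunkContext.getStepContext()
        .getStepExecution().getJobExecution().getExecutionContext();

    // Increment the stored counter, defaulting to 0 on the first run.
    int teaCount = executionContext.getInt("teaCount", 0);
    executionContext.putInt("teaCount", teaCount + 1);

    System.out.println("Adding tea (teaCount is now " + (teaCount + 1) + ")");
    return RepeatStatus.FINISHED;
  }
}
```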

This is the Tasklet where the state from ExecutionContext is used to amend the behavior of the step. Every time SB executes this Tasklet, it retrieves the ExecutionContext of the Job by calling chunkContext.getStepContext().getStepExecution().getJobExecution().getExecutionContext() and looks for the teaCount value stored in it. If teaCount is more than 2, it simulates execution with a different message to highlight its reaction to the state in ExecutionContext. Listing 9-51 shows the batch configuration of this example.

One obvious difference from the previous example is the use of our stateful Tasklets. The second difference is allowing steps to be executed multiple times with the same JobParameters. If we didn’t allow starting the Step multiple times, we couldn’t highlight how state is passed between Job runs. This feature is discussed later in this chapter. Listing 9-52 shows the main class of this example.

To highlight the stateful feature we included, we execute the batch job three times. As you may remember, each job increments teaCount, and in the third execution, the Boil Water step simulates the action with a different message. Listing 9-53 shows the output of this example.

Notice that the third job run prints the desired stateful message when we add water to our tea.

Accessing ExecutionContext in Chunk-Oriented Processing

Listing 9-54 shows a stateful ItemWriter.

This ItemWriter uses the @BeforeStep annotation to retrieve ExecutionContext via the StepExecution instance and to initialize the map entry chunkCount to 0. During processing, it writes only the first record of each even chunk and increments chunkCount to maintain state in ExecutionContext. For odd chunks, it writes all items. Listing 9-55 shows the batch configuration.
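A hedged sketch of such a writer; the class name SimpleRecordWriterDiscard and the key chunkCount follow the text, while the item type and output wording are illustrative:

```java
import java.util.List;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.BeforeStep;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemWriter;

public class SimpleRecordWriterDiscard implements ItemWriter<String> {

  private ExecutionContext executionContext;

  @BeforeStep
  public void beforeStep(StepExecution stepExecution) {
    // Grab the step's ExecutionContext and initialize the chunk counter.
    this.executionContext = stepExecution.getExecutionContext();
    executionContext.putInt("chunkCount", 0);
  }

  @Override
  public void write(List<? extends String> items) {
    int chunkCount = executionContext.getInt("chunkCount");
    if (chunkCount % 2 == 0) {
      // Even chunk: write only the first item, discard the rest.
      System.out.println("Writing record: " + items.get(0));
    } else {
      // Odd chunk: write all items.
      items.forEach(item -> System.out.println("Writing record: " + item));
    }
    executionContext.putInt("chunkCount", chunkCount + 1);
  }
}
```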

This configuration was already shown in the chunk-oriented example, but in this case we use our stateful writer SimpleRecordWriterDiscard. The main class in this example is the common main batch class we reuse across examples. Listing 9-56 shows the output when we run it.

Notice that the chunk size is 4. As expected, only the first record of each even chunk is written alongside all odd chunk records. Therefore, records 1, 2, 3, 9, 10, and 11 are missing in the output.

Batch Scopes

Every Spring developer should be familiar with the concept of bean scopes. By default, each Spring bean has singleton scope, which means there is only one instance of that bean in the Spring context. Every time we inject this bean into our code, Spring provides the same object.

Another useful scope is request scope, where the bean uses the @Scope(WebApplicationContext.SCOPE_REQUEST) annotation. In this case, Spring creates a new instance of the bean for each web request our application serves. So when we inject the bean into our code, we get a different instance for each request. This mechanism can be useful for sharing state within the boundaries of the request thread. For example, we can store some information in a local variable of this bean in the controller and retrieve it in the repository.

SB provides similar mechanisms for Job and Step. We can create beans with StepScope or JobScope, and SB makes sure that every new StepExecution will have a fresh instance of a bean with StepScope. Similarly for different JobExecutions, SB will inject fresh instances of a bean with JobScope.

To define StepScope, we can annotate beans with @StepScope, or use the standard Spring annotation @Scope(value = "step", proxyMode = ScopedProxyMode.TARGET_CLASS). The @JobScope annotation defines JobScope; it is also a convenience shortcut for @Scope(value = "job", proxyMode = ScopedProxyMode.TARGET_CLASS). With XML configuration, we can set the scope attribute of the <bean> XML tag to job or step.

These scopes are also useful for late binding of references from the SB context. This can be done by using the placeholders #{...}.

Batch Scopes XML Example

In this example, we prepare tea and process records. Listing 9-57 shows the class that will be used with StepScope.

This class will be used as a bean with StepScope. This configuration will be applied in the XML context we show later in this example. Lombok’s annotation @Getter specifies that all fields of this class will be accessible via getters, and the @Setter annotation specifies that only the readCount field will have a setter. The purpose of this class is to configure how many records can be read, and for this we use the countToProcess variable. The second variable, readCount, is used to track how many records were actually read. Listing 9-58 presents StatefulRecordReader.

This Spring component autowires the common bean ReadRepository, but also the step-scoped bean ReadCountRestricter shown in the previous listing. In the read() method, we retrieve readCount from the ReadCountRestricter bean and quit further reading if we reach the desired read count (readCountRestricter.getCountToProcess()). If we don’t reach this limit, we continue reading new records and increment the read counter. Notice that if we run this step twice with ReadCountRestricter configured as a StepScope bean, SB injects a new instance of it, effectively resetting the counter for each step execution.

Next, we include a JobScope bean in this example, which will also be configured in XML. But first, Listing 9-59 shows the class we will use for storing state while processing records.

This class stores the count of written records. Lombok’s @Data annotation will generate getters and setters for us. This class will be used for beans with JobScope. Listing 9-60 covers the ItemWriter that uses the WrittenRecordsCounter bean.

This ItemWriter autowires our common WriteRepository alongside WrittenRecordsCounter, which is used to track the number of written records. If we configure WrittenRecordsCounter in the XML configuration as a JobScope bean, SB creates a new instance for each job, effectively resetting the counter. Listing 9-61 shows a Tasklet using the WrittenRecordsCounter bean.

This Tasklet is similar to our common one for tea preparation, BoilWater. In this case, we also inject the WrittenRecordsCounter bean, so that we can output a specific message and highlight how the JobScope bean has transferred state between two Steps of the same Job. Listing 9-62 shows the XML configuration of the SB beans.

The ReadCountRestricter bean is configured with step scope. Its parameter is initialized via late binding from the job parameter recordCountToProcess. This late binding wouldn’t work with singleton scope, for example, because in that case Spring would most probably create the bean during initialization of the Spring context, when JobParameters are not yet available.

WrittenRecordsCounter uses job scope. When we use XML configuration, we need to enable the batch scopes by registering them as beans with proxyTargetClass enabled. If we didn’t register these beans, Spring wouldn’t know about the SB scopes at all.
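A hedged sketch of this part of the XML configuration; the scope registrations use the real Spring Batch scope classes, while the package name of ReadCountRestricter is an illustrative placeholder:

```xml
<!-- Register the batch scopes so Spring recognizes step- and job-scoped beans. -->
<bean class="org.springframework.batch.core.scope.StepScope">
    <property name="proxyTargetClass" value="true"/>
</bean>
<bean class="org.springframework.batch.core.scope.JobScope">
    <property name="proxyTargetClass" value="true"/>
</bean>

<!-- Step-scoped bean with late binding of a JobParameter. -->
<bean id="readCountRestricter" class="com.example.batch.ReadCountRestricter" scope="step">
    <constructor-arg value="#{jobParameters['recordCountToProcess']}"/>
</bean>
```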

Listing 9-63 shows the XML configuration of the SB flow.

This job will be called combinedJob, because we combine the processing of records and tea preparation steps into one Job. Listing 9-64 covers the main class of this example.

This main class uses the XML configuration we listed and executes combinedJob twice to highlight the resetting of counters and the JobScope and StepScope features. The two jobs are executed with different values for the recordCountToProcess parameter. When we run this main class, we see the output in Listing 9-65.

The first job execution is limited to reading three records and processing them. This limit is recorded in the StepScope bean ReadCountRestricter. Notice that when the second Job is started, the counter stored in this bean is reset, and a new value from the JobParameter is retrieved into a fresh instance of the bean.

The JobScope feature is shown in the transfer of state between simpleRecordsStep and boilWaterStep. You can see that when we started preparing tea, we knew how many records were written in the previous step. At the same time, this state was reset between the two Job runs.

Figure 9-6 shows the sequence diagram of state transitions during execution of combinedJob.

9781484207949_Fig09-06.jpg

Figure 9-6. Sequence diagram of XML batch scopes example

To fit this diagram on the page, we skipped the stateless steps and SimpleRecordProcessor and added notes indicating when they are executed.

The ReadCountRestricter instance is used only during simpleRecordsStep to store the number of records already read and to decide whether we need to read another one. So it carries state between successive statefulRecordReader.read() calls. This instance is also active only during this step execution; each new execution of the step creates a new instance.

The WrittenRecordsCounter instance is used to transfer state (the written-records count) between the steps simpleRecordsStep and boilWaterStep. This instance is abandoned after the combinedJob execution is over, so a new execution of this Job gets a fresh instance from the Spring context.

Batch Scopes Java Example

Listing 9-66 shows an example of a bean configured with StepScope and a Java annotation.

The @StepScope annotation defines the correct scope for this bean. Therefore, we can use late binding of the JobParameter recordCountToProcess in the constructor injection. Listing 9-67 shows JobScope defined with a Java annotation.
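A hedged sketch of such a step-scoped bean; the class and field names follow the earlier XML example, while the explicit accessors (the book uses Lombok) and the @Component registration are assumptions:

```java
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

@Component
@StepScope
public class ReadCountRestricter {

  private final long countToProcess;
  private int readCount;

  // Late binding: the JobParameter is resolved when the step starts,
  // because a fresh instance is created for each StepExecution.
  public ReadCountRestricter(
      @Value("#{jobParameters['recordCountToProcess']}") long countToProcess) {
    this.countToProcess = countToProcess;
  }

  public long getCountToProcess() { return countToProcess; }
  public int getReadCount() { return readCount; }
  public void setReadCount(int readCount) { this.readCount = readCount; }
}
```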

In this case, the @JobScope annotation does the trick. Listing 9-68 shows the Java batch configuration.
