Chapter 3. Sample Job

This book is designed to not only explain how the many features of Spring Batch work but also demonstrate them in detail. Each chapter includes a number of examples that show how each feature works. However, examples designed to communicate individual concepts and techniques may not be the best for demonstrating how those techniques work together in a real-world example. So, in Chapter 10 you create a sample application that is intended to emulate a real-world scenario.

The scenario I chose is simplified: a domain you can easily understand but that provides sufficient complexity so that using Spring Batch makes sense. Bank statements are an example of common batch processing. Run nightly, these processes generate statements based on the previous month's transactions. The example is a derivative of the standard bank statement: a brokerage statement. The brokerage statement batch process shows how you can use the following features of Spring Batch together to accomplish the result:

Various input and output options: Among the most significant features of Spring Batch are the well-abstracted options for reading and writing from a variety of sources. The brokerage statement process obtains input from flat files, a database, and a web service. On the output side, you write to databases as well as flat files. A variety of readers and writers are used.

Error handling: The worst part about maintaining batch processes is that when they break, it's typically at 2:00 a.m., and you're the one getting the phone call to fix the problem. Because of this, robust error handling is a must. The example statement process covers a number of different scenarios including logging, skipping records with errors, and retry logic.

Scalability: In the real world, batch processes need to be able to accommodate large amounts of data. Later in this book, you use the scalability features of Spring Batch to tune the batch process so it can process literally millions of customers.

To build the batch job, you need a set of requirements to work from. Because those requirements are written as user stories, the next section first looks at the agile development process as a whole.

Understanding Agile Development

Before this chapter digs into the individual requirements of the batch process you develop in Chapter 10, let's spend a little time going over the approach you use to do so. A lot has been said in our industry about various agile processes; so instead of banking on any previous knowledge you may have of the subject, let's start by establishing a base of what agile and the development process will mean for this book.

The agile process has 12 tenets that virtually all of its variants prescribe. They are as follows:

  • Customer satisfaction comes from quick delivery of working software.

  • Change is welcome regardless of the stage of development.

  • Deliver working software frequently.

  • Business and development must work hand in hand daily.

  • Build projects with motivated teams. Give them the tools and trust them to get the job done.

  • Face-to-face communication is the most effective form.

  • Working software is the number-one measure of success.

  • Strive for sustainable development. All members of the team should be able to maintain the pace of development indefinitely.

  • Continue to strive for technical excellence and good design.

  • Minimize waste by eliminating unnecessary work.

  • Self-organizing teams generate the best requirements, architectures, and designs.

  • At regular intervals, have the team reflect to determine how to improve.

It doesn't matter if you're using Extreme Programming (XP), Scrum, or any other currently hip variant. The point is that these dozen tenets still apply.

Notice that not all of them will necessarily apply in your case. It's pretty hard to work face to face with a book. You'll probably be working by yourself through the examples, so the aspects of team motivation don't exactly apply either. However, there are pieces that do apply. An example is quick delivery of working software. This will drive you throughout the book. You'll accomplish it by building small pieces of the application, validating that they work with unit tests, and then adding onto them.

Even with the exceptions, the tenets of agile provide a solid framework for any development project, and this book applies as many of them as possible. Let's get started looking at how they're applied by examining the way you document the requirements for the sample job: user stories.

Capturing Requirements with User Stories

User stories are the agile method for documenting requirements. Written as a customer's take on what the application should do, a story's goal is to communicate how a user will interact with the system and to document testable results of that interaction. A user story has three main parts:

  • The title: The title should be a simple and concise statement of what the story is about. Load transaction file. Calculate fee tier. Generate print file. All of these are good examples of story titles. Notice that these titles aren't GUI specific. Just because you don't have a GUI doesn't mean you can't have interactions with users. In this case, the user is the batch process you're documenting or any external system you interface with.

  • The narrative: This is a short description of the interaction you're documenting, written from the perspective of the user. Typically, the format is something like "Given the situation Y, X does something, and something else happens." You see in the upcoming sections how to approach stories for batch processes (given that they're purely technical in nature).

  • Acceptance criteria: The acceptance criteria are testable requirements that can be used to identify when a story is complete. The important word in the previous statement is testable. In order for an acceptance criterion to be useful, it must be able to be verified in some way. These aren't subjective requirements but hard items that the developer can use to say "Yes, it does do that" or "No, it doesn't."

Let's look at a user story for a universal remote control as an example:

  • Title: Turn on Television

  • Narrative: As a user, with the television, receiver, and cable box off, I will be able to press the power button on my universal remote. The remote will then power on the television, receiver, and cable box and configure them to view a television show.

  • Acceptance criteria:

    • Have a power button on the universal remote.

    • When the user presses the power button, the following will occur:

      1. The television will power on.

      2. The AV receiver will power on.

      3. The cable box will power on.

      4. The cable box will be set to channel 187.

      5. The AV receiver will be set to the SAT input.

      6. The television will be set to the Video 1 input.

The Turn on Television user story begins with a title—Turn on Television—that is short and descriptive. It continues with a narrative. In this case, the narrative provides a description of what happens when the user presses the power button. Finally, the acceptance criteria list the testable requirements for the developers and QA. Notice that each criterion is something the developers can easily check: they can look at their developed product and say yes or no, what they wrote does or doesn't do what the criteria state.

User stories mark the beginning of the development cycle. Let's continue by looking at a few of the other tools used over the rest of the cycle.

Capturing Design with Test-Driven Development

Test-driven development (TDD) is another agile practice. When using TDD, a developer first writes a test that fails and then implements the code to make the test pass. Designed to require that developers think about what they're trying to code before they code it, TDD (also called test-first development) is widely credited with making developers more productive, reducing the time they spend in the debugger, and producing cleaner code.

Another advantage of TDD is that tests serve as executable documentation. Unlike user stories or other documentation that goes stale for lack of maintenance, automated tests have to be updated whenever the code changes, because a test that no longer matches the code fails. If you want to understand how a piece of code is intended to work, you can look at the unit tests for a complete picture of the scenarios in which the developers intended their code to be used.
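To make the test-first rhythm concrete, here is a minimal sketch in plain Java (not code from this book's sample job). The checks in `main` were written first and pin down one rule from the Import Transactions story later in this chapter: a customer tax ID is exactly nine digits. `isValidTaxId` is then the simplest code that makes them pass. The class and method names are hypothetical, and a tiny `check` helper stands in for a test framework such as JUnit so the sketch runs on its own.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of test-first development without a test framework.
final class TaxIdValidator {

    // Written second: the simplest implementation that makes the checks below pass.
    private static final Pattern TAX_ID = Pattern.compile("\\d{9}");

    static boolean isValidTaxId(String candidate) {
        return candidate != null && TAX_ID.matcher(candidate).matches();
    }

    // Written first: each check fails until isValidTaxId behaves as specified.
    public static void main(String[] args) {
        check(isValidTaxId("205866465"), "nine digits are accepted");
        check(!isValidTaxId("20586646"), "eight digits are rejected");
        check(!isValidTaxId("2058664651"), "ten digits are rejected");
        check(!isValidTaxId("20586646A"), "non-digits are rejected");
        check(!isValidTaxId(null), "a missing value is rejected");
        System.out.println("All checks passed");
    }

    private static void check(boolean condition, String description) {
        if (!condition) {
            throw new AssertionError("Failed: " + description);
        }
    }
}
```

Read top to bottom, the checks double as executable documentation: they state exactly which inputs are accepted and rejected.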

Although TDD has a number of positives, you won't use it much in this book. It's a great tool for development, but it isn't the best for explaining how things work. However, Chapter 12 looks at testing of all types, from unit testing to functional testing, using open source tools including JUnit, Mockito, and the testing additions in Spring.

Using a Source-Control System

In Chapter 2, you took a quick peek at source control when you used Git to retrieve the source code for Spring Batch. Although it isn't a requirement by any means, you're strongly encouraged to use a source-control system for all your development. Whether you choose to set up a central Subversion repository or use Git locally, the features that source control provides are essential for productive programming.

You're probably thinking, "Why would I use source control for code that I'm going to throw away while I'm learning?" That is the strongest reason I can think of to use it. By using a version-control system, you give yourself a safety net to try things. Commit your working code; try something that may not work. If it does, commit the new revision. If not, roll back to the previous revision with no harm done. Think about the last time you learned a new technology and did so without version control. I'm sure there were times when you coded your way down a path that didn't pan out and were then stuck to debug your way out of it because you didn't have a previously working copy. Save yourself the headache and allow yourself to make mistakes in a controlled environment by using version control.

Working with a True Development Environment

There are many other pieces to development in an agile environment. Get yourself a good IDE. Because this book is purposely written to be IDE agnostic, it won't go into pros and cons of each. However, be sure you have a good one, and learn it well, including the keyboard shortcuts.

Although spending a lot of time setting up a continuous integration environment may not make sense for you while you learn a given technology, it may be worth setting one up to use in general for your personal development. You never know when that widget you're developing on the side will be the next big thing, and you'd hate to have to go back and set up source control, continuous integration, and so on when things are starting to get exciting. A few good continuous integration systems are available for free, but I strongly recommend Hudson (or its fork, Jenkins). Both are easy to use and highly extensible, so you can configure all kinds of additional functionality, such as integrating with Sonar and other code-analysis tools and executing automated functional tests.

Understanding the Requirements of the Statement Job

Now that you've seen the pieces of the development process you're encouraged to use as you learn Spring Batch, let's look at what you'll develop in this book. Figure 3-1 shows what you expect to get in the mail from your stockbroker each quarter as your brokerage account statement.


Figure 3.1. Brokerage statement, formatted and printed on letterhead

If you break down how the statement is created, there are really two pieces to it. The first is nothing more than a pretty piece of paper on which the second piece is printed. It's the second piece, shown in Figure 3-2, that you create in this book.


Figure 3.2. Plain-text brokerage statement

Typically, statements are created as follows. A batch process creates a print file consisting of little more than text. That print file is then sent to a printer that prints the text onto the decorated paper, producing the final statement. The print file is the piece you create using Spring Batch. Your batch process will perform the following functions:

  1. Import a file of customer information and related transactions.

  2. Retrieve from a web service the closing stock prices for all the stocks included in the database.

  3. Import the previously downloaded stock prices into the database.

  4. Calculate the pricing level for each of the accounts.

  5. Calculate the transaction fees for each transaction based on the level calculated in the previous step.

  6. Print the file for the brokerage account for the past month.

Let's look at what each of these features entails. Your job is provided with a customer-transaction flat file that consists of information about a customer and their transactions for the month. Your job updates existing customer information and adds their transactions to the database. When the transactions have been imported, the job obtains the latest prices for each of the stocks in the database from a web service, in order to calculate each account's current value. The job imports the downloaded prices into the database.

After the initial imports are complete, your job can begin calculating transaction fees. The brokerage makes its money by charging a fee for each trade it executes. These fees are based on how many transactions a customer has in a month: the more transactions a customer has, the less they're charged per transaction. The first step in calculating the transaction fees is to determine what level or tier the customer falls into; then you can calculate the price for the customer's transactions. When all the calculations have been completed, you can generate the customer's monthly statement.

This list of features is intended to provide a complete view into how Spring Batch is used in a real-world problem. Throughout the book, you learn about the features Spring Batch provides to help you develop batch processes like the one required for this scenario. In Chapter 10, you implement the batch job to meet the requirements outlined in the following user stories:

Import Transactions: As the batch process, I will import the customer information and their related transactions into the database for future processing. Acceptance criteria:

  • The batch job will import a predefined customer/transaction file into a database table.

  • After the file has been imported, it will be deleted.

  • The customer/transaction file will have two record formats. The first will be to identify the customer the subsequent transactions belong to. The second will be the individual transaction records.

  • The format for the customer record is a comma-delimited record of the following fields:

    Name                        Required   Format
    Customer Tax ID             True       \d{9}
    Customer First Name         False      \w+
    Customer Last Name          False      \w+
    Customer Address 1          False      \w+
    Customer City               False      \w+
    Customer State              False      [A-Z]{2}
    Customer Zip                False      \d{5}
    Customer Account Number     False      \d{16}

  • A customer record will look like the following:

    205866465,Joshua,Thompson,3708 Park,Fairview,LA,58517,3276793917668488

  • The format for the transaction records is a comma-delimited record of the following fields:

    Name                        Required   Format
    Customer Account Number     True       \d{16}
    Stock Symbol                True       \w+
    Quantity                    True       \d+
    Price                       True       \d+\.\d{2}
    Transaction Timestamp       True       MMDDYYYY hh:mm:ss.ss

  • A transaction record looks like the following:

    3276793917668488,KSS,5767,7074247,2011-04-02 07:00:08
  • All transactions will be imported as new transactions.

  • An error file will be created with any customer records that aren't valid.

  • Any transaction records that aren't valid will be written to the error file along with their customer record.
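The field tables above translate naturally into regular expressions. The following plain-Java sketch (a hypothetical `CustomerRecordValidator`, not Spring Batch's validation API) checks a customer record and collects rejects in a list standing in for the error file. It reads the Format column as Java regex syntax (`\d` for a digit), treats an empty optional field as valid, and assumes a five-digit ZIP code, matching the example record. Name, address, and city are treated as free text, because the example address `3708 Park` contains a space that `\w+` would reject. Transaction records could be checked the same way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch: validate a comma-delimited customer record against the story's field formats.
final class CustomerRecordValidator {

    // One entry per field position; null marks a free-text field that is not pattern-checked.
    private static final Pattern[] FORMATS = {
        Pattern.compile("\\d{9}"),    // Customer Tax ID (required)
        null,                         // First Name
        null,                         // Last Name
        null,                         // Address 1 (may contain spaces)
        null,                         // City
        Pattern.compile("[A-Z]{2}"),  // State
        Pattern.compile("\\d{5}"),    // ZIP (assumed five digits, as in the example)
        Pattern.compile("\\d{16}")    // Account Number
    };

    static boolean isValid(String line) {
        // A limit of -1 keeps trailing empty fields, so the field count stays accurate.
        String[] fields = line.split(",", -1);
        if (fields.length != FORMATS.length || fields[0].isEmpty()) {
            return false; // wrong number of fields or missing required tax ID
        }
        for (int i = 0; i < fields.length; i++) {
            boolean optionalAndEmpty = i > 0 && fields[i].isEmpty();
            if (FORMATS[i] != null && !optionalAndEmpty
                    && !FORMATS[i].matcher(fields[i]).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> errorFile = new ArrayList<>(); // stands in for the error file
        String[] input = {
            "205866465,Joshua,Thompson,3708 Park,Fairview,LA,58517,3276793917668488",
            "20586646X,Bad,Record,1 Main,Nowhere,la,1234,42"
        };
        for (String line : input) {
            if (!isValid(line)) {
                errorFile.add(line);
            }
        }
        System.out.println("Rejected: " + errorFile);
    }
}
```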

Get Stocks Closing Price: As the batch process, at the prescheduled execution time, I will query the Yahoo stock web service to obtain the closing prices of all stocks held over the course of the previous month by our customers. I will build a file with this data for future import. Acceptance criteria:

  • The process will output a file each time it's run.

  • The file will consist of one record per stock symbol.

  • Each record in the file will have the following comma-delimited fields:

    Name              Required   Format
    Stock Symbol      True       \w+
    Closing Price     True       \d+\.\d{0,2}

  • The file of stock quotes will be obtained from the URL http://download.finance.yahoo.com/d/quotes.csv?s=<QUOTES>&f=sl1, where <QUOTES> is a list of ticker symbols delimited by pluses (+) and sl1 indicates that I want the stock ticker and the last price traded.[7]

  • An example record of what is returned using the URL http://download.finance.yahoo.com/d/quotes.csv?s=HD&f=sl1 is: "HD",31.46.
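Building the request URL is simple string assembly. This sketch only constructs the URL described above and makes no network call. The class name is hypothetical, and it assumes plain ticker symbols that need no URL encoding.

```java
import java.util.List;

// Sketch: assemble the quote URL from the Get Stocks Closing Price story.
// No network call is made; symbols are assumed to be plain tickers needing no encoding.
final class QuoteUrlBuilder {

    private static final String BASE_URL = "http://download.finance.yahoo.com/d/quotes.csv";

    static String buildUrl(List<String> symbols) {
        if (symbols.isEmpty()) {
            throw new IllegalArgumentException("At least one ticker symbol is required");
        }
        // s= takes '+'-delimited tickers; f=sl1 asks for the symbol and last trade price.
        return BASE_URL + "?s=" + String.join("+", symbols) + "&f=sl1";
    }

    public static void main(String[] args) {
        System.out.println(buildUrl(List.of("SHLD", "CME", "GOOG", "F")));
    }
}
```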

Import Stock Prices: As the batch process, when I receive the stock price file, I will import the file into the database for future processing. Acceptance criteria:

  • The process will read the file that was downloaded by a previous step in the job.

  • The stock prices for each of the stocks will be stored in the database for reference by each transaction.

  • After the file has been successfully imported, it will be deleted.

  • The record format of the file can be found in the story Get Stocks Closing Price.

  • Any records that are ill-formed will be logged in a separate error file for future analysis.
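A minimal sketch of the import, assuming the record format from the Get Stocks Closing Price story. A `LinkedHashMap` stands in for the database table and a list stands in for the error file; both, like the class name, are illustrative placeholders. The pattern also tolerates the quoted symbol (`"HD",31.46`) that the web service returns, in case the file is saved without stripping quotes; that tolerance is an assumption, not part of the story.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: import "SYMBOL,PRICE" records and set ill-formed ones aside.
final class StockPriceImporter {

    // Symbol (optionally quoted), a comma, then digits with up to two decimal places.
    private static final Pattern RECORD =
            Pattern.compile("\"?(\\w+)\"?,(\\d+\\.\\d{0,2})");

    final Map<String, BigDecimal> prices = new LinkedHashMap<>(); // placeholder for the database table
    final List<String> errors = new ArrayList<>();                 // placeholder for the error file

    void importLines(List<String> lines) {
        for (String line : lines) {
            Matcher m = RECORD.matcher(line.trim());
            if (m.matches()) {
                // A later record for the same symbol overwrites the earlier price.
                prices.put(m.group(1), new BigDecimal(m.group(2)));
            } else {
                errors.add(line);
            }
        }
    }

    public static void main(String[] args) {
        StockPriceImporter importer = new StockPriceImporter();
        importer.importLines(List.of("SHLD,71.98", "CME,289.65", "\"HD\",31.46", "GOOG;590.83"));
        System.out.println("Imported: " + importer.prices);
        System.out.println("Errors:   " + importer.errors);
    }
}
```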

Calculate Pricing Tiers: As the batch process, after all input has been imported, I will calculate the pricing tier each customer falls into and store it for future use. Acceptance criteria:

  • The process will calculate the price per trade based on the number of trades the customer has made over the course of the month.

  • Each tier will be determined by the following thresholds:

    Tier    Trades
    I       <= 10
    II      <= 100
    III     <= 1,000
    IV      > 1,000

  • The tier value will be stored in relation to the customer for future fee calculations.
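The tier lookup is a chain of inclusive upper bounds. In this plain-Java sketch, the `Tier` enum and `tierFor` method are hypothetical names, and any count above 1,000 trades is treated as Tier IV so that every possible count maps to exactly one tier. In the finished job, logic like this would live in the item processor for the tier step.

```java
// Sketch: map a customer's trade count for the month to a pricing tier.
final class TierCalculator {

    enum Tier { I, II, III, IV }

    static Tier tierFor(int tradesThisMonth) {
        if (tradesThisMonth < 0) {
            throw new IllegalArgumentException("Trade count cannot be negative: " + tradesThisMonth);
        }
        // Thresholds are inclusive upper bounds, checked from the lowest tier up.
        if (tradesThisMonth <= 10) {
            return Tier.I;
        }
        if (tradesThisMonth <= 100) {
            return Tier.II;
        }
        if (tradesThisMonth <= 1_000) {
            return Tier.III;
        }
        return Tier.IV; // assumption: everything above 1,000 trades
    }

    public static void main(String[] args) {
        for (int trades : new int[] {4, 10, 11, 250, 1_000, 1_001}) {
            System.out.println(trades + " trades -> Tier " + tierFor(trades));
        }
    }
}
```

Checking the bounds in ascending order means each `if` needs only the upper limit; the lower limit is implied by the earlier checks having failed.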

Calculate Fee Per Transaction: As the batch process, after I have completed calculating pricing tiers, I will calculate a brokerage fee per trade that the customer will be charged. Acceptance criteria:

  • The process will calculate a fee for each transaction based on the tier the customer is in (as calculated in the Calculate Pricing Tiers story).

  • The formula for calculating the fee per trade is as follows:

    Tier    Formula
    I       $9 + 0.1% of purchase
    II      $3
    III     $2
    IV      $1
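The fee formula can be sketched with `BigDecimal` so that money is never held in binary floating point. This is plain Java with a hypothetical `FeeCalculator`. It assumes the purchase amount is the absolute quantity times the price (so sales, shown with negative quantities in Listing 3-1, pay the same percentage) and that the Tier I fee is rounded half-up to whole cents; neither assumption is stated in the story.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Sketch: per-trade brokerage fee by pricing tier, using BigDecimal for money.
final class FeeCalculator {

    enum Tier { I, II, III, IV }

    private static final BigDecimal TIER_I_BASE = new BigDecimal("9.00");
    private static final BigDecimal TIER_I_RATE = new BigDecimal("0.001"); // 0.1%

    static BigDecimal feeFor(Tier tier, long quantity, BigDecimal price) {
        switch (tier) {
            case I: {
                // Assumption: the "purchase" is |quantity| x price, so sales pay the same rate.
                BigDecimal tradeValue = price.multiply(BigDecimal.valueOf(Math.abs(quantity)));
                return TIER_I_BASE.add(tradeValue.multiply(TIER_I_RATE))
                        .setScale(2, RoundingMode.HALF_UP);
            }
            case II:
                return new BigDecimal("3.00");
            case III:
                return new BigDecimal("2.00");
            case IV:
                return new BigDecimal("1.00");
            default:
                throw new IllegalArgumentException("Unknown tier: " + tier);
        }
    }

    public static void main(String[] args) {
        System.out.println(feeFor(Tier.I, 200, new BigDecimal("31.09")));   // 200 shares of HD
        System.out.println(feeFor(Tier.III, 500, new BigDecimal("53.38"))); // flat fee
    }
}
```

For example, 200 shares at $31.09 in Tier I cost $9 plus $6.218, which rounds to a $15.22 fee.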

     

Print Account Summary: As the batch process, after all calculations have been completed, I will print out a summary for each customer. This summary will provide an overview of the customer's account and a breakdown of what makes up the total value of their portfolio. Acceptance criteria:

  • The process will generate a single file for each customer.

  • The summary will begin with a line that states the following, fully justified:

    Your Account Summary                  Statement Period: <BEGIN_DATE> to <END_DATE>

  • BEGIN_DATE is the first calendar date of the previous month, and END_DATE is the last calendar date of the previous month.

  • After the summary title, there will be a single line item for each asset type (securities and cash) showing its current value in the account.

  • After each of the detail items, a total account value will be printed. Following is an example of this section:

    Your Account Summary                    Statement Period: 07/01/2010 to 09/30/2010
    Market Value of Current Securities                                      $21,680.50
    Current Cash Balance                                                   $254,953.23
    Total Account Value                                                    $276,633.73
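The title line needs two pieces of logic: computing the previous calendar month and padding the gap so the text spans the full line. This plain-Java sketch uses `java.time.YearMonth` for the first and simple space padding for the second. The line width (82 characters here) is an assumption, and the class and method names are hypothetical. It follows the bullet's definition of the period (the previous month), so a run on October 15, 2010 yields 09/01/2010 to 09/30/2010.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

// Sketch: build the fully justified summary title for the month before the run date.
final class SummaryTitle {

    private static final DateTimeFormatter US_DATE = DateTimeFormatter.ofPattern("MM/dd/yyyy");

    // Pads the gap between left and right text so the line is exactly `width` characters
    // (or keeps a single space if the two pieces do not fit).
    static String justify(String left, String right, int width) {
        int gap = Math.max(1, width - left.length() - right.length());
        return left + " ".repeat(gap) + right;
    }

    static String titleLine(LocalDate runDate, int width) {
        YearMonth previous = YearMonth.from(runDate).minusMonths(1);
        String period = "Statement Period: " + previous.atDay(1).format(US_DATE)
                + " to " + previous.atEndOfMonth().format(US_DATE);
        return justify("Your Account Summary", period, width);
    }

    public static void main(String[] args) {
        // 82 characters is an assumed line width for the print file.
        System.out.println(titleLine(LocalDate.of(2010, 10, 15), 82));
    }
}
```

The same `justify` helper could right-align the dollar amounts on the remaining summary lines.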

Print Account Detail: As the batch process, after each account summary I will print the detail makeup of each account. The account detail will provide the customer with a detailed look into the makeup of their account and how their investments are doing. Acceptance criteria:

  • The account detail will be appended onto each customer's account summary.

  • The detail will begin with a header stating "Account Detail," left justified.

  • On a new line, the cash balance of the account will be specified.

  • Below the cash balance, a header stating "Securities" will be displayed, left justified.

  • For each stock held by the customer, the following fields will be displayed:

    Name            Required   Format
    Stock Symbol    True       \w+
    Quantity        True       \d+
    Price           True       \d+\.\d{2}
    Total Value     True       Quantity * price, in dollar format

  • Below is an example of this section.

    Account Detail
    Cash                                                             $254,953.23

    Securities
            SHLD    100    $71.98  $7,198.00
            CME     50     $289.65 $14,482.50

    Total Account Value    $276,633.73
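The detail rows mix fixed-width columns with dollar formatting. The sketch below reproduces the example rows using `String.format` padding and `NumberFormat.getCurrencyInstance(Locale.US)`. The column widths (an 8-space indent, then columns of 8, 7, and 8 characters) are guesses chosen to match the example, and the class name is hypothetical.

```java
import java.math.BigDecimal;
import java.text.NumberFormat;
import java.util.Locale;

// Sketch: format one row of the "Securities" section of the account detail.
final class SecurityLineFormatter {

    static String format(String symbol, long quantity, BigDecimal price) {
        // NumberFormat is not thread-safe, so a fresh instance is created per call.
        NumberFormat dollars = NumberFormat.getCurrencyInstance(Locale.US);
        BigDecimal totalValue = price.multiply(BigDecimal.valueOf(quantity));
        // Assumed layout: 8-space indent, then columns 8, 7, and 8 characters wide.
        return String.format("%8s%-8s%-7d%-8s%s",
                "", symbol, quantity, dollars.format(price), dollars.format(totalValue));
    }

    public static void main(String[] args) {
        System.out.println(format("SHLD", 100, new BigDecimal("71.98")));
        System.out.println(format("CME", 50, new BigDecimal("289.65")));
    }
}
```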

Print Statement Header: As the batch process, at the top of each page I will print a header. This will provide generic information about the account, the customer, and the brokerage. Acceptance criteria:

  • The header is all static text except for the customer's name, address, and account number.

  • Following is an example of the header, where the Michael Minella name and address are the customer's name and address, and the account number is the customer's account number:

    Brokerage Account Statement
    
    Apress Investment Company                    Customer Service Number
    1060 West Addison St.                        (800) 867-5309
    Chicago, IL 60613                            Available 24/7
    
    Michael Minella
    1313 Mockingbird Lane
    Chicago, IL 60606
    
    Account Number      10938398571278401298
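Because the header is static apart from the customer block and account number, it can be produced by filling a small template. The plain-Java sketch below is illustrative only (a hypothetical `StatementHeader` class). In the actual job, logic like this could be wired into Spring Batch's `FlatFileHeaderCallback`, whose `writeHeader(Writer)` method the flat-file writer calls before writing items.

```java
import java.util.List;

// Sketch: render the statement header; only the customer block and account number vary.
final class StatementHeader {

    // Static brokerage text, copied from the example header in the story.
    private static final String STATIC_TEXT = String.join("\n",
            "Brokerage Account Statement",
            "",
            "Apress Investment Company                    Customer Service Number",
            "1060 West Addison St.                        (800) 867-5309",
            "Chicago, IL 60613                            Available 24/7",
            "");

    static String render(String customerName, List<String> addressLines, String accountNumber) {
        StringBuilder header = new StringBuilder(STATIC_TEXT).append('\n');
        header.append(customerName).append('\n');
        for (String line : addressLines) {
            header.append(line).append('\n');
        }
        header.append('\n').append("Account Number      ").append(accountNumber).append('\n');
        return header.toString();
    }

    public static void main(String[] args) {
        System.out.print(render("Michael Minella",
                List.of("1313 Mockingbird Lane", "Chicago, IL 60606"),
                "10938398571278401298"));
    }
}
```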

That does it for the requirements. If your head is spinning about now, that's OK. In the next section, you begin to outline how to tackle this statement process with Spring Batch. Then, over the rest of this book, you learn how to implement the various pieces required to make it work.

Designing a Batch Job

As stated before, the goal of this project is to take a real-world example and work through it using the features that Spring Batch provides to create a robust, maintainable solution. In order to accomplish this goal, the example includes elements that may seem a bit complex right now, such as headers, multiple file format imports, and complex output including subheadings. The reason is that Spring Batch provides facilities for exactly these features. Let's dig into how you structure this batch process by outlining the job and describing its steps.

Job Description

In order to implement the statement-generation process, you build a single job with six steps. Figure 3-3 shows the flow of the batch job for this process, and the following sections describe the steps.


Figure 3.3. Stock statement jobflow

Importing Customer Transaction Data

To start the job, you begin by importing the customer and transaction data. Contained in a flat file, this data has a complex format consisting of two record types. The first record type is that of the customer, consisting of the customer's name, address, and account information. The second record type consists of detailed information about each transaction, including the stock ticker, the price paid, the quantity purchased or sold, and the timestamp when the transaction occurred. Spring Batch's support for multiline records lets you process this file with minimal coding. Listing 3-1 shows a sample of the input file. To persist the imported data, you write a data access object (DAO) that uses JDBC.

Example 3.1. Customer/Transaction Input File

392041928,William,Robinson,9764 Jeopardy Lane,Chicago,IL,60606
HD,31.09,200,08:38:05
WMT,53.38,500,09:25:55
ADI,35.96,-300,10:56:10
REGN,29.53,-500,10:56:22
938472047,Robert,Johnson,1060 Addison St,Chicago,IL,60657
CABN,0.890,10000,14:52:15
NUAN,17.11,15000,15:02:45
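The heart of reading this file is deciding where one customer's group ends and the next begins. The plain-Java sketch below shows only that grouping logic; in the finished job, Spring Batch components (for example, a custom reader wrapping a `FlatFileItemReader`) would do the actual reading. It assumes, as in Listing 3-1, that a line whose first field is a nine-digit tax ID starts a new customer and that every other non-blank line is a transaction for the current customer. The class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch: group each customer line with the transaction lines that follow it.
final class CustomerTransactionGrouper {

    static final class CustomerGroup {
        final String customerLine;
        final List<String> transactionLines = new ArrayList<>();

        CustomerGroup(String customerLine) {
            this.customerLine = customerLine;
        }
    }

    // Assumption: a record whose first field is a nine-digit tax ID starts a new customer.
    private static final Pattern CUSTOMER_RECORD = Pattern.compile("\\d{9},.*");

    static List<CustomerGroup> group(List<String> lines) {
        List<CustomerGroup> groups = new ArrayList<>();
        CustomerGroup current = null;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            if (CUSTOMER_RECORD.matcher(line).matches()) {
                current = new CustomerGroup(line);
                groups.add(current);
            } else if (current != null) {
                current.transactionLines.add(line);
            } else {
                throw new IllegalStateException("Transaction found before any customer: " + line);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> file = List.of(
                "392041928,William,Robinson,9764 Jeopardy Lane,Chicago,IL,60606",
                "HD,31.09,200,08:38:05",
                "WMT,53.38,500,09:25:55",
                "938472047,Robert,Johnson,1060 Addison St,Chicago,IL,60657",
                "NUAN,17.11,15000,15:02:45");
        for (CustomerGroup g : group(file)) {
            System.out.println(g.customerLine + " -> " + g.transactionLines.size() + " transaction(s)");
        }
    }
}
```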

Retrieving Stock Closing Prices

After you've imported the customer's information and transactions, you move on to obtaining the stock price information. This is a simple step that retrieves all the closing values for the stocks that have been traded over the previous month. To do this, you create a tasklet (as you did in the last chapter for the Hello, World batch process) to call the Yahoo web service specified in the requirements and download a CSV file with the required stock data. This step writes the output to a file similar to that in Listing 3-2, for the next step to process.

Example 3.2. Stock Closing Price Input File

SHLD,71.98
CME,289.65
GOOG,590.83
F,16.28

Importing Stock Prices into Database

This step reads the file and imports the data into the database. It showcases the strengths of the declarative I/O provided by Spring Batch: neither the input nor the output of this step requires any code on your part. Spring Batch's out-of-the-box components can read the CSV file you downloaded in the previous step and update the database.

You may wonder why you don't import the data directly from the web service. The reason is error handling. You're importing data that was provided to you by a third-party source. Because you can't be sure of the quality of the data, you need to be able to handle any errors that may occur during the import. By writing the data to a file for a later step to process, you can restart the import step without having to re-download the stock prices.
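The restart benefit comes from treating the downloaded file as a checkpoint. This minimal sketch (a hypothetical `PriceFileStep`, with a `Supplier` standing in for the web-service call) skips the download when the file from an earlier attempt already exists, so a rerun after a failed import reuses the same data. Because the import step deletes the file after a successful import, the next scheduled run finds no file and downloads fresh prices.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Supplier;

// Sketch: reuse an already-downloaded price file instead of calling the web service again.
final class PriceFileStep {

    /**
     * Returns true if a download happened, false if an existing file was reused.
     * The supplier stands in for the web-service call.
     */
    static boolean ensurePriceFile(Path priceFile, Supplier<List<String>> download) throws IOException {
        if (Files.exists(priceFile)) {
            return false; // a restart picks up the file left by the earlier attempt
        }
        Files.write(priceFile, download.get());
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("statement-job").resolve("prices.csv");
        System.out.println(ensurePriceFile(file, () -> List.of("SHLD,71.98", "CME,289.65")));
        System.out.println(ensurePriceFile(file, () -> List.of("SHLD,0.00")));
        System.out.println(Files.readAllLines(file));
    }
}
```

A production version would more likely write to a temporary file and rename it once complete, so that a crash mid-download does not leave a partial file that looks finished.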

Calculating Transaction Fee Tiers

Up to this point, you haven't had to do any real processing of the data you're reading and writing. All you've done is pipe data from a file to a database (with some validation for good measure). When you're finished importing the required data, you begin doing the required calculations. Your brokerage company charges fees based on how much a customer trades. The more the customer trades, the less they're charged per trade. The amounts customers are charged are assigned via tiers; each tier is defined by the number of trades performed over the previous month and has a dollar amount associated with it.

In this step, you introduce an item processor between the reader and writer to determine the tier to which the customer belongs. You declare the reader via XML to load the customer's trade information and the writer to update the customer's account in the same manner.
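Conceptually, a step with a reader, an item processor, and a writer runs a loop like the one below. This is a plain-Java sketch that uses `java.util.function` types in place of Spring Batch's `ItemReader`, `ItemProcessor`, and `ItemWriter` interfaces, and it ignores transactions, skips, and restart state. It mirrors two Spring Batch conventions: the chunk size counts items read, and a processor that returns `null` filters an item out so it never reaches the writer.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Sketch of a chunk-oriented step: read items, process each, write them in chunks.
final class ChunkLoop {

    /** Runs the loop and returns how many items were written. */
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<O> chunk = new ArrayList<>();
        int readInChunk = 0;
        int written = 0;
        while (reader.hasNext()) {
            O output = processor.apply(reader.next());
            readInChunk++;
            if (output != null) { // null means "filter this item out"
                chunk.add(output);
            }
            if (readInChunk == chunkSize) { // chunk boundary is based on items read
                written += flush(chunk, writer);
                chunk = new ArrayList<>();
                readInChunk = 0;
            }
        }
        written += flush(chunk, writer); // final, possibly partial, chunk
        return written;
    }

    private static <O> int flush(List<O> chunk, Consumer<List<O>> writer) {
        if (chunk.isEmpty()) {
            return 0;
        }
        writer.accept(chunk);
        return chunk.size();
    }

    public static void main(String[] args) {
        List<Integer> tradeCounts = List.of(4, 250, 12, 1_500, 90);
        int written = ChunkLoop.<Integer, String>run(tradeCounts.iterator(),
                trades -> trades <= 10 ? "Tier I" : "Tier II or higher",
                chunk -> System.out.println("write " + chunk),
                2);
        System.out.println(written + " items written");
    }
}
```

In the tier step, the reader would supply each customer's trade count, the processor would map it to a tier, and the writer would update the account table.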

Calculating Transaction Fees

When you've determined the tier for each customer, you can calculate the fee for each trade. In the previous step, you processed records at the customer level, with each customer being assigned a tier. In this step, you process records at the individual transaction level. As you can imagine, you process many more records in this step than in any previous step; you examine this step in further detail later, when the book talks about scalability options. However, to start, this step looks almost exactly like the previous one, but with different SQL for the reader, different logic in the item processor, and yet another JdbcBatchItemWriter for the writer.

Generating Customer Monthly Statements

The last step seems to be the most complex—but as you know, looks can be deceiving. This step involves the generation of the statements themselves. It demonstrates some of the beauty of applying decoupled solutions to batch problems. By providing a custom-coded formatter, you can do virtually all the work with a single simple class. This step also uses the callbacks for the headers.

All of this sounds great in theory but leaves a lot of questions to be answered. That's good. You'll spend the rest of the book working through how these features are implemented in the processes as well as examining things like exception handling and restart/retry logic. One final item you should be familiar with before you move on, though, is the data model. That will help clear the air regarding how this system is structured. Let's take a look.

Understanding the Data Model

You've seen all the different pieces of the job you create throughout this book. Let's move on to the last piece of the puzzle before you get into actual development. Batch processes are data driven. Because there are no user interfaces, the various datastores end up being the only external interaction the process has. This section looks at the data model used for the sample application.

Figure 3-4 outlines the application-specific tables for this batch process. To be clear, this diagram doesn't encompass all the tables required for this batch job to run. Chapter 2 took a brief look at the tables Spring Batch uses in the job repository. All of those tables will exist in addition to these in your database. Because it isn't uncommon to deploy the batch schema separately, and you reviewed it in the last chapter, I've chosen to leave it out of Figure 3-4.


Figure 3.4. Sample application data model

For the batch application, you have four tables: Customer, Account, Transaction, and Ticker. When you look at the data in the tables, notice that you aren't storing all the required fields to generate the statement. There are fields (such as the totals in the account summary) that you calculate during processing. Other than that, the data model should appear relatively straightforward:

  • Customer: This record contains all the customer-specific information, including name and tax identification number.

  • Account: For every customer, an account is maintained. For your purposes, each account has a number and a running cash balance from which fees are deducted as needed. The customer's transaction fee tier is also stored at this level.

  • Transaction: Each trade has a corresponding record in the Transaction table. The data here is used to determine the current state of the account (how many shares are held, and so on).

  • Ticker: For each stock that has been traded by the brokerage company's customers, a record exists in this table containing the ticker and the most recent closing price for the stock.
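As noted above, values such as the account totals are calculated rather than stored. A minimal plain-Java sketch of that calculation follows: current holdings are derived by summing each symbol's transaction quantities (negative for sales, as in Listing 3-1) and pricing them with the latest closing price from the Ticker table. The class names are hypothetical, and fees are ignored.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: derive an account's total value from cash, transactions, and closing prices.
final class AccountValuation {

    static final class Transaction {
        final String symbol;
        final long quantity; // positive for a buy, negative for a sale

        Transaction(String symbol, long quantity) {
            this.symbol = symbol;
            this.quantity = quantity;
        }
    }

    static BigDecimal totalValue(BigDecimal cashBalance, List<Transaction> transactions,
                                 Map<String, BigDecimal> closingPrices) {
        // Net shares held per symbol, derived rather than stored.
        Map<String, Long> sharesHeld = new HashMap<>();
        for (Transaction t : transactions) {
            sharesHeld.merge(t.symbol, t.quantity, Long::sum);
        }
        BigDecimal total = cashBalance;
        for (Map.Entry<String, Long> position : sharesHeld.entrySet()) {
            if (position.getValue() == 0) {
                continue; // fully sold positions contribute nothing
            }
            BigDecimal price = closingPrices.get(position.getKey());
            if (price == null) {
                throw new IllegalStateException("No closing price for " + position.getKey());
            }
            total = total.add(price.multiply(BigDecimal.valueOf(position.getValue())));
        }
        return total;
    }

    public static void main(String[] args) {
        List<Transaction> transactions = List.of(
                new Transaction("SHLD", 100),
                new Transaction("CME", 60),
                new Transaction("CME", -10));
        Map<String, BigDecimal> prices = Map.of(
                "SHLD", new BigDecimal("71.98"),
                "CME", new BigDecimal("289.65"));
        System.out.println(totalValue(new BigDecimal("254953.23"), transactions, prices));
    }
}
```

With $254,953.23 in cash, 100 SHLD at $71.98, and a net 50 CME at $289.65, this returns 276633.73, the total shown in the sample summary.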

Summary

This chapter discussed the agile development process and how you can apply it to batch development. The chapter continued along those lines by defining requirements via user stories for the sample application you build throughout the course of this book. From this point, the book switches from the "what" and "why" of Spring Batch to the "how."

In the next chapter, you take a deep dive into Spring Batch's concepts of jobs and steps and look at a number of other specific examples.



[7] You can find more information about this web service at www.gummy-stuff.org/Yahoo-data.htm.
