You are running a program that takes a long time to execute, and you need to present the user with an estimated time until completion.
Use Commons Math’s SimpleRegression and Commons Lang’s StopWatch to create a ProcessEstimator class that can be used to predict when a particular program will be finished. Your program needs to process a number of records, and it could take a few hours to finish. You would like to provide some feedback, and, if you are confident that each record will take roughly the same amount of time, you can use SimpleRegression’s slope and intercept to estimate the time when all records will be processed.
Example 8-1 defines the ProcessEstimator class, which combines StopWatch and SimpleRegression to estimate the time remaining in a process.
Example 8-1. ProcessEstimator to estimate time of program execution
package com.discursive.jccook.math.timeestimate;

import org.apache.commons.lang.time.StopWatch;
import org.apache.commons.math.stat.multivariate.SimpleRegression;

public class ProcessEstimator {

    private SimpleRegression regression = new SimpleRegression( );
    private StopWatch stopWatch = new StopWatch( );

    // Total number of units
    private int units = 0;

    // Number of units completed
    private int completed = 0;

    // Sample rate for regression
    private int sampleRate = 1;

    public ProcessEstimator( int numUnits, int sampleRate ) {
        this.units = numUnits;
        this.sampleRate = sampleRate;
    }

    public void start( ) {
        stopWatch.start( );
    }

    public void stop( ) {
        stopWatch.stop( );
    }

    public void unitCompleted( ) {
        completed++;
        if( completed % sampleRate == 0 ) {
            // x = units remaining, y = milliseconds elapsed so far
            regression.addData( units - completed, stopWatch.getTime( ) );
        }
    }

    public long projectedFinish( ) {
        // y-intercept: projected total elapsed time when zero units remain
        return (long) regression.getIntercept( );
    }

    public long getTimeSpent( ) {
        return stopWatch.getTime( );
    }

    public long projectedTimeRemaining( ) {
        long timeRemaining = projectedFinish( ) - getTimeSpent( );
        return timeRemaining;
    }

    public int getUnits( ) {
        return units;
    }

    public int getCompleted( ) {
        return completed;
    }
}
ProcessEstimator has a constructor that takes the number of records to process and the sample rate at which to measure progress. With 10,000 records to process and a sample rate of 100, ProcessEstimator adds a data point of units remaining versus time elapsed to the SimpleRegression after every 100 records. As the program continues to execute, projectedTimeRemaining( ) returns an updated estimate of the time remaining by retrieving the y-intercept from SimpleRegression and subtracting the time already spent in execution. The y-intercept from SimpleRegression is the value of y when x equals zero, where x is the number of records remaining and y is the total time elapsed; as x decreases, y increases, so the y-intercept is the projected total time needed to process all records.
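To see why the y-intercept is the projected total, here is a dependency-free sketch that fits the same kind of line by ordinary least squares. LinearFit is a hypothetical stand-in for SimpleRegression written for illustration, not Commons Math code, and the sketch assumes samples taken every 100 records at a perfectly constant 5 milliseconds per record, so the fitted line is exact.

```java
/**
 * Dependency-free sketch of the idea behind ProcessEstimator: fit a
 * least-squares line to (unitsRemaining, elapsedMillis) samples and read the
 * projected total time off the y-intercept, where unitsRemaining == 0.
 * LinearFit is a hypothetical stand-in for SimpleRegression, not its code.
 */
public class InterceptEstimateSketch {

    /** Minimal ordinary-least-squares accumulator (hypothetical helper). */
    static final class LinearFit {
        private long n;
        private double sumX, sumY, sumXX, sumXY;

        void addData(double x, double y) {
            n++;
            sumX += x;
            sumY += y;
            sumXX += x * x;
            sumXY += x * y;
        }

        /** Returns NaN until at least two distinct x values have been added. */
        double slope() {
            double denom = n * sumXX - sumX * sumX;
            return denom == 0 ? Double.NaN : (n * sumXY - sumX * sumY) / denom;
        }

        double intercept() {
            return (sumY - slope() * sumX) / n;
        }
    }

    public static void main(String[] args) {
        int units = 10_000;
        int sampleRate = 100;
        double msPerUnit = 5.0; // assumed constant cost per record

        LinearFit fit = new LinearFit();
        for (int completed = sampleRate; completed <= 4_000; completed += sampleRate) {
            // x = units remaining, y = time elapsed after 'completed' units
            fit.addData(units - completed, completed * msPerUnit);
        }

        double projectedTotal = fit.intercept();   // elapsed time at zero units remaining
        double spent = 4_000 * msPerUnit;
        System.out.println("projected total ms:     " + Math.round(projectedTotal));
        System.out.println("projected remaining ms: " + Math.round(projectedTotal - spent));
        System.out.println("slope (ms per unit):    " + fit.slope());
    }
}
```

With these idealized samples the fit recovers a slope of -5 ms per record and an intercept of 50,000 ms, so after 4,000 records (20,000 ms spent) the projected time remaining is 30,000 ms. The negative slope is the per-record cost, which is why -slope * recordsRemaining gives the same answer as intercept minus time spent.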
The ProcessEstimationExample in Example 8-2 uses the ProcessEstimator to estimate the time remaining while calling the performLengthyProcess( ) method 10,000 times.
Example 8-2. An example using the ProcessEstimator
package com.discursive.jccook.math.timeestimate;

import org.apache.commons.lang.math.RandomUtils;

public class ProcessEstimationExample {

    private ProcessEstimator estimate;

    public static void main(String[] args) {
        ProcessEstimationExample example = new ProcessEstimationExample( );
        example.begin( );
    }

    public void begin( ) {
        estimate = new ProcessEstimator( 10000, 100 );
        estimate.start( );

        for( int i = 0; i < 10000; i++ ) {
            // Print status every 1000 items
            printStatus(i);
            performLengthyProcess( );
            estimate.unitCompleted( );
        }

        estimate.stop( );

        System.out.println( "Completed " + estimate.getUnits( ) + " in " +
            Math.round( estimate.getTimeSpent( ) / 1000 ) + " seconds." );
    }

    private void printStatus(int i) {
        if( i % 1000 == 0 ) {
            System.out.println( "Completed: " + estimate.getCompleted( ) +
                " of " + estimate.getUnits( ) );
            System.out.println( " Time Spent: " +
                Math.round( estimate.getTimeSpent( ) / 1000) + " sec" +
                ", Time Remaining: " +
                Math.round( estimate.projectedTimeRemaining( ) / 1000) + " sec" );
        }
    }

    private void performLengthyProcess( ) {
        try {
            // Sleep for 0 to 9 milliseconds to simulate work
            Thread.sleep(RandomUtils.nextInt(10));
        } catch( InterruptedException e ) {
            // Restore the interrupt flag rather than silently discarding it
            Thread.currentThread( ).interrupt( );
        }
    }
}
After each call to performLengthyProcess( ), the unitCompleted( ) method on ProcessEstimator is invoked. Every 100th call to unitCompleted( ) causes ProcessEstimator to update SimpleRegression with the number of records remaining and the amount of time spent so far. Every 1,000 iterations, just before performLengthyProcess( ) is called, printStatus( ) prints a status message to the console as follows:
Completed: 0 of 10000
 Time Spent: 0 sec, Time Remaining: 0 sec
Completed: 1000 of 10000
 Time Spent: 4 sec, Time Remaining: 42 sec
Completed: 2000 of 10000
 Time Spent: 9 sec, Time Remaining: 38 sec
Completed: 3000 of 10000
 Time Spent: 14 sec, Time Remaining: 33 sec
Completed: 4000 of 10000
 Time Spent: 18 sec, Time Remaining: 28 sec
Completed: 5000 of 10000
 Time Spent: 24 sec, Time Remaining: 23 sec
Completed: 6000 of 10000
 Time Spent: 28 sec, Time Remaining: 19 sec
Completed: 7000 of 10000
 Time Spent: 33 sec, Time Remaining: 14 sec
Completed: 8000 of 10000
 Time Spent: 38 sec, Time Remaining: 9 sec
Completed: 9000 of 10000
 Time Spent: 43 sec, Time Remaining: 4 sec
Completed 10000 in 47 seconds.
As shown above, the output periodically displays the amount of time you can expect the program to continue executing. Initially, there is no data to make a prediction with, so the ProcessEstimator returns zero seconds, but, as the program executes the performLengthyProcess( ) method 10,000 times, a meaningful time remaining is produced.
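The zero at the start is worth a closer look. Assuming SimpleRegression behaves as in released Commons Math versions, getIntercept( ) returns Double.NaN until at least two observations have been added, and Java’s narrowing cast (long) Double.NaN yields 0, so projectedFinish( ) quietly reports 0. The sketch below shows a hypothetical remainingMillis( ) helper that reports “unknown” instead; the method name and the use of OptionalLong are illustrative choices, not part of the recipe.

```java
import java.util.OptionalLong;

public class GuardedEstimateSketch {

    /**
     * Hypothetical variant of projectedTimeRemaining( ): takes the fitted
     * intercept (projected total ms) and the time already spent, and reports
     * "unknown" instead of a misleading number when no fit exists yet.
     */
    static OptionalLong remainingMillis(double projectedTotalMillis, long spentMillis) {
        if (Double.isNaN(projectedTotalMillis)) {
            return OptionalLong.empty(); // fewer than two samples: no line to extrapolate
        }
        // Clamp at zero so a slightly stale estimate never shows negative time.
        return OptionalLong.of(Math.max(0L, Math.round(projectedTotalMillis) - spentMillis));
    }

    public static void main(String[] args) {
        // A plain cast silently turns NaN into 0, which is why the first status line shows 0 sec.
        System.out.println((long) Double.NaN);
        System.out.println(remainingMillis(Double.NaN, 120));
        System.out.println(remainingMillis(47_000.0, 9_000));
    }
}
```

A caller could then print “estimating...” while the result is empty, rather than a time remaining of zero.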
The previous example used a method that sleeps for a random number of milliseconds between 0 and 9, and this value is selected using the RandomUtils class described in Recipe 8.4. It is easy to predict how long this process is going to take because, on average, each method call requests a sleep of about 4.5 milliseconds. The ProcessEstimator is inaccurate when the time needed to process each record steadily increases or decreases, or when a block of records takes substantially more or less time to process than the rest. If the time to process each record does not remain constant, then the relationship between records processed and time elapsed is not linear. Because the ProcessEstimator relies on a linear model, SimpleRegression, a nonconstant execution time will produce inaccurate predictions of the time remaining. If you are using the ProcessEstimator, make sure that it takes roughly the same amount of time to process each record.
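The effect of a steadily increasing per-record cost can be simulated without Commons Math. This sketch assumes a hypothetical cost model in which record i takes 2 + 0.002 * i milliseconds, samples every 100 records during the first half of the run as ProcessEstimator would, fits the same (records remaining, elapsed time) line by ordinary least squares, and compares the intercept with the true total.

```java
public class NonlinearCostSketch {

    // Hypothetical cost model: each record takes slightly longer than the
    // previous one (2 ms plus 0.002 ms per record already processed).
    static double costMillis(int recordIndex) {
        return 2.0 + 0.002 * recordIndex;
    }

    /**
     * Fits elapsed = intercept + slope * remaining by ordinary least squares
     * over samples taken every sampleRate records during the first
     * sampledRecords records, and returns the intercept (projected total ms).
     */
    static double projectedTotal(int units, int sampledRecords, int sampleRate) {
        double n = 0, sumX = 0, sumY = 0, sumXX = 0, sumXY = 0;
        double elapsed = 0;
        for (int i = 0; i < sampledRecords; i++) {
            elapsed += costMillis(i);
            int completed = i + 1;
            if (completed % sampleRate == 0) {
                double x = units - completed; // records remaining
                n++;
                sumX += x;
                sumY += elapsed;
                sumXX += x * x;
                sumXY += x * elapsed;
            }
        }
        double slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
        return (sumY - slope * sumX) / n;
    }

    /** Actual total ms for all records under the same cost model. */
    static double actualTotal(int units) {
        double total = 0;
        for (int i = 0; i < units; i++) {
            total += costMillis(i);
        }
        return total;
    }

    public static void main(String[] args) {
        int units = 10_000;
        double predicted = projectedTotal(units, units / 2, 100);
        double actual = actualTotal(units);
        System.out.printf("predicted total: %.0f ms%n", predicted);
        System.out.printf("actual total:    %.0f ms%n", actual);
    }
}
```

Under this cost model the fitted intercept is about 66,570 ms while the true total is 119,990 ms: because elapsed time grows faster than linearly, a line fitted to the early samples underestimates the finish by roughly 45 percent.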
This recipe refers to the StopWatch class from Commons Lang. For more information about the StopWatch class, see Recipe 1.19.