3
Flat Files
Objectives
An outline of a program to manage flat files of records
A simple sorting algorithm
A look ahead to improved methods for sorting and searching
A look ahead to more sophisticated approaches for abstracting computa-
tions to make programs more general and more reliable
A binary search algorithm, the first “mature” algorithm covered in this text
Key Terms
abstract class
asymptotic analysis
binary search
bubblesort
discipline of
programming
driver program
dynamic data structures
flat file
generics
index
optimal
portable code
probe
pseudocode
recursive divide and con-
quer
semantics
syntax
Introduction
One of our guiding principles is that students at this stage in a computer science
curriculum have probably learned how to do basic computations and now need to
learn more sophisticated methods for writing programs and producing computa-
tional results. We start in this chapter, then, with a summary of what might be
considered to be a reasonably full and complete implementation of an application
but an implementation that uses only the techniques usually covered in a first
course in computing. After presenting this “naive” version of a program, we will
then discuss the ways in which it would be improved upon through the methods
and analysis developed later in this text.
31
32 CHAPTER 3FLAT FILES
3.1 A Basic Indexed File
Let us now look at an example of the naive implementation of a flat file with an
index, complete with a method for looking up an entry in the file. But before we do
that, we make a disclaimer of sorts by directing the reader to Appendix A, where
a number of the author’s idiosyncrasies of coding are presented. We emphasize
that none of these idiosyncrasies should be taken as absolute dogma. Style is a
religious issue, and there are many competing religions. Anyone taking this course
and writing programs should, however, adopt one of the standard styles and apply
it consistently. The more predictable and routine the presentation of code is to a
reader, the less likely that tricky points will be overlooked and silly errors made.
Before embarking on the writing of a program, we should first have an idea of
what it is that we want our program to do. In the case of this example, the program
is an address book. We could implement this with a spreadsheet, and indeed a
spreadsheet and a flat file are very similar. We will want to add records to and delete
records from our file. We will want to be able to do lookups into the file, which means
we will need a “find” capability. Usually we will keep something like a phone book
in sorted order by last name, so we will need a way to sort the records in the file.
Although a simple flat file phone book may appear to be an example too simple
to be of real value, most of the issues involved in the use of mature algorithms
come up once we think of our flat file growing in size from a few hundred records
to tens of millions of records. Adding and deleting records in a large file, searching
for records, and keeping the file in sorted order all become tasks that can no longer
be done in a naive way or else the processing time will become a problem for the
user of the software.
3.1.1 A Basic Program
We take the view in this course that a program generally consists of four basic
parts.
1. The Data Payload
As in the example from Chapter 2, the lowest level in the hierarchy will be a set
of “payload” records that contain the actual data. The class for the payload
will usually be named Record,orPayload, or occasionally something more
meaningful. All the instance variables and methods for managing the actual
data will reside in this class. The goal in this text is to learn to manage data.
For the most part, the means for managing the data are independent of the data
being managed, so most of the higher-level constructs will work with virtually
any payload class. Indeed, that’s the point of good programming style and part
of the justification for object-oriented languages like Java. The code that does
3.1 A Basic Indexed File 33
the sorting of records, for example, really doesn’t need to know what is in those
records.
2. The Data Structure Class
At the next level up, the “data structure” class will contain the instance vari-
ables and methods to manage the data structure that is the particular object
of study. In the case of our example program, the data structure begins not
as a separate class but simply as an ArrayList, and the data structure in
this example is embedded in the code of the next higher “application” level.
This is not an unusual style for many simple applications, and indeed not a
bad program development technique either. If the ArrayList in the previous
chapter’s code turns out to be too inefficient and an improvement to the code
is necessary, then having a working application is a great benefit. It allows the
programmer to concentrate on just the parts of the code being improved, confi-
dent that the rest of the code that drives and tests the application has already
been found correct.
3. The User Application
Next higher in the hierarchy is the “application” class that will contain the
code implementing the features of the desired application. In the case of
our example flat file with an index, the application is that of a phone book.
Typical user actions with a phone book are to add records, to delete records,
to update records, and to search for records (usually by name). The need-to-
know principle of information encapsulation and hiding applies here just as it
does everywhere else. The phone book application needs to know that there
is a method to add records, but it doesn’t really need to know the details
of what happens when a record is added. The application layer thus usually
creates an instance of the data structure and then invokes its methods.
4. The Main Program/Driver Program
Finally, we will always use a driver program, which will contain the Main
class. The driver will normally do three things:
The driver will open files for input of data and will open files for output or
otherwise establish that an output mechanism exists.
In the Main class of the driver program, we will create an instance of an
“application” class to perform some useful task. The application class will in
turn create and use the data structure under discussion. After creating the
application instance and arranging for input and output, the driver program
will then exercise the data structure to verify that the other two parts of the
program (at least two classes but sometimes more) are working correctly.
The driver will then clean up after the test and terminate the program
naturally.
34 CHAPTER 3FLAT FILES
In the case of our first main example, the application is that of an address book or
phone book of directory information. In this case, the payload for the data structure
is an individual Record item containing the last name, office number, phone number,
and current teaching assignment of a list of faculty members. These four fields of a
Record are the instance variables of the Record, and their declaration together with
their accessors and mutators should normally be done inside the Record class. We
can thus view a single directory item about a single individual as an abstract data
type and encapsulate all the management of the information for that class inside
the Record class. This is the simplest form of data encapsulation: Beyond the code
inside the Record class itself, there is usually no need for any program to deal with
the payload record in any way other than as a single entity. Exceptions would exist
if one were to sort on multiple fields; if there is only one “correct sorted order” on
the data, then programs outside the class do not necessarily need to know what that
order is, but if one wants access by last name and by office number, then programs
outside the Record class would, of course, have to know that these were data fields
and that one could access the data in those orders. Generally, though, since there is
no “need to know” beyond the confines of the Record class, we encapsulate inside
the Record class all the code that handles its variables. Figure 3.1 shows the UML
diagram for the Record class; this differs from the Record of the previous chapter
only in that more instance variables exist and that some comparison methods have
been added to allow us to sort the list.
The middle two levels of our application are a data structure of Record data.
In our initial version, that structure is simply an array of Record type and we
combine the data structure (the array) with the user interface and methods (add
a record, etc.). By analogy with a simple spreadsheet, the spreadsheet itself is the
data structure, each row of that data structure is a unique instance of a Record,
and each column of the spreadsheet is a field within the Record.
In general, you should always consider the code hierarchy with an eye to writing
portable code. The phone book application, for example, is really just a program
for handling records that look like a spreadsheet. Very little should need to be
done if those records changed from being phone book records and became retail
transactions instead, perhaps sorted now by date and not by last name. It should
be possible, then, to swap out the payload Record class for a retail transaction
record class, with different variables, and to have all the intermediate code need no
changes. The only change that would be necessary is that the search in the main
method would have to search on some different variable. This is part of the reason
for our restricted notion of a Main class. The Main class should be used for driving
and testing the code below it. If the data payload is changed out, then the driver
code may need to change in order to conduct a slightly different sort of test, but
the application code should need little or no change.
3.1 A Basic Indexed File 35
3.1.2 The Record Code
Figure 3.1 shows the UML diagram for the Record class.
Record
-DUMMYSTRING:String
-DUMMYINT:int
-name:String
-office:String
-phone:String
-teaching:int
+Record():void
+getName():String
+getOffice():String
+getPhone():String
+getTeaching():int
-setName(String):void
-setOffice(String):void
-setPhone(String):void
-setTeaching(int):void
+compareName(String):boolean
+compareTo(Record):int
+readRecord(Scanner):Record
+toString():String
FIGURE 3.1 The UML diagram for the Record class.
The Record class handles everything connected with a single record and the
individual fields inside that record, and the code for this class differs little from
the code in Chapter 2. We have additional instance variables and therefore addi-
tional accessors and mutators, and the readRecord and toString methods must be
different to include the new variables. The real difference, though, is in the compar-
ison methods; the compareTo and compareName methods are new. Code for these
methods is shown in Figure 3.2.
Comparison of numbers and comparisons of variables of primitive types
(boolean, int, double)
1
are usually done with the standard arithmetic symbols
<, , >, , and = (and the computer version ==). Java comparison of the values
1
There are other primitive types, but we will use them only sparingly in this book. The float
data type, for example, is largely unnecessary these days. The only good reason for using a float
variable in the past was to save memory space (4 bytes instead of 8). Memory has become cheap,
however, so there is little practical reason for not doing more accurate computations, and the
standard for real scientific computing has been double variables for some years now. There are,
however, some classes of Java (the Color classisone)thatusefloat and not double values, so a
float value must be used when required.
36 CHAPTER 3FLAT FILES
/**************************************************************
* Method to compare the <code>name</code> values of records.
* This does a comparison on the last name stored against the
* string passed in, returning true or false.
* @param thatName the string to compare last name against
* @return true or false for the comparison
**/
public boolean compareName(String thatName)
{
return this.getName().equals(thatName);
}
/**************************************************************
* Method to compare the <code>name</code> values of records.
* This does a lexicographic comparison on the data stored, so
* if the data stored is not in last-name order we will get
* something different from what might be expected.
* @param that the "that" record to compare to
* @return -1, 0, or +1 depending on how the comparison goes.
**/
public int compareTo(Record that)
{
return this.getName().compareTo(that.getName());
}
FIGURE 3.2
The comparison methods in the Record class.
stored in String variables, however, is done with a compareTo method that returns
1, 0, or +1 depending on whether the “first” string is less than, equal to, or
greater than the “second” string lexicographically. We will maintain this construct
for comparisons, as well as the asymmetric syntax:
thisObject.compareTo(thatObject);
By implementing one or more compareTo methods in a class, we can sort the class
on different fields. For an address book, last name would be one obvious choice, but
one might also want to sort on the course number in this file. The fact that the course
number teaching is an instance of int data means that a different comparison
would need to be made. As we will see later, the various generic, abstract,and
overriding or overloading features of Java are designed specifically to make this
kind of thing easier to do. What we really want to do is to make it easy to perform
comparisons—regardless of data type—on any of the fields in the record, without
exposing to a calling program any information or details that the caller doesn’t
need to know. In a later example, we will create a compareTo method that could
be written and rewritten to make any desired comparisons of the Record data.
3.1 A Basic Indexed File 37
In addition to the compareTo method that we use in the bubblesort routine (to
be discussed shortly), we also have a compareName method; when passed a String
value as a parameter, this method will compare the parameter against the last name
and return either the first record that matches that last name or a null value.
3.1.3 A Driver for the Application
At the top level of our application is the driver code, presented in Figures 3.3–3.5
asthecodefortheMain class. Nothing very special happens here. We open input
and output files, create the FlatFile object, and print out its contents to show
import java.io.PrintWriter;
import java.util.Scanner;
/**************************************************************
* Main class to drive the <code>FlatFile</code> application.
* This program opens an input <code>Scanner</code> file and an
* output <code>PrintWriter</code>, creates an instance of a
* <code>FlatFile</code> structure, and then writes out the
* <code>FlatFile</code> to show that it is indeed empty.
* The program then reads the input into the structure,
* writes out the structure as read, sorts the structure,
* and writes out the sorted structure.
* Two searches are then done, and the program terminates.
* @author Duncan A. Buell
* @version 1.00 2010-07-06
* Copyright (C) 2010 by Duncan A. Buell. All rights reserved.
**/
public class Main
{
public static void main (String[] args)
{
final String TAG = "Main: ";
String inFileName = null;
String outFileName = null;
String searchString = "";
Scanner inFile = null;
PrintWriter outFile = null;
FlatFile theFlatFile = null;
Record foundRecord = null;
inFileName = "myinputfile";
outFileName = "myoutputfile";
inFile = FileUtils.ScannerOpen(inFileName);
outFile = FileUtils.PrintWriterOpen(outFileName);
FIGURE 3.3
Part 1 of the driver Main class for the phone book example.
38 CHAPTER 3FLAT FILES
//*********************************************************
// Create the flat file and read and write it.
outFile.printf("%s create and write out the empty file%n",
TAG);
theFlatFile = new FlatFile();
outFile.printf("%s%n", theFlatFile.toString());
outFile.printf("%s empty file was created%n", TAG);
outFile.printf("%s read the data%n", TAG);
theFlatFile.readFile(inFile);
outFile.printf("%s the data has been read%n", TAG);
FileUtils.CloseFile(inFile);
outFile.printf("%s write out the file%n%s%n",
TAG, theFlatFile.toString());
outFile.printf("%s done with the write%n", TAG);
//*********************************************************
// Finally, sort the file and write out the sorted file.
outFile.printf("%s sort the file%n", TAG);
theFlatFile.sortFile();
outFile.printf("%s after sorting, write the file%n%s%n",
TAG, theFlatFile.toString());
outFile.printf("%s done with the write%n", TAG);
FIGURE 3.4
Part 2 of the driver Main class for the phone book example.
that the object is in fact empty (and that our code handles the case of an empty
object). After reading in the data, we echo the data to make sure that our read has
been successful.
Because data is generally more useful when it is sorted, we call a sort routine to
sort the ArrayList of data.
We then perform two searches, one on the name “Smith” that we know is the
name of someone in the data file, and one on the name “Buell” that we know is not
in the data file. (We are ignoring the issue of multiple records with the same last
name as key; that is, we assume that we have no duplicate keys.)
3.1.4 The Original Flatfile Code
The code at the bottom of the hierarchy for this application is the Record class
that handles the data payload of our flat file. At the top of the hierarchy is the
driver program. As will be typical in this book, the data structure, that is, the
flat file itself, is managed by the code in the intermediate FlatFile class, whose
UML diagram is given in Figure 3.6 and whose code is shown in Figures 3.7–3.12.
3.1 A Basic Indexed File 39
//*********************************************************
// Find a particular item and print the record if found.
searchString = new String("Buell");
outFile.printf("%s find the data item '%s'%n",
TAG, searchString);
foundRecord = theFlatFile.findRecord(searchString);
if(foundRecord != null)
{
outFile.printf("%s found record '%s'%n",TAG,
foundRecord.toString());
}
else
{
outFile.printf("%s record %s not found%n",TAG,
searchString);
}
outFile.printf("%s done with first search%n",TAG);
searchString = new String("Smith");
outFile.printf("%s find the data item '%s'%n",
TAG, searchString);
foundRecord = theFlatFile.findRecord(searchString);
if(foundRecord != null)
{
outFile.printf("%s found record '%s'%n",TAG,
foundRecord.toString());
}
else
{
outFile.printf("%s record %s not found%n",TAG,
searchString);
}
outFile.printf("%s done with second search%n",TAG);
//*********************************************************
// We’re done. Close up and go home.
outFile.printf("%s done with main%n", TAG);
FileUtils.CloseFile(outFile);
}
}
// public class Main
FIGURE 3.5
Part 3 of the driver Main class for the phone book example.
40 CHAPTER 3FLAT FILES
FlatFile
-index:ArrayList<Integer>
-recs:ArrayList<Record>
+FlatFile():void
+getRecord(int):Record
+getSize():int
+findRecord(String):Record
+readFile(Scanner):void
+sortFile():void
+toString():String
FIGURE 3.6 The UML diagram for the FlatFile class.
import java.util.ArrayList;
import java.util.Scanner;
/**************************************************************
* Class for handling activity at the level of the flat file.
* This class has an <code>ArrayList</code> of
* <code>Record</code> data and an index list for sorting,
* methods to access, the two <code>ArrayList</code>s,
* methods to read and to <code>toString</code>, and a
* simple bubblesort method for sorting the list.
* @author Duncan Buell
* @version 1.00 2010-07-05
* Copyright(c) 2010 Duncan A. Buell. All rights reserved.
**/
public class FlatFile
{
/**************************************************************
* Instance variables.
**/
static final String TAG = "FlatFile: ";
private ArrayList<Integer> index;
private ArrayList<Record> recs; // the records to be stored
/**************************************************************
* Constructor.
* Initialize the <code>ArrayList</code> variables.
**/
public FlatFile()
{
this.recs = new ArrayList<Record>();
this.index = new ArrayList<Integer>();
}
FIGURE 3.7
Part 1 of the FlatFile class for the phone book example.
3.1 A Basic Indexed File 41
/**************************************************************
* Method to get an individual <code>Record</code>.
* @param which the subscript of the <code>Record</code> to return
* @return the <code>Record</code>
**/
public Record getRecord(int which)
{
return this.recs.get(this.index.get(which));
}
/**************************************************************
* Method to get the size of the <code>ArrayList</code> of
* data. We are going to call this an accessor, although it
* is actually generating derived data. One of its other
* purposes is to make sure that the size of the data list
* and the size of the index list are the same.
* @return the size of the arrays
**/
public int getSize()
{
if(this.recs.size() != this.index.size())
{
System.out.println("ERROR: unequal list sizes " +
this.recs.size() + ", " +
this.index.size());
System.exit(0);
}
return this.recs.size();
} // public int getSize()
FIGURE 3.8
Part 2 of the FlatFile class for the phone book example.
We store the Record data and its index using ArrayLists. It would be reasonable
in a more complicated program to have a separate class for the different index
arrays, but in this simple example we have included the index as an array inside
the FlatFile class. Note that as we read the data in we set the initial value of the
index to the physical record number in the file, so that all our references to the
array can be made not as recs.get(i) but as recs.get(index.get(i)).
The accessor and mutator for the Record array are not bulletproofed to pre-
vent errors if an invalid subscript is provided to the methods (we will take care
of this in a moment). Similar to what we did in Chapter 2, we include methods
42 CHAPTER 3FLAT FILES
/**************************************************************
* Method to find a record in the flat file.
*
* This is a linear search.
* @param name the last name to look up the record of
* @return the found record, or null if not found.
**/
public Record findRecord(String name)
{
Record returnValue = null;
for(inti=0;i<this.getSize(); ++i)
{
if(this.recs.get(index.get(i)).compareName(name))
{
returnValue = this.recs.get(index.get(i));
}
}
return returnValue;
} // public Record findRecord(String item)
FIGURE 3.9
Part 3 of the FlatFile class for the phone book example.
/**************************************************************
* Method to read the file from an input <code>Scanner</code>
* file. This reads the entire file. As we read and store
* the records, we also store the subscripts in the index.
* @param inFile the <code>Scanner</code> from which to read
**/
public void readFile(Scanner inFile)
{
int sub;
Record rec = null;
sub=0;
while(inFile.hasNext())
{
rec = new Record();
recs.add(rec.readRecord(inFile));
index.add(sub);
++sub;
}
}
FIGURE 3.10
Part 4 of the FlatFile class for the phone book example.
3.1 A Basic Indexed File 43
/**************************************************************
* Method to sort a file.
* This is, embarrassingly enough, simply a bubblesort,
* but it works and is useful for demonstration purposes.
* Note that we sort keys and adjust the index but do not
* actually move records around in the list.
**/
public void sortFile()
{
int tempIndex;
for(int length = this.getSize(); length > 1; --length)
{
for(inti=0;i<length-1; ++i)
{
if(this.recs.get(this.index.get(i)).
compareTo(this.recs.get(this.index.get(i+1))) > 0)
{
tempIndex = this.index.get(i);
this.index.set(i, this.index.get(i+1));
this.index.set(i+1, tempIndex);
}
}
}
}
FIGURE 3.11
Part 5 of the FlatFile class for the phone book example.
/**************************************************************
* Usual <code>toString</code> method.
* @return the <code>String</code> value of the file.
**/
public String toString()
{
Strings="";
for(inti=0;i<this.getSize(); ++i)
{
s += String.format("%s(idx,rec) (%d,%d) %s%n",
TAG, i, this.index.get(i),
recs.get(this.index.get(i)).toString());
}
return s;
}
} // public class FlatFile
FIGURE 3.12
Part 6 of the FlatFile class for the phone book example.
44 CHAPTER 3FLAT FILES
to read an entire input file and to convert the entire array into a String for
printing. The former method extends (and invokes) the readRecord method, and
the latter method extends and invokes the toString method of the underlying
Record class.
We reiterate what was said in Chapter 2, that the toString method allows for
the data of the Record to be formatted and returned exactly in the format declared
to be official by The Powers That Be without any details of the inner workings of
the Record class being exposed to the user of the method.
Because we intend to call a sort method from the driver program, we include a
sortFile method that will sort the data. (We will discuss sorting and the details of
this code in a moment.) In this case, we don’t sort by actually moving the records
around (although we could). Instead, we sort by setting up an index ArrayList;the
integer at subscript i in the index is the subscript of the ith record in the actual
ArrayList of data.
Finally, we have in our driver program a call to a search method to look up
an entry by last name, so we need the findRecord method to implement this
functionality.
3.2 An Analysis of the Code
Using the input data file myinputdata presented in Figure 3.13, we get the output
shown in Figure 3.14. We have left in some of what would normally be considered
testing information for demonstration purposes. All of this is well and good, but
clearly this is only the first step in producing an address book program that someone
would want to use.
Herbert 2A41 789.0123 390
Lander 2A47 789.7890 146
Winthrop 3A71 789.4667 611
Marion 2A13 789.1536 750
Adams 3A56 789.8702 522
Smith 2A46 789.5275 101
Axolotl 3A22 789.2749 102
Zokni 3A39 789.9111 350
Igen 2A21 789.6050 240
Nem 2A89 789.3383 790
FIGURE 3.13
Input data for the indexed flat file example.
3.2 An Analysis of the Code 45
Main: create and write out the empty file
Main: empty file was created
Main: read the data
Main: the data has been read
Main: write out the file
FlatFile: (idx,rec) (0,0) Herbert 2A41 789.0123 390
FlatFile: (idx,rec) (1,1) Lander 2A47 789.7890 146
FlatFile: (idx,rec) (2,2) Winthrop 3A71 789.4667 611
FlatFile: (idx,rec) (3,3) Marion 2A13 789.1536 750
FlatFile: (idx,rec) (4,4) Adams 3A56 789.8702 522
FlatFile: (idx,rec) (5,5) Smith 2A46 789.5275 101
FlatFile: (idx,rec) (6,6) Axolotl 3A22 789.2749 102
FlatFile: (idx,rec) (7,7) Zokni 3A39 789.9111 350
FlatFile: (idx,rec) (8,8) Igen 2A21 789.6050 240
FlatFile: (idx,rec) (9,9) Nem 2A89 789.3383 790
Main: done with the write
Main: sort the file
Main: after sorting, write the file
FlatFile: (idx,rec) (0,4) Adams 3A56 789.8702 522
FlatFile: (idx,rec) (1,6) Axolotl 3A22 789.2749 102
FlatFile: (idx,rec) (2,0) Herbert 2A41 789.0123 390
FlatFile: (idx,rec) (3,8) Igen 2A21 789.6050 240
FlatFile: (idx,rec) (4,1) Lander 2A47 789.7890 146
FlatFile: (idx,rec) (5,3) Marion 2A13 789.1536 750
FlatFile: (idx,rec) (6,9) Nem 2A89 789.3383 790
FlatFile: (idx,rec) (7,5) Smith 2A46 789.5275 101
FlatFile: (idx,rec) (8,2) Winthrop 3A71 789.4667 611
FlatFile: (idx,rec) (9,7) Zokni 3A39 789.9111 350
Main: done with the write
Main: find the data item 'Buell'
Main: record Buell not found
Main: done with first search
Main: find the data item 'Smith'
Main: found record 'Smith 2A46 789.5275 101'
Main: done with second search
Main: done with main
FIGURE 3.14
Output data for the indexed flat file example.
3.2.1 Sorted Data?
First off, we notice that the records are not presented to the program from the
input file in sorted order. This is normal; life doesn’t always present things to us in
the order in which we would like them to happen.
46 CHAPTER 3FLAT FILES
We have two choices when presented with unsorted data. One is that we can read
the data and physically move the records into their proper order.
2
This is what is
normally done on library shelves; the books are physically kept in a defined order
based, for most university libraries, on the Library of Congress indexing system.
This physical sorting of data is acceptable only if the following three conditions
are met:
1. We don’t have a lot of data that would have to be moved.
2. The records are to be accessed in only one order (last name, in this case).
3. The data will be fairly static, because we don’t want to have to keep moving
things around all the time.
The problem of sorting data by moving data around and then keeping the data
physically in sorted order is increased when one takes into consideration the problem
of deleting records. In a flat file image such as we have created in memory so
far, a record that is deleted in the middle presents a problem. The naive way
to fill the empty space left by the deleted record would be to shift the records
up one space as needed. If the deleted record is the last record in the physical
order, there is actually nothing to do (except adjust the size counter), but if the
deleted record is the first record in the physical order, its deletion requires the entire
array to be shifted up. That’s a lot of work with no benefit except to take out a
blank space.
The three conditions that make it reasonable to sort by moving data are rarely
met in practice, so the customary way to deal with this kind of data—and your
second option—is to use an index.Wewillreadthedataandstoreitintoanarray
in the physical order in which it presents itself, but then we will invert on last name
as a key to get an index. The ith index value will be the subscript of the record that
should appear in position i in sorted order. If we were to sort and produce an index,
we would get the second column of Figure 3.15; this would be produced by code
like the following. The important thing to notice here is the indirect reference;we
don’t access the record at subscript i but rather the record at subscript index[i].
for(int i = 0; i < size(); ++i)
{
System.out.printf("%d %d %s%n",
i, index[i], record.getName(index[i]);
}
2
In the pre-Cambrian era of microcomputers, the author had an outside consulting job that
required sorting an address book on an early “personal” computer. The sort implemented by the
vendor for use on 1979-era 5
1
4
-inch floppy disks was horrendously inefficient. Putting a full disk
of records even once into sorted order required so much reading and writing to and from the disk
that the disk became scratched to the point that it was no longer readable.
3.2 An Analysis of the Code 47
Sorted Physical Last Name
Order Order
Subscript Subscript
0 4 Adams
1 6 Axolotl
2 0 Herbert
3 8 Igen
4 1 Lander
5 3 Marion
6 9 Nem
7 5 Smith
8 2 Winthrop
9 7 Zokni
FIGURE 3.15 Indexed file.
3.2.2 A Simple but Inefficient Sorting Algorithm
Our data is not presented in sorted order, so we have implemented a simple bub-
blesort
3
whose code is shown in the FlatFile code.
The index array is implemented as an instance variable
private ArrayList<Integer> index
whose values run in parallel with the recs of Record data; the value of
index.get(0) is the subscript of the Record with the alphabetically first name,
which is subscript 4 in Figure 3.15; the value of index.get(1) is the subscript of
the Record with the alphabetically next name, which is subscript 6 in Figure 3.15;
and so on. When we read the record with subscript i, we initialize index.get(i)
to have the value i. When we walk through the array in the toString method, the
code becomes:
for(inti=0;i<this.getSize(); ++i)
{
s += String.format("%n(%3d,%3d) %s",
i, this.getIndex(i),
recs.get(this.getIndex(i)).toString());
}
3
With small apologies to the English teachers of the world, we will write the names of the
various sorting algorithms without a space before the word “sort.”
48 CHAPTER 3FLAT FILES
Note also that we have implemented a getSize method that returns the size of
the ArrayList recs. In fact, both the ArrayLists recs and index have their own
size method. But since these values are supposed to always be the same (except
during execution of the two lines of code between adding a new Record to the
ArrayList and then adding a new entry to the index), we have implemented and
used our own method to return the size of the lists and included code in that method
to check each time that the two sizes are the same. This kind of defensive program-
ming can reduce the number of silly errors in programming and make life easier.
Most importantly, we need to implement a sorting routine, the code for which
is shown in the FlatFile program. This bubblesort is, as we shall see later, not a
terribly good sort, but it is simple to explain and it happens to get the job done
for the small data file on which we apply it.
As the following pseudocode shows, a bubblesort consists of a doubly-nested loop.
for(length from arraysize to 1 exclusive stepping backwards)
{
for(i from 0 to length-1 exclusive)
{
if(record(i) > record(i+1))
{
swap record(i) and record(i+1)
}
}
}
Let’s take a look at what this pseudocode does, where we read the “exclusive”
in the first for loop to mean
for(i = arraysize; i > 1; --i)
and similarly in the second loop. In the first iteration of the outer loop, with length
set to the length of the original array, we run the inner loop from i=0through
i = length-2. (We have to be careful to stop one short since we are accessing
both subscript i and subscript i+1.) If any pair of values is out of order (by this we
mean that the key value on which to sort is smaller in record(i+1) than it is in
record(i)), then we exchange record(i+1) and record(i). The effect of this is
that at the end of this first iteration of the outer loop the largest value in the entire
array will have been placed in the last location. When we decrement the outer loop
variable and run the inner loop again, the next-largest value will be placed in the
location for the penultimate subscript. When the two loops finish, the array will be
sorted in increasing order.
3.3 A Foreshadowing of Things to Come 49
In keeping with our desire not to move the data records themselves, we are
sorting keys instead of sorting the data itself. The assumption is that an index is
really just an array of integers, that the index array is likely to be stored in memory,
and thus that moving keys around is efficient, whereas moving data that might be
stored on disk would be inefficient and slow. It is also likely to be the case in a
serious application that we would want to sort and then retrieve on multiple keys,
perhaps Social Security number and phone number in addition to the last name,
and for this reason we would need to have an index of keys for each field because
the actual data array cannot, after all, be stored in sorted order in more than
one way.
3.2.3 Searching Through a List
From the main program we have twice issued a findRecord request to find a com-
plete data record whose key is the String last name value provided to the method as
a parameter. Our code implements a linear search. That is, we simply walk through
the array looking for a record whose name matches the value for compareName.This
is certainly a simple method to write, but like most simple methods its perfor-
mance is not very good. We have included the code to count the number of tests
performed because we wish to illustrate that the performance is relatively poor. If
we are searching in an unordered list for a random value of name, and the value
happens to be in the list, then on average we would expect to have to look halfway
through the list in order to find what we are looking for. If we are searching for a
value that is in fact not in the list, we will have to search all the way through in
order to determine that the value just isn’t in the list.
If we take the time to sort the list, our search can end either when we find what
we are looking for or when the next name in our list is alphabetically greater than
the one we are looking for. With the sorted list, for example, we can quit looking for
the name “Buell” as soon as we encounter “Herbert,” because “Herbert” is greater
than “Buell” and therefore all the names past “Herbert” in the sorted list will also
be greater than “Buell.” On average, therefore, a search for a record will require
looking at half the list whether or not the record is present.
3.3 A Foreshadowing of Things to Come
The indexed flat file example, although it is a simple enough program, provides a
good starting point for mentioning where this course is headed.
A large part of this course is about the storage, management, manipulation, and
retrieval of information in efficient ways. The general rule is that there is probably
50 CHAPTER 3FLAT FILES
a simple data structure or algorithm that could be implemented, and that there
is no free lunch: because the structure or algorithm is simple, it will be inefficient.
Generally, you do indeed get what you pay for in life. A bubblesort is simple and
easy to code. Performing a sequential search through a pile of records is simple, but
even if the records are in sorted order, you will on average have to search through
half the pile to find what you are looking for. Simple and thus inefficient is OK for
small-scale implementations—if one’s purpose really is to manage a list of records
in a personal address book, then the code presented here is not a bad start. But
if the records list gets large (like the Los Angeles phone book) or the records are
retrieved constantly (like merchandise looked up on the Amazon.com website), then
simple and inefficient is unacceptable.
Similarly, the storage of data can be done efficiently or inefficiently, and simplis-
tic methods are usually inefficient. Just as with a shelf of books in a library, the
standard flat file poses problems as records are inserted and deleted. The simplest
method for “deleting” a record would be to leave the record in place but add a
flag that indicates the record has been deleted. After a large number of deletions,
though, a sequential search might waste a lot of time walking past deleted records,
and the total space required would be much larger than is actually needed. Closing
up the holes takes work, and it takes more work to open a hole to insert a new record
into sorted order. In this course, we will study several methods for dynamic data
structures that permit efficient insertion and deletion and that allow for continued
efficient search and retrieval of the valid records in the structure.
The other part of this course is the development of good programming skills and
techniques. Just as good data structures are used for organizing the actual data to
be managed, good programming style is in part an intellectual organization of the
software that manages and accesses the data. A main program, for example, should
deal only with things at the very top level of the program: Get the information
from the user, console, or input file; call the method doTheWork to do the work;
and then tidy up at the end and finish. The main program has no need to know
the implementation details of how the work is accomplished; it only needs to know
that the work will be done and the results will be returned in a standard way. The
FlatFile code thus can be reused to handle an entirely different Record class with
no changes at all.
This information hiding is central to good programming style. The code that
implements the flat file need not know anything about the “records” that it is
handling; it need only know that there are things called records and that there is
a linear list of them. Java, as with other object-oriented programming languages,
enforces at the syntactic level a compliance with the strictures of the program design
“religion” that it supports. You may feel that this is restrictive and prevents you
from exercising your artistic freedom in how programs are designed. That’s entirely
the point; the idea is that if the compiler and the IDE can impose constraints
3.3 A Foreshadowing of Things to Come 51
on what is syntactically correct, then it is more likely that a syntactically correct
program will also be semantically correct.
Looked at another way: We all make mistakes, and in writing programs each of
us is likely to have our own short list of mistakes that we will make repeatedly if we
aren’t careful. Part of the discipline of programming, to borrow a term that Dijkstra
used, is for each of us to recognize what those mistakes are and try consciously to
remember not to make them. Part of the discipline imposed by Java and languages
like Java is that many mistakes that are easy to make and are commonly made are
harder to make in Java because the syntax of the language makes it harder to make
those mistakes.
4
3.3.1 Effective Sorting Techniques
One of the methods in FlatFile implements a bubblesort for the purpose of cre-
ating an index by sorting keys. It is not hard to show that a properly implemented
bubblesort requires a constant (fixed for the given implementation) times N
2
com-
parisons in order to sort an array of N data items. It can be proved (and we will
prove it in this course) that any sort that works by comparing values (as does
bubblesort) requires at least a constant times N log N comparisons, and there are
several sorting algorithms that achieve this lower bound. Since we know from cal-
culus that log N is much smaller than N as N gets large, clearly a bubblesort is
not the most efficient sort for large data arrays; its only virtue is its simplicity. We
show in Figure 3.16 the graph of N
2
against N log N,forN running from 0 through
about 10000. Without even looking at the actual numerical values, it is clear from
the figure that a good sorting algorithm with a running time of N log N “steps”
(we will define later the concept of “step” for sorting algorithms) is faster than
a bubblesort for all but the very smallest of problems, and the superiority of the
better algorithm is increasing.
We shall study some different sorting algorithms. Our interest is in algorithms
that are efficient in terms of both the number of comparisons and the extra storage
space needed. Bubblesort, for example, needs only the one extra storage location to
store the temporary value of a key when the ith and jth items are being swapped.
Mergesort, for example, works in N log N time, but it requires as much as N/2extra
storage locations.
The problem of sorting records is still an active research area, and we will only
scratch the surface of the topic.
4
In George Orwell’s 1984, the government strove to remove from the language all those words
with which “thoughtcrime” could be committed. Once all the words with which thoughtcrime
could be expressed were removed from the language, then thoughtcrime itself could not exist.
Java works much the same way. If the language won’t let you write something that The Powers
That Be have determined to be Bad Form, then it becomes that much harder to write code that
possesses Bad Form.
52 CHAPTER 3FLAT FILES
0
2T 4T 6T 8T 10T
2M
4M
6M
8M
10M
N
2
N lg N
FIGURE 3.16 N
2
graphed against N lg N (T =10
3
, M =10
6
).
3.3.2 Effective Search Techniques and Data Structures
for Maintaining Priority
One of the methods in Flatfile implements a linear search to find an item in a
sorted list. The first of the more sophisticated things we will learn, later in this
chapter, is binary search,arecursive divide-and-conquer technique that is almost
always a win when computing things.
Divide and conquer is a basic theme in computation: To solve a problem, recur-
sively cut the size of the problem in half at each stage. We begin in binary search
with a list of N data items. We cut the problem in half by determining whether the
data item is (or rather, could be, since we don’t know that it is actually present in
the list) in the first half of the data or the second half of the data. We then ask the
same question of the data array that is now only half as big as we started with.
In general, it is not hard to show that binary search in a sorted array of N =2
n
items can be done with at most n tests (called probes), and that no similar search
can be done any faster than that. That is, binary search is optimal. We will see
later, for example, that cutting the array into thirds and doing a recursive search
on the one-third of the data in which a sought-for element might appear will not
go any faster, and it would require two tests at every stage, rather than one, to
determine which third might hold the item for which we were searching.
3.3 A Foreshadowing of Things to Come 53
3.3.3 Static Versus Dynamic Storage Structures
One of the main questions to be asked when dealing with a computational problem
is whether the data is essentially static or dynamic, that is, whether the data as
a whole changes slowly or rapidly. A transcript file at a university, for example, is
essentially a static collection of data. Although there will be additions and dele-
tions and updates of grades, local addresses, and such, the data collection is largely
static. Contrast this with the queue of processes ready to be executed in a com-
puter. In a modern computer, a large number of processes will be active at any
point in time, listening for mouse movements, clicks, and keystrokes; managing the
display of multiple windows that might even have ongoing updates or audio or
video playing; as well as the processes that are more realistically viewed as hav-
ing been initiated by the user. The goal of the scheduler in the OS is to keep
these processes sorted by priority (we don’t want any hiccups in the Internet radio
that plays in the background) and to cycle through those ready processes quickly
and efficiently.
A similar task awaits the computer whose job is to collect Internet packets and
assemble them into messages for delivery. Packets are discrete data items, each
one being part of an entire Internet transmission. They may take different paths
from source to destination, they may arrive out of order, and in a large operation
they will be coming in at a tremendous rate. It is the job of some computer in
the flow of traffic to reassemble these packets into their original messages, with the
data payloads in order, and to be able to handle simultaneously the assembly of
packets for multiple messages at a time. This is a classic dynamic data processing
problem: A payload of unknown size is broken into packets; the packets must be
sorted as they come in; and when the message is fully reassembled, we can send on
the complete message and delete that data from storage.
Handling these two different kinds of problems usually takes different kinds of
data structures as well as an assessment of the overall characteristics of the prob-
lem. Having all the data before we begin can be a very good thing; being forced to
organize data before we have any idea what the data looks like can lead to patho-
logically bad performance. We have seen the value of having data in sorted order.
If we have data that undergoes updates only once a week, for example, and it takes
an hour to sort the data from scratch, then it is perfectly reasonable to consider
that hour an investment worth making. If the data is updated once an hour, and
it takes an hour to sort, then the time spent sorting is not an investment but an
ongoing expense.
3.3.4 Generic and Abstract Classes
Java has a powerful capability in its generic construct and abstract classes that allow
common methods to handle different data types. We have already seen with the
54 CHAPTER 3FLAT FILES
compareTo method the power of having a standard syntax for a common operation:
we implement comparison of objects as
thisObject.compareTo(thatObject)
with a returned result that is a negative number if this is less than that,apositive
number if this is greater than that, and a zero if they are equal.
You have probably already used, without necessarily being made aware of them,
some of this abstraction capability of Java. The Java generics and abstract classes
permit the programmer to define templates and general-purpose methods that act
on objects whose data types are not specified at the time the code is written. This
provides another vehicle for separating the data structure that supports efficient
computation from the underlying data on which the computation is performed.
3.3.5 Arrays Versus ArrayLists
The experienced Java programmer will recognize that we have implemented the
FlatFile program using an ArrayList instead of an array. This is because for
one-dimensional lists of things, whose size might not be known at compile time,
the ArrayList is a much simpler construct to use than an array. Storage for an
ArrayList is done dynamically, with no preallocation needed and no out-of-space
errors from allocating too little space for the input data. The size method allows
one to determine the actual number of items stored and decreases the chance of
writing code that steps out of bounds. The deletion of items (which we didn’t
happen to do here) is done with a remove method that does the list compression
to remove the hole caused by deletion.
AJavaArrayList also has a contains method that does the lookup that we
implemented with the linear search method, and that should in fact implement a
binary search as described in the next section.
We note, however, that there is no inherent sorting function for an ArrayList and
thus we would still need to sort the records and create an index if we wanted to deal
with the ArrayList in sorted order. There are other Java collection classes that do
implement sort order capabilities. Just as with the binary search for the contains
method, we would have to assume that the implementors of the sort methods had
used an efficient algorithm. Indeed, it is fair to say that all the algorithms and tech-
niques presented in this course have already been implemented in Java in some class.
3.4 Binary Search: Our First Mature Algorithm
The first “mature” algorithm we will consider, as a prelude to the rest of the text,
is a binary search. We are going to introduce this now, although this may seem to
be a bit off-topic at the moment, because we want to contrast the basic nature of
binary search against a more naive linear search.
3.4 Binary Search: Our First Mature Algorithm 55
In our earlier search method, we did a linear search, entry by entry, through the
ArrayList until the desired record was found. For a randomly chosen record for
which to search, we would expect to have to search on average halfway through
a list of N entries. It is equally likely that we would have to search through to
location k as it is that we would have to search to location N k. Since these two
searches together require k +(N k)=N “probes,” they each require on average
N/2 probes into the list.
A binary search proceeds by a recursive divide-and-conquer approach to search-
ing for an element. If we have paid the initial price of sorting the data, then binary
search is the most natural algorithm to use for searching because it is relatively
simple (although not totally trivial) and because it can be shown to be the fastest
search possible.
In a recursive divide-and-conquer search, we start with a list of N entries in a
list L, which for the moment we can assume to be simply a list of sorted integers,
with no repeated values. We want to know if a given integer k is one of the elements
in the list, and if it is, what its subscript value is.
Our first probe is to compare k against the entry stored halfway into the list at
location L[N/2], whose value we will call m.Ifk<m, and the list L is assumed
to be sorted, then k, if it exists in the list, will be found in the first half of the
list because all the entries in the second half of the list are larger than m and thus
larger than k.Ifk>m, and the list L is assumed to be sorted, then k,ifitexists
in the list, will be found in the second half of the list because all the entries in the
first half of the list are smaller than m and thus smaller than k.
With one probe, then, we have reduced the problem of searching a list of N
entries to the problem of searching a list of N/2 entries. We now do a second probe
to cut the list from size N/2 in half again to size N/4. And again. And again.
Consider the example in Figure 3.17. Assume we have a list of 16 names in
sorted order (history buffs will note that we skipped one in order to have unique
surnames): Let’s say we do a lookup on “Tyler.” We compare Tyler against the
Loc. Name Loc. Name
0 Adams 8 Madison
1 Buchanan 9 Monroe
2 Fillmore 10 Pierce
3 Harrison 11 Polk
4 Jackson 12 Taylor
5 Jefferson 13 Tyler
6 Johnson 14 Van Buren
7 Lincoln 15 Washington
FIGURE 3.17 A sorted list for a binary search.
56 CHAPTER 3FLAT FILES
name at subscript 8, which is “Madison” (we are indexing zero-up, just like Java,
and we arbitrarily choose always to round up to the first subscript in the upper
half of any list or sublist if we have to break ties). Tyler is larger than Madison, so
we know we have a record in the second half of the list (if it’s there at all). Since
we have retrieved the record at subscript 8, we test equality as well as greater-
equal/less-equal, and find that we don’t have equality. The second half of the list
has subscripts 8–15, and we know we didn’t get a match at subscript 8, so we are
looking at subscripts 9–15. The midpoint (and this time it’s actually the midpoint)
is 12, and the entry at subscript 12 is “Taylor.” Tyler is still larger than Taylor, so
we continue searching in the short list of subscripts 13–15, since we know it’s not
at subscript 12. Comparing against subscript 14, “Van Buren,” we know we are in
the first half of that sublist. The sublist has length 1, though, so when we compare
against subscript 13, which is “Tyler,” we get a match.
If we look for the name “Clay” (remembering that Henry Clay very much wanted
to become a U.S. president but never managed to do so), we compare against
subscripts 8 (Madison), then 4 (Jackson), then 2 (Fillmore), then 1 (Buchanan).
Since we don’t find a match, we know that Clay is not in the list.
As should be obvious, the difficult part about writing a method for this is getting
right the concept of “halfway.” With an odd number of items in the list, there is a
midpoint. With an even number of items in the list, one has to decide whether to
take the last entry in the first half or the first entry in the second half.
If the list is of N =2
n
entries, then it will take at most n =log
2
N probes to
reduce the problem to looking at a list of one entry, which is merely a single test.
To verify that Clay was not in the list of 2
4
= 16 items, for example, we checked
the four entries at subscripts 8, 4, 2, and 1.
Now we need to clarify a couple of points on which we have been a little sloppy.
First, when we execute a probe, we can test both
if(this.compareTo(that) < 0)
and
if(this.compareTo(that) == 0)
If we find the entry for which we are looking, then of course we have solved the
problem. If not, we continue. Technically this would split the list not into two
halves but into one sublist larger than the probed value, one sublist smaller than
the probed value, and the probed value itself. If we test in this way, we can shorten
our sublist by one by not including the probed value, since we have already tested it.
Similarly, if N is not an exact power of 2, the length in time of the binary search
is really no worse than it would be for the next larger power of 2. If we were to
3.4 Binary Search: Our First Mature Algorithm 57
increase the size of the array to the next larger power of 2 by padding at the end
with values of MAXIMUM
INTEGER (whatever that value is), we will increase the total
cost of the search by at most one extra probe.
The code for a binary search method that could be substituted for the linear
search in our FlatFile example is given in Figure 3.18.
/**************************************************************
* Method to find a record in the flat file.
* This is binary search. Start with lower and upper subscripts.
* Compute the middle subscript. If the value in the middle is
* larger than the item sought, move the upper subscript down.
* If the value in the middle is smaller, move the lower subscript
* up. Then repeat until found or until we meet in the middle.
* @param name the <code>String</code> value to search for.
* @return the subscript of the found record, or -1 if not found.
**/
public Record findRecordBinary(String name)
{
final int NOTFOUND = -1;
int lowerSub, midSub, returnSub, upperSub;
Record returnValue = null;
lowerSub = 0;
upperSub = this.getSize()-1;
midSub = (lowerSub + upperSub)/2;
returnSub = NOTFOUND;
while(lowerSub < upperSub)
{
String thisMidName = this.recs.get(this.index.get(midSub)).getName();
if(0 > thisMidName.compareTo(name))
lowerSub = midSub + 1;
else if(0 < thisMidName.compareTo(name))
upperSub = midSub - 1;
else
{
returnSub = midSub;
break;
}
midSub = (lowerSub + upperSub)/2;
}
if(NOTFOUND != returnSub)
returnValue = this.recs.get(this.index.get(returnSub));
return returnValue;
}
FIGURE 3.18
The binary search code.
58 CHAPTER 3FLAT FILES
The use of a divide-and-conquer approach is so pervasive in computing that a
new mathematical symbol is used to describe the running time of algorithms of
this sort. We know that if N =2
n
, then n =log
2
N,whichwewriteasn =lgN.
In Chapter 6, we will do more formal asymptotic analysis of algorithms, but the
power of binary search over linear search is easily seen from the graph in Figure 3.19,
showing N/2 graphed against lg N. Clearly binary search is superior, and it becomes
better and better the larger the size of the list to be searched. The graph of lg N is
barely noticeable. This is not entirely surprising—2
23
is about eight million, so the
binary log of 10
6
at the right edge of the graph is only a little larger than 23, and
clearly will barely register on a y-axis measured in millions.
0
2M 4M 6M 8M 10M
1M
2M
3M
4M
5M
N/2
lg
N
FIGURE 3.19 N/2 graphed against lg N (M =10
6
).
3.5 Summary
We have covered in this chapter a program for managing, in a naive way, basic
data records in a flat file or spreadsheet-like structure. A large number of computer
applications have essentially this purpose: to maintain a file of records, with the
ability to add records, edit records, delete records, sort records in any of a number
of different ways, and search and retrieve records that match some search field.
For small files of no more than a few thousand records, a simplistic program
such as the one presented here would be more than adequate. For large software
systems, however, all the functions—adding, editing, deleting, sorting, searching,
and retrieving—must be done more efficiently. The purpose of this course and this
text is to introduce data structures for more efficient management of data, together
with algorithms that process data in more efficient ways or on those more effi-
cient structures. We began the process of increasing the sophistication of skills by
introducing binary search.
3.6 Exercises 59
3.6 Exercises
(Sample data for many of these exercises can be found on the website at
go.jblearning.com/Buell.)
1. The current code has the input and output filenames hard-coded. Modify the
driver main method to ask the user for these filenames. Make sure you catch
the exceptions on opening the files, since that is very important for correct
execution of programs.
2. The current FlatFile code has a findRecord method that uses the
compareName method of the Record class. Modify this method to call a more
general-purpose findByKey method, and implement the findByKey method in
the Record class.
3. If you have done the previous exercise, you will notice that you can overload
the findByKey method in the Record class to allow for either String or int
variables as parameters. Write these overloaded methods.
4. If you have done the previous two exercises, you will notice that you still cannot
search both by name and by office, since those two String variables are going
to be indistinguishable to findByKey. This can be fixed. Create a separate
class for the name, making sure to include a nondefault constructor that takes
a String variable as parameter and assigns that to a String instance variable
inside your Name class. Do the same for the office variable. Now you can
create an overloaded findByKey(Name whatName) method that searches on
the name variable and a different findByKey(Office whatOffice) method
that searches on the office variable. Do this, and modify FlatFile to do the
appropriate searches. The key point here is the information hiding: nowhere in
the FlatFile code should the fact that Name and Office are in fact String
values be used. All that is used in FlatFile is that one can create instances
of Name and Office by passing in a String when constructing an instance.
5. Replace the Record class with a class that handles names of CDs and artists
performing on the CDs, as if you were making a database of your own collec-
tion. Except for changing the method calls for the search, the code outside the
Record should not have to change at all.
6. If you have done the previous exercise, you know that your Record class is
insufficient. You really want all the tracks in your record, so add an ArrayList
variable inside Record
that allows you to read the track names and outputs
the track names when toString is called.
7. If you have done the previous two exercises, modify the search capability to
allow searching by track name. Since it is highly unlikely that the number of
60 CHAPTER 3FLAT FILES
tracks per CD is larger than about 20, and since it is highly unlikely that the
tracks are in alphabetically sorted order, you can probably do just fine with a
linear search.
8. You probably want to search your CD list both by artist name and by CD
label name, but you probably used a String variable for both of them. Modify
your CD code following the changes for the findByKey methods in the earlier
exercises to allow you to search either way.
9. Modify your linear search to count the number of tests you have to do. Increase
the size of the input data to 100 items, say, so you have enough data for the
code to chew upon. Then perform several searches. What is the average number
of tests? It should be about 50 if you are searching for items both early in the
list and late in the list. Swap out the linear search for the binary search and do
the experiment again. This time your average count of tests should be closer
to 7, since 100 128 = 2
7
.
10. Modify the sort method so that you count the number of comparisons per-
formed, and output the number of comparisons for the original input data.
You should get (10
2
)/2 = 50. Now double the number of items in the input
data and run the experiment again. This time you should get (20
2
)/2 = 200.
Double the data input again, and check that this time it takes (40
2
)/2 = 800
comparisons (make sure you have unique items, as this can affect the count).
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.129.194.123