Chapter 3 Flat Files

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Flat Files

Objectives

 An outline of a program to manage ﬂat ﬁles of records

 A simple sorting algorithm

 A look ahead to improved methods for sorting and searching

 A look ahead to more sophisticated approaches for abstracting computa-

tions to make programs more general and more reliable

 A binary search algorithm, the ﬁrst “mature” algorithm covered in this text

Key Terms

abstract class

asymptotic analysis

binary search

bubblesort

discipline of

programming

driver program

dynamic data structures

ﬂat ﬁle

generics

index

optimal

portable code

probe

pseudocode

recursive divide and con-

quer

semantics

syntax

Introduction

One of our guiding principles is that students at this stage in a computer science

curriculum have probably learned how to do basic computations and now need to

learn more sophisticated methods for writing programs and producing computa-

tional results. We start in this chapter, then, with a summary of what might be

considered to be a reasonably full and complete implementation of an application

but an implementation that uses only the techniques usually covered in a ﬁrst

course in computing. After presenting this “naive” version of a program, we will

then discuss the ways in which it would be improved upon through the methods

and analysis developed later in this text.

32 CHAPTER 3FLAT FILES

 3.1 A Basic Indexed File

Let us now look at an example of the naive implementation of a flat file with an

index, complete with a method for looking up an entry in the ﬁle. But before we do

that, we make a disclaimer of sorts by directing the reader to Appendix A, where

a number of the author’s idiosyncrasies of coding are presented. We emphasize

that none of these idiosyncrasies should be taken as absolute dogma. Style is a

religious issue, and there are many competing religions. Anyone taking this course

and writing programs should, however, adopt one of the standard styles and apply

it consistently. The more predictable and routine the presentation of code is to a

reader, the less likely that tricky points will be overlooked and silly errors made.

Before embarking on the writing of a program, we should ﬁrst have an idea of

what it is that we want our program to do. In the case of this example, the program

is an address book. We could implement this with a spreadsheet, and indeed a

spreadsheet and a ﬂat ﬁle are very similar. We will want to add records to and delete

records from our ﬁle. We will want to be able to do lookups into the ﬁle, which means

we will need a “ﬁnd” capability. Usually we will keep something like a phone book

in sorted order by last name, so we will need a way to sort the records in the ﬁle.

Although a simple ﬂat ﬁle phone book may appear to be an example too simple

to be of real value, most of the issues involved in the use of mature algorithms

come up once we think of our ﬂat ﬁle growing in size from a few hundred records

to tens of millions of records. Adding and deleting records in a large ﬁle, searching

for records, and keeping the ﬁle in sorted order all become tasks that can no longer

be done in a naive way or else the processing time will become a problem for the

user of the software.

 3.1.1 A Basic Program

We take the view in this course that a program generally consists of four basic

parts.

1. The Data Payload

As in the example from Chapter 2, the lowest level in the hierarchy will be a set

of “payload” records that contain the actual data. The class for the payload

will usually be named Record,orPayload, or occasionally something more

meaningful. All the instance variables and methods for managing the actual

data will reside in this class. The goal in this text is to learn to manage data.

For the most part, the means for managing the data are independent of the data

being managed, so most of the higher-level constructs will work with virtually

any payload class. Indeed, that’s the point of good programming style and part

of the justiﬁcation for object-oriented languages like Java. The code that does

3.1 A Basic Indexed File 33

the sorting of records, for example, really doesn’t need to know what is in those

records.

2. The Data Structure Class

At the next level up, the “data structure” class will contain the instance vari-

ables and methods to manage the data structure that is the particular object

of study. In the case of our example program, the data structure begins not

as a separate class but simply as an ArrayList, and the data structure in

this example is embedded in the code of the next higher “application” level.

This is not an unusual style for many simple applications, and indeed not a

bad program development technique either. If the ArrayList in the previous

chapter’s code turns out to be too ineﬃcient and an improvement to the code

is necessary, then having a working application is a great beneﬁt. It allows the

programmer to concentrate on just the parts of the code being improved, conﬁ-

dent that the rest of the code that drives and tests the application has already

been found correct.

3. The User Application

Next higher in the hierarchy is the “application” class that will contain the

code implementing the features of the desired application. In the case of

our example ﬂat ﬁle with an index, the application is that of a phone book.

Typical user actions with a phone book are to add records, to delete records,

to update records, and to search for records (usually by name). The need-to-

know principle of information encapsulation and hiding applies here just as it

does everywhere else. The phone book application needs to know that there

is a method to add records, but it doesn’t really need to know the details

of what happens when a record is added. The application layer thus usually

creates an instance of the data structure and then invokes its methods.

4. The Main Program/Driver Program

Finally, we will always use a driver program, which will contain the Main

class. The driver will normally do three things:

• The driver will open ﬁles for input of data and will open ﬁles for output or

otherwise establish that an output mechanism exists.

• In the Main class of the driver program, we will create an instance of an

“application” class to perform some useful task. The application class will in

turn create and use the data structure under discussion. After creating the

application instance and arranging for input and output, the driver program

will then exercise the data structure to verify that the other two parts of the

program (at least two classes but sometimes more) are working correctly.

• The driver will then clean up after the test and terminate the program

naturally.

34 CHAPTER 3FLAT FILES

In the case of our ﬁrst main example, the application is that of an address book or

phone book of directory information. In this case, the payload for the data structure

is an individual Record item containing the last name, oﬃce number, phone number,

and current teaching assignment of a list of faculty members. These four ﬁelds of a

Record are the instance variables of the Record, and their declaration together with

their accessors and mutators should normally be done inside the Record class. We

can thus view a single directory item about a single individual as an abstract data

type and encapsulate all the management of the information for that class inside

the Record class. This is the simplest form of data encapsulation: Beyond the code

inside the Record class itself, there is usually no need for any program to deal with

the payload record in any way other than as a single entity. Exceptions would exist

if one were to sort on multiple ﬁelds; if there is only one “correct sorted order” on

the data, then programs outside the class do not necessarily need to know what that

order is, but if one wants access by last name and by oﬃce number, then programs

outside the Record class would, of course, have to know that these were data ﬁelds

and that one could access the data in those orders. Generally, though, since there is

no “need to know” beyond the conﬁnes of the Record class, we encapsulate inside

the Record class all the code that handles its variables. Figure 3.1 shows the UML

diagram for the Record class; this diﬀers from the Record of the previous chapter

only in that more instance variables exist and that some comparison methods have

been added to allow us to sort the list.

The middle two levels of our application are a data structure of Record data.

In our initial version, that structure is simply an array of Record type and we

combine the data structure (the array) with the user interface and methods (add

a record, etc.). By analogy with a simple spreadsheet, the spreadsheet itself is the

data structure, each row of that data structure is a unique instance of a Record,

and each column of the spreadsheet is a ﬁeld within the Record.

In general, you should always consider the code hierarchy with an eye to writing

portable code. The phone book application, for example, is really just a program

for handling records that look like a spreadsheet. Very little should need to be

done if those records changed from being phone book records and became retail

transactions instead, perhaps sorted now by date and not by last name. It should

be possible, then, to swap out the payload Record class for a retail transaction

record class, with diﬀerent variables, and to have all the intermediate code need no

changes. The only change that would be necessary is that the search in the main

method would have to search on some diﬀerent variable. This is part of the reason

for our restricted notion of a Main class. The Main class should be used for driving

and testing the code below it. If the data payload is changed out, then the driver

code may need to change in order to conduct a slightly diﬀerent sort of test, but

the application code should need little or no change.

3.1 A Basic Indexed File 35

 3.1.2 The Record Code

Figure 3.1 shows the UML diagram for the Record class.

Record

-DUMMYSTRING:String

-DUMMYINT:int

-name:String

-office:String

-phone:String

-teaching:int

+Record():void

+getName():String

+getOffice():String

+getPhone():String

+getTeaching():int

-setName(String):void

-setOffice(String):void

-setPhone(String):void

-setTeaching(int):void

+compareName(String):boolean

+compareTo(Record):int

+readRecord(Scanner):Record

+toString():String

FIGURE 3.1  The UML diagram for the Record class.

The Record class handles everything connected with a single record and the

individual ﬁelds inside that record, and the code for this class diﬀers little from

the code in Chapter 2. We have additional instance variables and therefore addi-

tional accessors and mutators, and the readRecord and toString methods must be

diﬀerent to include the new variables. The real diﬀerence, though, is in the compar-

ison methods; the compareTo and compareName methods are new. Code for these

methods is shown in Figure 3.2.

Comparison of numbers and comparisons of variables of primitive types

(boolean, int, double)

are usually done with the standard arithmetic symbols

<, ≤, >, ≥, and = (and the computer version ==). Java comparison of the values

There are other primitive types, but we will use them only sparingly in this book. The float

data type, for example, is largely unnecessary these days. The only good reason for using a float

variable in the past was to save memory space (4 bytes instead of 8). Memory has become cheap,

however, so there is little practical reason for not doing more accurate computations, and the

standard for real scientiﬁc computing has been double variables for some years now. There are,

however, some classes of Java (the Color classisone)thatusefloat and not double values, so a

float value must be used when required.

36 CHAPTER 3FLAT FILES

/**************************************************************

* Method to compare the <code>name</code> values of records.

* This does a comparison on the last name stored against the

* string passed in, returning true or false.

* @param thatName the string to compare last name against

* @return true or false for the comparison

**/

public boolean compareName(String thatName)

{

return this.getName().equals(thatName);

}

/**************************************************************

* Method to compare the <code>name</code> values of records.

* This does a lexicographic comparison on the data stored, so

* if the data stored is not in last-name order we will get

* something different from what might be expected.

* @param that the "that" record to compare to

* @return -1, 0, or +1 depending on how the comparison goes.

**/

public int compareTo(Record that)

{

return this.getName().compareTo(that.getName());

}

FIGURE 3.2

 The comparison methods in the Record class.

stored in String variables, however, is done with a compareTo method that returns

−1, 0, or +1 depending on whether the “ﬁrst” string is less than, equal to, or

greater than the “second” string lexicographically. We will maintain this construct

for comparisons, as well as the asymmetric syntax:

thisObject.compareTo(thatObject);

By implementing one or more compareTo methods in a class, we can sort the class

on diﬀerent ﬁelds. For an address book, last name would be one obvious choice, but

one might also want to sort on the course number in this ﬁle. The fact that the course

number teaching is an instance of int data means that a diﬀerent comparison

would need to be made. As we will see later, the various generic, abstract,and

overriding or overloading features of Java are designed speciﬁcally to make this

kind of thing easier to do. What we really want to do is to make it easy to perform

comparisons—regardless of data type—on any of the ﬁelds in the record, without

exposing to a calling program any information or details that the caller doesn’t

need to know. In a later example, we will create a compareTo method that could

be written and rewritten to make any desired comparisons of the Record data.

3.1 A Basic Indexed File 37

In addition to the compareTo method that we use in the bubblesort routine (to

be discussed shortly), we also have a compareName method; when passed a String

value as a parameter, this method will compare the parameter against the last name

and return either the ﬁrst record that matches that last name or a null value.

 3.1.3 A Driver for the Application

At the top level of our application is the driver code, presented in Figures 3.3–3.5

asthecodefortheMain class. Nothing very special happens here. We open input

and output ﬁles, create the FlatFile object, and print out its contents to show

import java.io.PrintWriter;

import java.util.Scanner;

/**************************************************************

* Main class to drive the <code>FlatFile</code> application.

* This program opens an input <code>Scanner</code> file and an

* output <code>PrintWriter</code>, creates an instance of a

* <code>FlatFile</code> structure, and then writes out the

* <code>FlatFile</code> to show that it is indeed empty.

* The program then reads the input into the structure,

* writes out the structure as read, sorts the structure,

* and writes out the sorted structure.

* Two searches are then done, and the program terminates.

* @author Duncan A. Buell

* @version 1.00 2010-07-06

**/

public class Main

{

public static void main (String[] args)

{

final String TAG = "Main: ";

String inFileName = null;

String outFileName = null;

String searchString = "";

Scanner inFile = null;

PrintWriter outFile = null;

FlatFile theFlatFile = null;

Record foundRecord = null;

inFileName = "myinputfile";

outFileName = "myoutputfile";

inFile = FileUtils.ScannerOpen(inFileName);

outFile = FileUtils.PrintWriterOpen(outFileName);

FIGURE 3.3

 Part 1 of the driver Main class for the phone book example.

38 CHAPTER 3FLAT FILES

//*********************************************************

// Create the flat file and read and write it.

outFile.printf("%s create and write out the empty file%n",

TAG);

theFlatFile = new FlatFile();

outFile.printf("%s%n", theFlatFile.toString());

outFile.printf("%s empty file was created%n", TAG);

outFile.printf("%s read the data%n", TAG);

theFlatFile.readFile(inFile);

outFile.printf("%s the data has been read%n", TAG);

FileUtils.CloseFile(inFile);

outFile.printf("%s write out the file%n%s%n",

TAG, theFlatFile.toString());

outFile.printf("%s done with the write%n", TAG);

//*********************************************************

// Finally, sort the file and write out the sorted file.

outFile.printf("%s sort the file%n", TAG);

theFlatFile.sortFile();

outFile.printf("%s after sorting, write the file%n%s%n",

TAG, theFlatFile.toString());

outFile.printf("%s done with the write%n", TAG);

FIGURE 3.4

 Part 2 of the driver Main class for the phone book example.

that the object is in fact empty (and that our code handles the case of an empty

object). After reading in the data, we echo the data to make sure that our read has

been successful.

Because data is generally more useful when it is sorted, we call a sort routine to

sort the ArrayList of data.

We then perform two searches, one on the name “Smith” that we know is the

name of someone in the data ﬁle, and one on the name “Buell” that we know is not

in the data ﬁle. (We are ignoring the issue of multiple records with the same last

name as key; that is, we assume that we have no duplicate keys.)

 3.1.4 The Original Flatfile Code

The code at the bottom of the hierarchy for this application is the Record class

that handles the data payload of our ﬂat ﬁle. At the top of the hierarchy is the

driver program. As will be typical in this book, the data structure, that is, the

ﬂat ﬁle itself, is managed by the code in the intermediate FlatFile class, whose

UML diagram is given in Figure 3.6 and whose code is shown in Figures 3.7–3.12.

3.1 A Basic Indexed File 39

//*********************************************************

// Find a particular item and print the record if found.

searchString = new String("Buell");

outFile.printf("%s find the data item '%s'%n",

TAG, searchString);

foundRecord = theFlatFile.findRecord(searchString);

if(foundRecord != null)

{

outFile.printf("%s found record '%s'%n",TAG,

foundRecord.toString());

}

else

{

outFile.printf("%s record %s not found%n",TAG,

searchString);

}

outFile.printf("%s done with first search%n",TAG);

searchString = new String("Smith");

outFile.printf("%s find the data item '%s'%n",

TAG, searchString);

foundRecord = theFlatFile.findRecord(searchString);

if(foundRecord != null)

{

outFile.printf("%s found record '%s'%n",TAG,

foundRecord.toString());

}

else

{

outFile.printf("%s record %s not found%n",TAG,

searchString);

}

outFile.printf("%s done with second search%n",TAG);

//*********************************************************

// We’re done. Close up and go home.

outFile.printf("%s done with main%n", TAG);

FileUtils.CloseFile(outFile);

}

// public class Main

FIGURE 3.5

 Part 3 of the driver Main class for the phone book example.

40 CHAPTER 3FLAT FILES

FlatFile

-index:ArrayList<Integer>

-recs:ArrayList<Record>

+FlatFile():void

+getRecord(int):Record

+getSize():int

+ﬁndRecord(String):Record

+readFile(Scanner):void

+sortFile():void

+toString():String

FIGURE 3.6  The UML diagram for the FlatFile class.

import java.util.ArrayList;

import java.util.Scanner;

/**************************************************************

* Class for handling activity at the level of the flat file.

* This class has an <code>ArrayList</code> of

* <code>Record</code> data and an index list for sorting,

* methods to access, the two <code>ArrayList</code>s,

* methods to read and to <code>toString</code>, and a

* simple bubblesort method for sorting the list.

* @author Duncan Buell

* @version 1.00 2010-07-05

**/

public class FlatFile

{

/**************************************************************

* Instance variables.

**/

static final String TAG = "FlatFile: ";

private ArrayList<Integer> index;

private ArrayList<Record> recs; // the records to be stored

/**************************************************************

* Constructor.

* Initialize the <code>ArrayList</code> variables.

**/

public FlatFile()

{

this.recs = new ArrayList<Record>();

this.index = new ArrayList<Integer>();

}

FIGURE 3.7

 Part 1 of the FlatFile class for the phone book example.

3.1 A Basic Indexed File 41

/**************************************************************

* Method to get an individual <code>Record</code>.

* @param which the subscript of the <code>Record</code> to return

* @return the <code>Record</code>

**/

public Record getRecord(int which)

{

return this.recs.get(this.index.get(which));

}

/**************************************************************

* Method to get the size of the <code>ArrayList</code> of

* data. We are going to call this an accessor, although it

* is actually generating derived data. One of its other

* purposes is to make sure that the size of the data list

* and the size of the index list are the same.

* @return the size of the arrays

**/

public int getSize()

{

if(this.recs.size() != this.index.size())

{

System.out.println("ERROR: unequal list sizes " +

this.recs.size() + ", " +

this.index.size());

System.exit(0);

}

return this.recs.size();

} // public int getSize()

FIGURE 3.8

 Part 2 of the FlatFile class for the phone book example.

We store the Record data and its index using ArrayLists. It would be reasonable

in a more complicated program to have a separate class for the diﬀerent index

arrays, but in this simple example we have included the index as an array inside

the FlatFile class. Note that as we read the data in we set the initial value of the

index to the physical record number in the ﬁle, so that all our references to the

array can be made not as recs.get(i) but as recs.get(index.get(i)).

The accessor and mutator for the Record array are not bulletproofed to pre-

vent errors if an invalid subscript is provided to the methods (we will take care

of this in a moment). Similar to what we did in Chapter 2, we include methods

42 CHAPTER 3FLAT FILES

/**************************************************************

* Method to find a record in the flat file.

* This is a linear search.

* @param name the last name to look up the record of

* @return the found record, or null if not found.

**/

public Record findRecord(String name)

{

Record returnValue = null;

for(inti=0;i<this.getSize(); ++i)

{

if(this.recs.get(index.get(i)).compareName(name))

{

returnValue = this.recs.get(index.get(i));

}

return returnValue;

} // public Record findRecord(String item)

FIGURE 3.9

 Part 3 of the FlatFile class for the phone book example.

/**************************************************************

* Method to read the file from an input <code>Scanner</code>

* file. This reads the entire file. As we read and store

* the records, we also store the subscripts in the index.

* @param inFile the <code>Scanner</code> from which to read

**/

public void readFile(Scanner inFile)

{

int sub;

Record rec = null;

sub=0;

while(inFile.hasNext())

{

rec = new Record();

recs.add(rec.readRecord(inFile));

index.add(sub);

++sub;

}

FIGURE 3.10

 Part 4 of the FlatFile class for the phone book example.

3.1 A Basic Indexed File 43

/**************************************************************

* Method to sort a file.

* This is, embarrassingly enough, simply a bubblesort,

* but it works and is useful for demonstration purposes.

* Note that we sort keys and adjust the index but do not

* actually move records around in the list.

**/

public void sortFile()

{

int tempIndex;

for(int length = this.getSize(); length > 1; --length)

{

for(inti=0;i<length-1; ++i)

{

if(this.recs.get(this.index.get(i)).

compareTo(this.recs.get(this.index.get(i+1))) > 0)

{

tempIndex = this.index.get(i);

this.index.set(i, this.index.get(i+1));

this.index.set(i+1, tempIndex);

}

FIGURE 3.11

 Part 5 of the FlatFile class for the phone book example.

/**************************************************************

* Usual <code>toString</code> method.

* @return the <code>String</code> value of the file.

**/

public String toString()

{

Strings="";

for(inti=0;i<this.getSize(); ++i)

{

s += String.format("%s(idx,rec) (%d,%d) %s%n",

TAG, i, this.index.get(i),

recs.get(this.index.get(i)).toString());

}

return s;

}

} // public class FlatFile

FIGURE 3.12

 Part 6 of the FlatFile class for the phone book example.

44 CHAPTER 3FLAT FILES

to read an entire input ﬁle and to convert the entire array into a String for

printing. The former method extends (and invokes) the readRecord method, and

the latter method extends and invokes the toString method of the underlying

Record class.

We reiterate what was said in Chapter 2, that the toString method allows for

the data of the Record to be formatted and returned exactly in the format declared

to be oﬃcial by The Powers That Be without any details of the inner workings of

the Record class being exposed to the user of the method.

Because we intend to call a sort method from the driver program, we include a

sortFile method that will sort the data. (We will discuss sorting and the details of

this code in a moment.) In this case, we don’t sort by actually moving the records

around (although we could). Instead, we sort by setting up an index ArrayList;the

integer at subscript i in the index is the subscript of the ith record in the actual

ArrayList of data.

Finally, we have in our driver program a call to a search method to look up

an entry by last name, so we need the findRecord method to implement this

functionality.

 3.2 An Analysis of the Code

Using the input data ﬁle myinputdata presented in Figure 3.13, we get the output

shown in Figure 3.14. We have left in some of what would normally be considered

testing information for demonstration purposes. All of this is well and good, but

clearly this is only the ﬁrst step in producing an address book program that someone

would want to use.

Herbert 2A41 789.0123 390

Lander 2A47 789.7890 146

Winthrop 3A71 789.4667 611

Marion 2A13 789.1536 750

Adams 3A56 789.8702 522

Smith 2A46 789.5275 101

Axolotl 3A22 789.2749 102

Zokni 3A39 789.9111 350

Igen 2A21 789.6050 240

Nem 2A89 789.3383 790

FIGURE 3.13

 Input data for the indexed ﬂat ﬁle example.

3.2 An Analysis of the Code 45

Main: create and write out the empty file

Main: empty file was created

Main: read the data

Main: the data has been read

Main: write out the file

FlatFile: (idx,rec) (0,0) Herbert 2A41 789.0123 390

FlatFile: (idx,rec) (1,1) Lander 2A47 789.7890 146

FlatFile: (idx,rec) (2,2) Winthrop 3A71 789.4667 611

FlatFile: (idx,rec) (3,3) Marion 2A13 789.1536 750

FlatFile: (idx,rec) (4,4) Adams 3A56 789.8702 522

FlatFile: (idx,rec) (5,5) Smith 2A46 789.5275 101

FlatFile: (idx,rec) (6,6) Axolotl 3A22 789.2749 102

FlatFile: (idx,rec) (7,7) Zokni 3A39 789.9111 350

FlatFile: (idx,rec) (8,8) Igen 2A21 789.6050 240

FlatFile: (idx,rec) (9,9) Nem 2A89 789.3383 790

Main: done with the write

Main: sort the file

Main: after sorting, write the file

FlatFile: (idx,rec) (0,4) Adams 3A56 789.8702 522

FlatFile: (idx,rec) (1,6) Axolotl 3A22 789.2749 102

FlatFile: (idx,rec) (2,0) Herbert 2A41 789.0123 390

FlatFile: (idx,rec) (3,8) Igen 2A21 789.6050 240

FlatFile: (idx,rec) (4,1) Lander 2A47 789.7890 146

FlatFile: (idx,rec) (5,3) Marion 2A13 789.1536 750

FlatFile: (idx,rec) (6,9) Nem 2A89 789.3383 790

FlatFile: (idx,rec) (7,5) Smith 2A46 789.5275 101

FlatFile: (idx,rec) (8,2) Winthrop 3A71 789.4667 611

FlatFile: (idx,rec) (9,7) Zokni 3A39 789.9111 350

Main: done with the write

Main: find the data item 'Buell'

Main: record Buell not found

Main: done with first search

Main: find the data item 'Smith'

Main: found record 'Smith 2A46 789.5275 101'

Main: done with second search

Main: done with main

FIGURE 3.14

 Output data for the indexed ﬂat ﬁle example.

 3.2.1 Sorted Data?

First oﬀ, we notice that the records are not presented to the program from the

input ﬁle in sorted order. This is normal; life doesn’t always present things to us in

the order in which we would like them to happen.

46 CHAPTER 3FLAT FILES

We have two choices when presented with unsorted data. One is that we can read

the data and physically move the records into their proper order.

This is what is

normally done on library shelves; the books are physically kept in a deﬁned order

based, for most university libraries, on the Library of Congress indexing system.

This physical sorting of data is acceptable only if the following three conditions

are met:

1. We don’t have a lot of data that would have to be moved.

2. The records are to be accessed in only one order (last name, in this case).

3. The data will be fairly static, because we don’t want to have to keep moving

things around all the time.

The problem of sorting data by moving data around and then keeping the data

physically in sorted order is increased when one takes into consideration the problem

of deleting records. In a ﬂat ﬁle image such as we have created in memory so

far, a record that is deleted in the middle presents a problem. The naive way

to ﬁll the empty space left by the deleted record would be to shift the records

up one space as needed. If the deleted record is the last record in the physical

order, there is actually nothing to do (except adjust the size counter), but if the

deleted record is the ﬁrst record in the physical order, its deletion requires the entire

array to be shifted up. That’s a lot of work with no beneﬁt except to take out a

blank space.

The three conditions that make it reasonable to sort by moving data are rarely

met in practice, so the customary way to deal with this kind of data—and your

second option—is to use an index.Wewillreadthedataandstoreitintoanarray

in the physical order in which it presents itself, but then we will invert on last name

as a key to get an index. The ith index value will be the subscript of the record that

should appear in position i in sorted order. If we were to sort and produce an index,

we would get the second column of Figure 3.15; this would be produced by code

like the following. The important thing to notice here is the indirect reference;we

don’t access the record at subscript i but rather the record at subscript index[i].

for(int i = 0; i < size(); ++i)

{

System.out.printf("%d %d %s%n",

i, index[i], record.getName(index[i]);

}

In the pre-Cambrian era of microcomputers, the author had an outside consulting job that

required sorting an address book on an early “personal” computer. The sort implemented by the

vendor for use on 1979-era 5

-inch ﬂoppy disks was horrendously ineﬃcient. Putting a full disk

of records even once into sorted order required so much reading and writing to and from the disk

that the disk became scratched to the point that it was no longer readable.

3.2 An Analysis of the Code 47

Sorted Physical Last Name

Order Order

Subscript Subscript

0 4 Adams

1 6 Axolotl

2 0 Herbert

3 8 Igen

4 1 Lander

5 3 Marion

6 9 Nem

7 5 Smith

8 2 Winthrop

9 7 Zokni

FIGURE 3.15  Indexed ﬁle.

 3.2.2 A Simple but Inefﬁcient Sorting Algorithm

Our data is not presented in sorted order, so we have implemented a simple bub-

blesort

whose code is shown in the FlatFile code.

The index array is implemented as an instance variable

private ArrayList<Integer> index

whose values run in parallel with the recs of Record data; the value of

index.get(0) is the subscript of the Record with the alphabetically ﬁrst name,

which is subscript 4 in Figure 3.15; the value of index.get(1) is the subscript of

the Record with the alphabetically next name, which is subscript 6 in Figure 3.15;

and so on. When we read the record with subscript i, we initialize index.get(i)

to have the value i. When we walk through the array in the toString method, the

code becomes:

for(inti=0;i<this.getSize(); ++i)

{

s += String.format("%n(%3d,%3d) %s",

i, this.getIndex(i),

recs.get(this.getIndex(i)).toString());

}

With small apologies to the English teachers of the world, we will write the names of the

various sorting algorithms without a space before the word “sort.”

48 CHAPTER 3FLAT FILES

Note also that we have implemented a getSize method that returns the size of

the ArrayList recs. In fact, both the ArrayLists recs and index have their own

size method. But since these values are supposed to always be the same (except

during execution of the two lines of code between adding a new Record to the

ArrayList and then adding a new entry to the index), we have implemented and

used our own method to return the size of the lists and included code in that method

to check each time that the two sizes are the same. This kind of defensive program-

ming can reduce the number of silly errors in programming and make life easier.

Most importantly, we need to implement a sorting routine, the code for which

is shown in the FlatFile program. This bubblesort is, as we shall see later, not a

terribly good sort, but it is simple to explain and it happens to get the job done

for the small data ﬁle on which we apply it.

As the following pseudocode shows, a bubblesort consists of a doubly-nested loop.

for(length from arraysize to 1 exclusive stepping backwards)

{

for(i from 0 to length-1 exclusive)

{

if(record(i) > record(i+1))

{

swap record(i) and record(i+1)

}

Let’s take a look at what this pseudocode does, where we read the “exclusive”

in the ﬁrst for loop to mean

for(i = arraysize; i > 1; --i)

and similarly in the second loop. In the ﬁrst iteration of the outer loop, with length

set to the length of the original array, we run the inner loop from i=0through

i = length-2. (We have to be careful to stop one short since we are accessing

both subscript i and subscript i+1.) If any pair of values is out of order (by this we

mean that the key value on which to sort is smaller in record(i+1) than it is in

record(i)), then we exchange record(i+1) and record(i). The eﬀect of this is

that at the end of this ﬁrst iteration of the outer loop the largest value in the entire

array will have been placed in the last location. When we decrement the outer loop

variable and run the inner loop again, the next-largest value will be placed in the

location for the penultimate subscript. When the two loops ﬁnish, the array will be

sorted in increasing order.

3.3 A Foreshadowing of Things to Come 49

In keeping with our desire not to move the data records themselves, we are

sorting keys instead of sorting the data itself. The assumption is that an index is

really just an array of integers, that the index array is likely to be stored in memory,

and thus that moving keys around is eﬃcient, whereas moving data that might be

stored on disk would be ineﬃcient and slow. It is also likely to be the case in a

serious application that we would want to sort and then retrieve on multiple keys,

perhaps Social Security number and phone number in addition to the last name,

and for this reason we would need to have an index of keys for each ﬁeld because

the actual data array cannot, after all, be stored in sorted order in more than

one way.

 3.2.3 Searching Through a List

From the main program we have twice issued a findRecord request to ﬁnd a com-

plete data record whose key is the String last name value provided to the method as

a parameter. Our code implements a linear search. That is, we simply walk through

the array looking for a record whose name matches the value for compareName.This

is certainly a simple method to write, but like most simple methods its perfor-

mance is not very good. We have included the code to count the number of tests

performed because we wish to illustrate that the performance is relatively poor. If

we are searching in an unordered list for a random value of name, and the value

happens to be in the list, then on average we would expect to have to look halfway

through the list in order to ﬁnd what we are looking for. If we are searching for a

value that is in fact not in the list, we will have to search all the way through in

order to determine that the value just isn’t in the list.

If we take the time to sort the list, our search can end either when we ﬁnd what

we are looking for or when the next name in our list is alphabetically greater than

the one we are looking for. With the sorted list, for example, we can quit looking for

the name “Buell” as soon as we encounter “Herbert,” because “Herbert” is greater

than “Buell” and therefore all the names past “Herbert” in the sorted list will also

be greater than “Buell.” On average, therefore, a search for a record will require

looking at half the list whether or not the record is present.

 3.3 A Foreshadowing of Things to Come

The indexed ﬂat ﬁle example, although it is a simple enough program, provides a

good starting point for mentioning where this course is headed.

A large part of this course is about the storage, management, manipulation, and

retrieval of information in eﬃcient ways. The general rule is that there is probably

50 CHAPTER 3FLAT FILES

a simple data structure or algorithm that could be implemented, and that there

is no free lunch: because the structure or algorithm is simple, it will be ineﬃcient.

Generally, you do indeed get what you pay for in life. A bubblesort is simple and

easy to code. Performing a sequential search through a pile of records is simple, but

even if the records are in sorted order, you will on average have to search through

half the pile to ﬁnd what you are looking for. Simple and thus ineﬃcient is OK for

small-scale implementations—if one’s purpose really is to manage a list of records

in a personal address book, then the code presented here is not a bad start. But

if the records list gets large (like the Los Angeles phone book) or the records are

retrieved constantly (like merchandise looked up on the Amazon.com website), then

simple and ineﬃcient is unacceptable.

Similarly, the storage of data can be done eﬃciently or ineﬃciently, and simplis-

tic methods are usually ineﬃcient. Just as with a shelf of books in a library, the

standard ﬂat ﬁle poses problems as records are inserted and deleted. The simplest

method for “deleting” a record would be to leave the record in place but add a

ﬂag that indicates the record has been deleted. After a large number of deletions,

though, a sequential search might waste a lot of time walking past deleted records,

and the total space required would be much larger than is actually needed. Closing

up the holes takes work, and it takes more work to open a hole to insert a new record

into sorted order. In this course, we will study several methods for dynamic data

structures that permit eﬃcient insertion and deletion and that allow for continued

eﬃcient search and retrieval of the valid records in the structure.

The other part of this course is the development of good programming skills and

techniques. Just as good data structures are used for organizing the actual data to

be managed, good programming style is in part an intellectual organization of the

software that manages and accesses the data. A main program, for example, should

deal only with things at the very top level of the program: Get the information

from the user, console, or input ﬁle; call the method doTheWork to do the work;

and then tidy up at the end and ﬁnish. The main program has no need to know

the implementation details of how the work is accomplished; it only needs to know

that the work will be done and the results will be returned in a standard way. The

FlatFile code thus can be reused to handle an entirely diﬀerent Record class with

no changes at all.

This information hiding is central to good programming style. The code that

implements the ﬂat ﬁle need not know anything about the “records” that it is

handling; it need only know that there are things called records and that there is

a linear list of them. Java, as with other object-oriented programming languages,

enforces at the syntactic level a compliance with the strictures of the program design

“religion” that it supports. You may feel that this is restrictive and prevents you

from exercising your artistic freedom in how programs are designed. That’s entirely

the point; the idea is that if the compiler and the IDE can impose constraints

3.3 A Foreshadowing of Things to Come 51

on what is syntactically correct, then it is more likely that a syntactically correct

program will also be semantically correct.

Looked at another way: We all make mistakes, and in writing programs each of

us is likely to have our own short list of mistakes that we will make repeatedly if we

aren’t careful. Part of the discipline of programming, to borrow a term that Dijkstra

used, is for each of us to recognize what those mistakes are and try consciously to

remember not to make them. Part of the discipline imposed by Java and languages

like Java is that many mistakes that are easy to make and are commonly made are

harder to make in Java because the syntax of the language makes it harder to make

those mistakes.

 3.3.1 Effective Sorting Techniques

One of the methods in FlatFile implements a bubblesort for the purpose of cre-

ating an index by sorting keys. It is not hard to show that a properly implemented

bubblesort requires a constant (ﬁxed for the given implementation) times N

com-

parisons in order to sort an array of N data items. It can be proved (and we will

prove it in this course) that any sort that works by comparing values (as does

bubblesort) requires at least a constant times N log N comparisons, and there are

several sorting algorithms that achieve this lower bound. Since we know from cal-

culus that log N is much smaller than N as N gets large, clearly a bubblesort is

not the most eﬃcient sort for large data arrays; its only virtue is its simplicity. We

show in Figure 3.16 the graph of N

against N log N,forN running from 0 through

about 10000. Without even looking at the actual numerical values, it is clear from

the ﬁgure that a good sorting algorithm with a running time of N log N “steps”

(we will deﬁne later the concept of “step” for sorting algorithms) is faster than

a bubblesort for all but the very smallest of problems, and the superiority of the

better algorithm is increasing.

We shall study some diﬀerent sorting algorithms. Our interest is in algorithms

that are eﬃcient in terms of both the number of comparisons and the extra storage

space needed. Bubblesort, for example, needs only the one extra storage location to

store the temporary value of a key when the ith and jth items are being swapped.

Mergesort, for example, works in N log N time, but it requires as much as N/2extra

storage locations.

The problem of sorting records is still an active research area, and we will only

scratch the surface of the topic.

In George Orwell’s 1984, the government strove to remove from the language all those words

with which “thoughtcrime” could be committed. Once all the words with which thoughtcrime

could be expressed were removed from the language, then thoughtcrime itself could not exist.

Java works much the same way. If the language won’t let you write something that The Powers

That Be have determined to be Bad Form, then it becomes that much harder to write code that

possesses Bad Form.

52 CHAPTER 3FLAT FILES

2T 4T 6T 8T 10T

10M

N lg N

FIGURE 3.16  N

graphed against N lg N (T =10

, M =10

 3.3.2 Effective Search Techniques and Data Structures

for Maintaining Priority

One of the methods in Flatfile implements a linear search to ﬁnd an item in a

sorted list. The ﬁrst of the more sophisticated things we will learn, later in this

chapter, is binary search,arecursive divide-and-conquer technique that is almost

always a win when computing things.

Divide and conquer is a basic theme in computation: To solve a problem, recur-

sively cut the size of the problem in half at each stage. We begin in binary search

with a list of N data items. We cut the problem in half by determining whether the

data item is (or rather, could be, since we don’t know that it is actually present in

the list) in the ﬁrst half of the data or the second half of the data. We then ask the

same question of the data array that is now only half as big as we started with.

In general, it is not hard to show that binary search in a sorted array of N =2

items can be done with at most n tests (called probes), and that no similar search

can be done any faster than that. That is, binary search is optimal. We will see

later, for example, that cutting the array into thirds and doing a recursive search

on the one-third of the data in which a sought-for element might appear will not

go any faster, and it would require two tests at every stage, rather than one, to

determine which third might hold the item for which we were searching.

3.3 A Foreshadowing of Things to Come 53

 3.3.3 Static Versus Dynamic Storage Structures

One of the main questions to be asked when dealing with a computational problem

is whether the data is essentially static or dynamic, that is, whether the data as

a whole changes slowly or rapidly. A transcript ﬁle at a university, for example, is

essentially a static collection of data. Although there will be additions and dele-

tions and updates of grades, local addresses, and such, the data collection is largely

static. Contrast this with the queue of processes ready to be executed in a com-

puter. In a modern computer, a large number of processes will be active at any

point in time, listening for mouse movements, clicks, and keystrokes; managing the

display of multiple windows that might even have ongoing updates or audio or

video playing; as well as the processes that are more realistically viewed as hav-

ing been initiated by the user. The goal of the scheduler in the OS is to keep

these processes sorted by priority (we don’t want any hiccups in the Internet radio

that plays in the background) and to cycle through those ready processes quickly

and eﬃciently.

A similar task awaits the computer whose job is to collect Internet packets and

assemble them into messages for delivery. Packets are discrete data items, each

one being part of an entire Internet transmission. They may take diﬀerent paths

from source to destination, they may arrive out of order, and in a large operation

they will be coming in at a tremendous rate. It is the job of some computer in

the ﬂow of traﬃc to reassemble these packets into their original messages, with the

data payloads in order, and to be able to handle simultaneously the assembly of

packets for multiple messages at a time. This is a classic dynamic data processing

problem: A payload of unknown size is broken into packets; the packets must be

sorted as they come in; and when the message is fully reassembled, we can send on

the complete message and delete that data from storage.

Handling these two diﬀerent kinds of problems usually takes diﬀerent kinds of

data structures as well as an assessment of the overall characteristics of the prob-

lem. Having all the data before we begin can be a very good thing; being forced to

organize data before we have any idea what the data looks like can lead to patho-

logically bad performance. We have seen the value of having data in sorted order.

If we have data that undergoes updates only once a week, for example, and it takes

an hour to sort the data from scratch, then it is perfectly reasonable to consider

that hour an investment worth making. If the data is updated once an hour, and

it takes an hour to sort, then the time spent sorting is not an investment but an

ongoing expense.

 3.3.4 Generic and Abstract Classes

Java has a powerful capability in its generic construct and abstract classes that allow

common methods to handle diﬀerent data types. We have already seen with the

54 CHAPTER 3FLAT FILES

compareTo method the power of having a standard syntax for a common operation:

we implement comparison of objects as

thisObject.compareTo(thatObject)

with a returned result that is a negative number if this is less than that,apositive

number if this is greater than that, and a zero if they are equal.

You have probably already used, without necessarily being made aware of them,

some of this abstraction capability of Java. The Java generics and abstract classes

permit the programmer to deﬁne templates and general-purpose methods that act

on objects whose data types are not speciﬁed at the time the code is written. This

provides another vehicle for separating the data structure that supports eﬃcient

computation from the underlying data on which the computation is performed.

 3.3.5 Arrays Versus ArrayLists

The experienced Java programmer will recognize that we have implemented the

FlatFile program using an ArrayList instead of an array. This is because for

one-dimensional lists of things, whose size might not be known at compile time,

the ArrayList is a much simpler construct to use than an array. Storage for an

ArrayList is done dynamically, with no preallocation needed and no out-of-space

errors from allocating too little space for the input data. The size method allows

one to determine the actual number of items stored and decreases the chance of

writing code that steps out of bounds. The deletion of items (which we didn’t

happen to do here) is done with a remove method that does the list compression

to remove the hole caused by deletion.

AJavaArrayList also has a contains method that does the lookup that we

implemented with the linear search method, and that should in fact implement a

binary search as described in the next section.

We note, however, that there is no inherent sorting function for an ArrayList and

thus we would still need to sort the records and create an index if we wanted to deal

with the ArrayList in sorted order. There are other Java collection classes that do

implement sort order capabilities. Just as with the binary search for the contains

method, we would have to assume that the implementors of the sort methods had

used an eﬃcient algorithm. Indeed, it is fair to say that all the algorithms and tech-

niques presented in this course have already been implemented in Java in some class.

 3.4 Binary Search: Our First Mature Algorithm

The ﬁrst “mature” algorithm we will consider, as a prelude to the rest of the text,

is a binary search. We are going to introduce this now, although this may seem to

be a bit oﬀ-topic at the moment, because we want to contrast the basic nature of

binary search against a more naive linear search.

3.4 Binary Search: Our First Mature Algorithm 55

In our earlier search method, we did a linear search, entry by entry, through the

ArrayList until the desired record was found. For a randomly chosen record for

which to search, we would expect to have to search on average halfway through

a list of N entries. It is equally likely that we would have to search through to

location k as it is that we would have to search to location N − k. Since these two

searches together require k +(N − k)=N “probes,” they each require on average

N/2 probes into the list.

A binary search proceeds by a recursive divide-and-conquer approach to search-

ing for an element. If we have paid the initial price of sorting the data, then binary

search is the most natural algorithm to use for searching because it is relatively

simple (although not totally trivial) and because it can be shown to be the fastest

search possible.

In a recursive divide-and-conquer search, we start with a list of N entries in a

list L, which for the moment we can assume to be simply a list of sorted integers,

with no repeated values. We want to know if a given integer k is one of the elements

in the list, and if it is, what its subscript value is.

Our ﬁrst probe is to compare k against the entry stored halfway into the list at

location L[N/2], whose value we will call m.Ifk<m, and the list L is assumed

to be sorted, then k, if it exists in the list, will be found in the ﬁrst half of the

list because all the entries in the second half of the list are larger than m and thus

larger than k.Ifk>m, and the list L is assumed to be sorted, then k,ifitexists

in the list, will be found in the second half of the list because all the entries in the

ﬁrst half of the list are smaller than m and thus smaller than k.

With one probe, then, we have reduced the problem of searching a list of N

entries to the problem of searching a list of N/2 entries. We now do a second probe

to cut the list from size N/2 in half again to size N/4. And again. And again.

Consider the example in Figure 3.17. Assume we have a list of 16 names in

sorted order (history buﬀs will note that we skipped one in order to have unique

surnames): Let’s say we do a lookup on “Tyler.” We compare Tyler against the

Loc. Name Loc. Name

0 Adams 8 Madison

1 Buchanan 9 Monroe

2 Fillmore 10 Pierce

3 Harrison 11 Polk

4 Jackson 12 Taylor

5 Jeﬀerson 13 Tyler

6 Johnson 14 Van Buren

7 Lincoln 15 Washington

FIGURE 3.17  A sorted list for a binary search.

56 CHAPTER 3FLAT FILES

name at subscript 8, which is “Madison” (we are indexing zero-up, just like Java,

and we arbitrarily choose always to round up to the ﬁrst subscript in the upper

half of any list or sublist if we have to break ties). Tyler is larger than Madison, so

we know we have a record in the second half of the list (if it’s there at all). Since

we have retrieved the record at subscript 8, we test equality as well as greater-

equal/less-equal, and ﬁnd that we don’t have equality. The second half of the list

has subscripts 8–15, and we know we didn’t get a match at subscript 8, so we are

looking at subscripts 9–15. The midpoint (and this time it’s actually the midpoint)

is 12, and the entry at subscript 12 is “Taylor.” Tyler is still larger than Taylor, so

we continue searching in the short list of subscripts 13–15, since we know it’s not

at subscript 12. Comparing against subscript 14, “Van Buren,” we know we are in

the ﬁrst half of that sublist. The sublist has length 1, though, so when we compare

against subscript 13, which is “Tyler,” we get a match.

If we look for the name “Clay” (remembering that Henry Clay very much wanted

to become a U.S. president but never managed to do so), we compare against

subscripts 8 (Madison), then 4 (Jackson), then 2 (Fillmore), then 1 (Buchanan).

Since we don’t ﬁnd a match, we know that Clay is not in the list.

As should be obvious, the diﬃcult part about writing a method for this is getting

right the concept of “halfway.” With an odd number of items in the list, there is a

midpoint. With an even number of items in the list, one has to decide whether to

take the last entry in the ﬁrst half or the ﬁrst entry in the second half.

If the list is of N =2

entries, then it will take at most n =log

N probes to

reduce the problem to looking at a list of one entry, which is merely a single test.

To verify that Clay was not in the list of 2

= 16 items, for example, we checked

the four entries at subscripts 8, 4, 2, and 1.

Now we need to clarify a couple of points on which we have been a little sloppy.

First, when we execute a probe, we can test both

if(this.compareTo(that) < 0)

and

if(this.compareTo(that) == 0)

If we ﬁnd the entry for which we are looking, then of course we have solved the

problem. If not, we continue. Technically this would split the list not into two

halves but into one sublist larger than the probed value, one sublist smaller than

the probed value, and the probed value itself. If we test in this way, we can shorten

our sublist by one by not including the probed value, since we have already tested it.

Similarly, if N is not an exact power of 2, the length in time of the binary search

is really no worse than it would be for the next larger power of 2. If we were to

3.4 Binary Search: Our First Mature Algorithm 57

increase the size of the array to the next larger power of 2 by padding at the end

with values of MAXIMUM

INTEGER (whatever that value is), we will increase the total

cost of the search by at most one extra probe.

The code for a binary search method that could be substituted for the linear

search in our FlatFile example is given in Figure 3.18.

/**************************************************************

* Method to find a record in the flat file.

* This is binary search. Start with lower and upper subscripts.

* Compute the middle subscript. If the value in the middle is

* larger than the item sought, move the upper subscript down.

* If the value in the middle is smaller, move the lower subscript

* up. Then repeat until found or until we meet in the middle.

* @param name the <code>String</code> value to search for.

* @return the subscript of the found record, or -1 if not found.

**/

public Record findRecordBinary(String name)

{

final int NOTFOUND = -1;

int lowerSub, midSub, returnSub, upperSub;

Record returnValue = null;

lowerSub = 0;

upperSub = this.getSize()-1;

midSub = (lowerSub + upperSub)/2;

returnSub = NOTFOUND;

while(lowerSub < upperSub)

{

String thisMidName = this.recs.get(this.index.get(midSub)).getName();

if(0 > thisMidName.compareTo(name))

lowerSub = midSub + 1;

else if(0 < thisMidName.compareTo(name))

upperSub = midSub - 1;

else

{

returnSub = midSub;

break;

}

midSub = (lowerSub + upperSub)/2;

}

if(NOTFOUND != returnSub)

returnValue = this.recs.get(this.index.get(returnSub));

return returnValue;

}

FIGURE 3.18

 The binary search code.

58 CHAPTER 3FLAT FILES

The use of a divide-and-conquer approach is so pervasive in computing that a

new mathematical symbol is used to describe the running time of algorithms of

this sort. We know that if N =2

, then n =log

N,whichwewriteasn =lgN.

In Chapter 6, we will do more formal asymptotic analysis of algorithms, but the

power of binary search over linear search is easily seen from the graph in Figure 3.19,

showing N/2 graphed against lg N. Clearly binary search is superior, and it becomes

better and better the larger the size of the list to be searched. The graph of lg N is

barely noticeable. This is not entirely surprising—2

is about eight million, so the

binary log of 10

at the right edge of the graph is only a little larger than 23, and

clearly will barely register on a y-axis measured in millions.

2M 4M 6M 8M 10M

N/2

FIGURE 3.19  N/2 graphed against lg N (M =10

 3.5 Summary

We have covered in this chapter a program for managing, in a naive way, basic

data records in a ﬂat ﬁle or spreadsheet-like structure. A large number of computer

applications have essentially this purpose: to maintain a ﬁle of records, with the

ability to add records, edit records, delete records, sort records in any of a number

of diﬀerent ways, and search and retrieve records that match some search ﬁeld.

For small ﬁles of no more than a few thousand records, a simplistic program

such as the one presented here would be more than adequate. For large software

systems, however, all the functions—adding, editing, deleting, sorting, searching,

and retrieving—must be done more eﬃciently. The purpose of this course and this

text is to introduce data structures for more eﬃcient management of data, together

with algorithms that process data in more eﬃcient ways or on those more eﬃ-

cient structures. We began the process of increasing the sophistication of skills by

introducing binary search.

3.6 Exercises 59

 3.6 Exercises

(Sample data for many of these exercises can be found on the website at

go.jblearning.com/Buell.)

1. The current code has the input and output ﬁlenames hard-coded. Modify the

driver main method to ask the user for these ﬁlenames. Make sure you catch

the exceptions on opening the ﬁles, since that is very important for correct

execution of programs.

2. The current FlatFile code has a findRecord method that uses the

compareName method of the Record class. Modify this method to call a more

general-purpose findByKey method, and implement the findByKey method in

the Record class.

3. If you have done the previous exercise, you will notice that you can overload

the findByKey method in the Record class to allow for either String or int

variables as parameters. Write these overloaded methods.

4. If you have done the previous two exercises, you will notice that you still cannot

search both by name and by oﬃce, since those two String variables are going

to be indistinguishable to findByKey. This can be ﬁxed. Create a separate

class for the name, making sure to include a nondefault constructor that takes

a String variable as parameter and assigns that to a String instance variable

inside your Name class. Do the same for the office variable. Now you can

create an overloaded findByKey(Name whatName) method that searches on

the name variable and a diﬀerent findByKey(Office whatOffice) method

that searches on the office variable. Do this, and modify FlatFile to do the

appropriate searches. The key point here is the information hiding: nowhere in

the FlatFile code should the fact that Name and Office are in fact String

values be used. All that is used in FlatFile is that one can create instances

of Name and Office by passing in a String when constructing an instance.

5. Replace the Record class with a class that handles names of CDs and artists

performing on the CDs, as if you were making a database of your own collec-

tion. Except for changing the method calls for the search, the code outside the

Record should not have to change at all.

6. If you have done the previous exercise, you know that your Record class is

insuﬃcient. You really want all the tracks in your record, so add an ArrayList

variable inside Record

that allows you to read the track names and outputs

the track names when toString is called.

7. If you have done the previous two exercises, modify the search capability to

allow searching by track name. Since it is highly unlikely that the number of

60 CHAPTER 3FLAT FILES

tracks per CD is larger than about 20, and since it is highly unlikely that the

tracks are in alphabetically sorted order, you can probably do just ﬁne with a

linear search.

8. You probably want to search your CD list both by artist name and by CD

label name, but you probably used a String variable for both of them. Modify

your CD code following the changes for the findByKey methods in the earlier

exercises to allow you to search either way.

9. Modify your linear search to count the number of tests you have to do. Increase

the size of the input data to 100 items, say, so you have enough data for the

code to chew upon. Then perform several searches. What is the average number

of tests? It should be about 50 if you are searching for items both early in the

list and late in the list. Swap out the linear search for the binary search and do

the experiment again. This time your average count of tests should be closer

to 7, since 100 ≈ 128 = 2

10. Modify the sort method so that you count the number of comparisons per-

formed, and output the number of comparisons for the original input data.

You should get (10

)/2 = 50. Now double the number of items in the input

data and run the experiment again. This time you should get (20

)/2 = 200.

Double the data input again, and check that this time it takes (40

)/2 = 800

comparisons (make sure you have unique items, as this can aﬀect the count).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 3 Flat Files

Create new playlist

Sign In

Sign Up

Table of Contents for
Chapter 3 Flat Files