Laying Out Your Data

Once again, the principle of encapsulation will be broken. Let's assume your application needs to store records of some sort, and each record contains two fields: an id and a value. A very object-oriented approach is to define a Record class, as shown in Listing 4–12.

Listing 4–12. Record Class

public class Record {
    private final short id;
    private final short value;
    // and possibly more
    
    public Record(short id, short value) {
        this.id = id;
        this.value = value;
    }

    public final short getId() {
        return id;
    }

    public final short getValue() {
        return value;
    }

    public void doSomething() {
        // do something here
    }
}

Now that the Record class is defined, your application could simply allocate an array, save the records in that array, and provide additional methods, as shown in Listing 4–13.

Listing 4–13. Saving Records

public class MyRecords {
    private Record[] records;
    int nbRecords;

    public MyRecords(int size) {
        records = new Record[size];
    }

    public int addRecord (short id, short value) {
        int index;
        if (nbRecords < records.length) {
               index = nbRecords;
            records[nbRecords] = new Record(id, value);
            nbRecords++;
        } else {
            index = -1;
        }
        return index;
    }

    public void deleteRecord (int index) {
        if (index < 0) {
            // throw exception here – invalid argument
        }
        if (index < nbRecords) {
            nbRecords--;
            records[index] = records[nbRecords];
            records[nbRecords] = null; // don't forget to delete reference
        }
    }

    public int sumValues (int id) {
        int sum = 0;
        for (int i = 0; i < nbRecords; i++) {
            Record r = records[i];
            if (r.getId() == id) {
                sum += r.getValue();
            }
        }
        return sum;
    }

    public void doSomethingWithAllRecords () {
        for (int i = 0; i < nbRecords; i++) {
            records[i].doSomething();
        }
    }
}

All of this would work and would result in a pretty clean design. However, there are drawbacks that may not be visible until you actually run the code:

  • A new object is created every single time a record is added to the array. While each object is lightweight, memory allocations are still somewhat costly and could be avoided.
  • Calls to getId() and getValue() could be avoided if id and value were public.

If you are allowed to modify the Record class, making id and value public is obviously trivial. The implementation of sumValues() would then be slightly modified, as shown in Listing 4–14.

Listing 4–14. Modified sumValues()

    public int sumValues (int id) {
        int sum = 0;
        for (Record r : records) {
            if (r.id == id) {
                sum += r.value;
            }
        }
        return sum;
    }

However, this alone does not reduce the number of allocations at all; record objects still need to be created as records are added to the array.

NOTE: You could avoid allocations easily in C/C++, but in Java all objects are actually references and have to be created with the new operator.

Since all objects are allocated in the heap, and you can only store references to objects in the array, you can modify the MyRecords class to use an array of short values to remove the allocations. The modified class is shown in Listing 4–15.

Listing 4–15. Modified MyRecords Class Using a Short Array

public class MyRecords {
    private short[] records;
    int nbRecords;

    public MyRecords(int size) {
        records = new short[size * 2];
    }

    public int addRecord (short id, short value) {
        int index;
        if (nbRecords < records.length) {
            index = nbRecords;
            records[nbRecords * 2] = id;
            records[nbRecords * 2 + 1] = value;
            nbRecords++;
        } else {
            index = -1;
        }
        return index;
    }

    public void deleteRecord (int index) {
        if (index < 0) {
            // throw exception here – invalid argument
        }
        if (index < nbRecords) {
            nbRecords--;
            records[index * 2] = records[nbRecords * 2];
            records[index * 2 + 1] = records[nbRecords * 2 + 1];
        }
    }

    public int sumValues (int id) {
        int sum = 0;
        for (int i = 0; i < nbRecords; i++) {
            if (records[i * 2] == id) {
                sum += records[i * 2 + 1];
            }
        }
        return sum;
    }

    public void doSomethingWithAllRecords () {
        Record r = new Record(0, 0);
        for (int i = 0; i < nbRecords; i++) {
                r.id = records[i * 2];
                r.value = records[i * 2 + 1];
                r.doSomething();
        }
    }
}

Let's imagine that, later on, you find out these things about how the MyRecords class is used:

  • sumValues() is called much more often than doSomethingWillAllRecords().
  • Only a few records in the array share the same id.

In other words, that would tell you the id field is read much more often than the value field. Given this additional piece of information, you could come up with the following solution to improve performance: using two arrays instead of one, to keep all the ids close together, maximizes cache hits when sequentially going through the array of ids in sumValues(). The first array contains only record ids, while the second array contains only record values. Consequently, more record ids are found in the cache when sumValues() runs as twice as many record ids would be stored in a single cache line.

The new implementation of MyRecords is shown in Listing 4–16.

Listing 4–16. Modified MyRecords Class Using Two Arrays

public class MyRecords {
    private short[] recordIds; // first array only for ids
    private short[] recordValues; // second array only for values
    int nbRecords;

    public MyRecords(int size) {
        recordIds = new short[size];
        recordValues = new short[size];
    }

    public int addRecord (short id, short value) {
        int index;
        if (nbRecords < recordIds.length) {
            index = nbRecords;
            recordIds[nbRecords] = id;
            recordValues[nbRecords] = value;
            nbRecords++;
        } else {
            index = -1;
        }
        return index;
    }

    public void deleteRecord (int index) {
        if (index < 0) {
            // throw exception here – invalid argument
        }
        if (index < nbRecords) {
            nbRecords--;
            recordIds[index] = recordIds[nbRecords];
            recordValues[index] = recordValues[nbRecords];
        }
    }

    public int sumValues (int id) {
        int sum = 0;
        for (int i = 0; i < nbRecords; i++) {
            if (recordIds[i] == id) {
               sum += recordValues[i]; // we only read the value if the id matches
            }
        }
        return sum;
    }

    public void doSomethingWithAllRecords () {
        Record r = new Record(0, 0);
        for (int i = 0; i < nbRecords; i++) {
            r.id = recordIds[i];
            r.value = recordValues[i];
            r.doSomething();
        }
    }
}

You may not always be able to apply this kind of optimization though. For example, the listing above assumes doSomething() does not modify the Record object and assumes MyRecords does not provide any method to retrieve Record objects from the array. If these assumptions ever become false, then the implementations in Listings 4–15 and 4–16 would no longer be equivalent to the one in Listing 4–13.

Keep in mind that you may not be able to optimize your code properly until you find out how your code is used. Again, follow a pragmatic approach: don't start optimizing until you know what problem you are trying to solve, as optimizing one usage pattern may negatively impact other patterns.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.217.164.143