Chapter 2. Reference implementation

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 2. Reference implementation

This chapter covers

Using standard collections
Creating diagrams to illustrate a software design
Expressing performance in big-O notation
Estimating the memory footprint of a class

In this chapter, you’ll examine a version of the Container class that strikes a good balance between different qualities, such as clarity, efficiency, and memory usage.

As you recall from section 1.7, I made the assumption that storing and maintaining the set of indirect connections between containers would suffice. In practice, you do this by equipping each container with a reference to the collection of containers directly or indirectly connected to it, called its group. Being familiar with the Java Collections Framework (JCF) (see sidebar), let’s go hunting for the best class to represent one of these groups.

Java Collections Framework

Most standard collections were introduced in Java 1.2 and were heavily redesigned for the 1.5 release (later renamed Java 5) to take advantage of the newly introduced generics. The resulting API is called the JCF and is one of the crown jewels of the Java ecosystem. It comprises approximately 25 classes and interfaces, providing common data structures such as linked lists, hash tables, and balanced trees, as well as concurrency-oriented facilities.

When choosing the right type to represent a collection of items, you should consider two questions: whether the collection will contain duplicates, and whether the ordering of the elements is relevant. In our case, the answer to both questions is “no.” In other words, container groups function as mathematical sets, corresponding to the Set interface from the JCF, as shown in figure 2.1.

Figure 2.1. Choosing the right Java interface and class for a collection of items

Next, you need to choose an actual implementation of Set, that is, a class that implements said interface. You have no reason to depart from the most common and generally most efficient choice: HashSet.

Pop Quiz 1

Which collection interface and class would you choose to represent your phone’s contact list?

C# collections

C# collection hierarchy differs somewhat from Java’s, but the range of concrete classes you ultimately instantiate is quite similar. For example, here are the C# closest matches to each of the classes I mentioned in figure 2.1:

Java	C#
HashSet	HashSet
TreeSet	SortedSet
ArrayList	List
LinkedList	LinkedList

2.1. The code [Reference]

Let’s design the reference version of the Container class, starting from its fields and constructor. The nickname for this version will be, well, Reference. According to the previous discussion on collections, you equip every new container with an initial group consisting of that container only and represented by a HashSet. Following the program to an interface best practice, you should declare the group field as a Set and then instantiate it as a concrete HashSet. Think of this as hiding the concrete type from the rest of the class. A benefit of this approach is that if you later change your mind and switch the concrete type from HashSet to some other implementation of Set, the surrounding code stays unchanged because it refers only to the interface.

Programming to an interface. . .

. . .refers to the general idea of focusing your design efforts around APIs, rather than concrete implementations. It’s akin to the design-by-contract methodology I discuss in chapter 5. Declaring a field with the most general interface type that gets the job done is a small-scale application of this principle.

Additionally, each container is aware of the amount of water in it, encoded by a double value that is implicitly initialized to zero. You should end up with code similar to the following listing.

Listing 2.1. Reference: Fields and constructor

import java.util.*;  1 The following listings will omit the import statements.

/* A water container.  2 This freestyle comment should be
 *                       in Javadoc instead (see chapter 7).
 * by Marco Faella
 */
public class Container {

   private Set<Container> group;  3 Containers connected to this one
   private double amount;         4 Amount of water in this container

   /* Creates an empty container. */ 5 This should also be in Javadoc.
   public Container() {
      group = new HashSet<Container>();
      group.add(this);               6 Group starts with this container.
   }

For starters, compared with Novice, this version uses proper encapsulation and naming: fields are private and have reasonably descriptive names. Then, I have intentionally commented the code in a rather naive way to contrast this style with the more principled approach that I discuss in chapter 7, where readability becomes the central issue.

Before presenting the various methods, I’ll introduce a couple of graphical devices that will be useful for visually comparing different versions of containers that I’ll present in the following chapters.

2.1.1. Memory layout diagrams

For every version of Container that uses a different choice of fields to represent its data, I’ll show a memory layout picture, which is an abstract illustration of how a given set of containers is realized in memory. The intent is to help you build a visual mental model of that representation, and to ease comparison between different versions. To that end, I’ll always depict the same scenario, namely the standard use case I described in chapter 1, the way it looks when the first three parts have been executed. Recall that those parts create four containers (a to d) and execute the following lines:

a.addWater(12);
d.addWater(8);
a.connectTo(b);
b.connectTo(c);

At this point, three of the four containers are connected in a group, and the fourth one is isolated, as shown in figure 2.2. The memory layout diagram is a simplified scheme of how the objects are arranged in memory, similar to UML object diagrams (explained in the following subsection). Both display static snapshots of a set of objects, including the value of their fields and their relationships. In this book, I prefer to use my own style of object diagram because it’s more intuitive and I can tailor it to the specific point I’m trying to make in each section. Figure 2.3 shows the memory layout of Reference, after the first three parts of UseCase. As you can see, I’ve omitted many low-level details, such as the type and width in bytes of each field. Additionally, I’ve completely hidden the internal composition of the HashSet, because right now I’d like you to focus on which object contains each piece of information, and which object points to which other object. We’ll return to the memory layout of a HashSet in section 2.2.1.

Figure 2.2. The situation after executing the first three parts of UseCase. You have connected together containers `a` through `c` and poured water into `a` and `d`.

Figure 2.3. The memory layout of Reference after executing the first three parts of UseCase. To avoid clutter, I’ve pictured the references from the `HashSet`s back to the containers as reaching the name of the container.

Unified Modeling Language

Unified Modeling Language (UML) is a standard providing a rich collection of diagrams for describing various aspects of a software system. Class diagrams and sequence diagrams are two of the most commonly used parts of the standard. You’ll see an example of a sequence diagram in chapter 3.

Naturally, in your job you’re more likely to encounter standard UML diagrams, so here’s a brief reminder of two common types of UML diagrams.

UML Class Diagram

A class diagram is a description of the static properties of a set of classes, particularly regarding their mutual relationships, such as inheritance or containment. The just-mentioned object diagrams are closely related to class diagrams, except that they depict individual instances of those classes.

For example, a class diagram for Reference may look like figure 2.4. The Container box is quite self-explanatory, listing fields and methods, whose visibility is denoted by a plus (public) or minus (private) sign. The HashSet box doesn’t specify any field or method, and that’s perfectly fine for such diagrams; they can be as abstract or as detailed as you wish.

Figure 2.4. UML class diagram for Reference (detailed version)

The line between the two boxes is called an association and represents a relation between two classes. At each end of the line, you can describe the role of each class in the association (“Member” and “Group” here) and the so-called cardinality of the association. The latter specifies how many instances of that class are in relation to each instance of the other class. In this case, each container belongs to a single group, and each group includes one or more members, denoted by “1..*” in UML.

Although formally correct, the class diagram in figure 2.4 is too detailed for most purposes. UML diagrams are intended to describe a model of the system, not the system itself. If a diagram becomes too detailed, you might as well replace it with the actual source code. Hence, you normally don’t mention standard collections such as Hash Set explicitly. Rather, they’re interpreted as just one possible implementation of an association between classes.

In this case, you can replace the HashSet with a more abstract association linking the Container class with itself. In this way, rather than describing the implementation, you’re conveying the idea that each container may be connected to zero or more other containers. Figure 2.5 represents this graphically.

Figure 2.5. UML class diagram for Reference (abstract version)

Pop Quiz 2

Use a class diagram to represent the main attributes of Java classes and methods, and their mutual relationships.

UML Object Diagram

UML object diagrams appear very similar to class diagrams. You distinguish objects (that is, class instances) from classes by having their names and types underlined. For example, figure 2.6 shows the object diagram for Reference, after executing the first three parts of UseCase. That figure is consistent with the abstract class diagram in figure 2.5, where the HashSets are not explicitly modeled, but rather hidden within the association between containers.

Figure 2.6. UML object diagram for Reference (abstract version)

In chapter 3, you’ll learn about one more type of UML diagram: the sequence diagram, designed to visualize the dynamic interactions among a set of objects.

2.1.2. The methods

The getAmount method is a trivial getter—nothing to write home about.

Listing 2.2. Reference: The `getAmount` method

public double getAmount() { return amount; }

Next, you can develop the connectTo method (listing 2.3).^[1] Start by observing that connecting two containers essentially entails merging their two groups. As a result, the method initially computes the total amount of water in the two groups and the amount of water in each container after the merge. Then, you modify the group of this container to absorb the second group, and you assign all containers of the second group to the new, larger group. Finally, you update the amount of water in each container with the precomputed new amount.

¹
Here and in the following chapters, I’m taking some liberties with code formatting, like putting a loop and its body on the same line. I’m doing so to shorten the listing and make it fit in a single page.

As before, listing 2.3 is heavily commented in an attempt to improve its readability. The modern trend, instead, would be to split it into smaller methods with suitably descriptive names. I’ll discuss that in depth in chapter 7.

Listing 2.3. Reference: The `connectTo` method

public void connectTo(Container other) {

   // If they are already connected, do nothing
   if (group==other.group) return;

   int size1 = group.size(),           1 Computes the new amount
       size2 = other.group.size();       of water in each container
   double tot1 = amount * size1,
          tot2 = other.amount * size2,
          newAmount = (tot1 + tot2) / (size1 + size2);

   // Merge the two groups
   group.addAll(other.group);
   // Update group of containers connected with other  2 You can replace
   for (Container c: other.group) { c.group = group; }   comments like this
   // Update amount of all newly connected containers    with a properly named
   for (Container c: group) { c.amount = newAmount; }    support method.
}

The addWater method simply distributes an equal amount of water to each container in the group.

Listing 2.4. Reference: The `addWater` method

   public void addWater(double amount) {
      double amountPerContainer = amount / group.size();
      for (Container c: group) c.amount += amountPerContainer;
   }

As in Novice, this method accepts negative arguments—denoting water removal— and doesn’t check whether the containers hold enough water to satisfy the request. It thus runs the risk of leaving a negative amount of water in one or more containers. (I address robustness issues like this in chapter 5.) In the next two sections, we’ll analyze the memory and time consumption of the implementation I’ve presented in this section, so later you’ll be able to compare its performance with that of the following versions.

2.2. Memory requirements

Despite the fact that primitive types have a fixed size, estimating the memory size of a Java object is not trivial. Three factors render the exact size of an object dependent on the architecture and even on the JDK vendor:

Reference size
Object headers
Padding

How these factors influence the size of an object depends on the specific JVM you use to run your program. Recall that the Java framework is based on two official specifications: one for the Java language and one for the virtual machine (VM). Different vendors are free to implement their own compiler or VM, and, indeed, as of writing these lines, Wikipedia lists 18 actively developed JVMs.^[2] In the following VM-dependent arguments, I’ll refer to Oracle’s standard JVM, which is called HotSpot.

²
See http://mng.bz/zlm6.

Let’s consider each of the three memory factors in more detail. First, the size of a reference is not fixed by the language. Whereas the size is 32 bits on 32-bit hardware, on modern 64-bit processors it can be either 32 or 64 bits because of a technology called compressed ordinary object pointers (OOPs). Compressed OOPs allow the program to store references as 32-bit values, even when the hardware supports 64-bit addresses, at the cost of addressing “only” 32 GB of the total available heap space. In the following memory-occupancy estimates, assume a fixed reference size of 32 bits.

Compressed OOPs

Compressed OOPs work by implicitly adding three zeros at the end of each 32-bit address, so a stored address of, say, 0x1 is interpreted as the machine address 0x1000. In this way, machine addresses effectively span 35 bits, and the program can access up to 32 GB of memory. The JVM must also take steps to align all variables to 8-byte boundaries, because the program can only refer to addresses that are multiples of eight.

Summarizing, this technology saves space for each reference but increases padding space and incurs a time overhead when mapping stored addresses to machine addresses (a quick left-shift operation). Compressed OOPs are turned on by default but are automatically disabled if you tell the VM that you intend to use more than 32 GB of memory (with command-line options -Xms and -Xmx).

Second, the memory layout of all objects starts with a header containing some standard information that the JVM needs. As a consequence, even an object with no fields (aka a stateless object) takes up some memory. The detailed composition of the object header goes beyond the scope of this book,^[3] but three features of the Java language are ultimately responsible for it: reflection, multithreading, and garbage collection.

³
If you’re curious for details, you can browse the source code for HotSpot, currently available https://hg.openjdk.java.net/jdk10/jdk10/hotspot. The object headers’ content is described in the file src/share/vm/oops/markOop.hpp.

Reflection requires objects to know their type. Hence, each object must store a reference to its class, or a numeric identifier referring to a table of loaded classes. This mechanism allows the instanceof operator to check the dynamic type of an object and the getClass method of the Object class to return a reference to the (dynamic) class of the object. On a related note, arrays also need to store the type of their cells because every write operation into an array is type-checked at runtime (and raises Array StoreException if incorrect). However, this information does not enlarge the overhead of a single array, because it’s part of the type information and can be shared among all arrays of the same type. For example, all arrays of strings point to the same Class object, representing the type “array of strings.”
Multithreading support assigns a monitor to each object (accessible via the syn chronized keyword). Hence, the header must accommodate a reference to a monitor object. Modern virtual machines create such a monitor on demand only when multiple threads actually compete for exclusive access to that object.
Garbage collection needs to store some information on each object, such as a reference count. In fact, modern garbage collection algorithms assign objects to different generations, based on the time since they were created. In that case, the header also contains an age field.

In this book, assume a fixed 12-byte per-object overhead, which is typical of modern 64-bit JVMs. Besides this standard object header, arrays also need to store their length, leading to a 16-byte total overhead (that is, even an empty array takes 16 bytes).

Finally, hardware architectures require or prefer data to be aligned to certain boundaries; that is, they work more efficiently if memory accesses employ addresses that are multiples of some power of two (usually four or eight). This circumstance leads compilers and virtual machines to employ padding : inflating the memory layout of an object with empty space so that each field is properly aligned and the whole object fits exactly into an integer number of words. For simplicity, we’ll ignore such architecture-dependent padding issues in this book.

C# object size

The situation in C# is pretty similar to the one I’ve described here for Java, and the causes for memory overhead are exactly the same, leading to 12-byte headers for 32-bit architectures and 16 bytes for 64 bit.

2.2.1. Memory requirements of Reference

Now turn your attention to the actual memory occupancy of the Reference implementation. For starters, each Container object requires the following:

12 bytes for overhead
8 bytes for the amount field (type double)
4 bytes for the reference to the set, plus the size of the set itself

To estimate the memory footprint of a HashSet, you need to peek under the hood at its implementation. You typically implement a HashSet using an array of linked lists (called buckets), plus a couple of extra fields for bookkeeping. Ideally, each element goes into a different bucket, and there are exactly as many buckets as elements. Without going into too much detail,^[4] in this ideal scenario a barebone HashSet takes up approximately 52 bytes. Each element in the set requires one reference (to its list) and a list with one element: approximately 32 more bytes. I’m using the word bare-bone instead of empty because an empty HashSet starts with a non-zero initial capacity (16 buckets in the current OpenJDK), but it’s simpler and more orderly to ascribe that space to the first elements that will be inserted. Figure 2.7 shows in some detail the internals of the involved objects, with a breakdown of the memory requirements.

⁴
The actual implementation of HashSet goes through HashMap, complicating the analysis.

Figure 2.7. The detailed memory footprint of a container in version Reference. The estimates for `HashSet` assume a perfectly sized table of buckets and a perfect hashing function, resulting in exactly one element in each bucket.

Measuring object size

The JDK includes a tool called JOL (for Java Object Layout) that inspects the internal memory layout of a given class, including the object header. It’s available at http://openjdk.java.net/projects/code-tools/jol/.

Pop Quiz 3

The class android.graphics.Rect contains four public fields of type int. How many bytes does a Rect object take?

To get actual numbers and ease comparisons with other implementations, I’ll estimate the memory occupancy for two hypothetical scenarios: first, 1000 isolated containers; second, 1000 containers connected in 100 groups of 10 containers each. In those two scenarios, our reference implementation performs as reported in table 2.1. Are those numbers good or bad? Are 100 bytes too many for an isolated container?

Table 2.1. Memory requirements of Reference in two conventional scenarios

Scenario	Size (calculations)	Size (bytes)
1000 isolated	1000 * (12 + 8 + 4 + 52 + 32)	108000
100 groups of 10	1000 * (12 + 8 + 4) + 100 * (52 + 10 * 32)	61200

Can we do any better? It’s hard to judge those numbers as they stand. In the next two chapters, you’ll develop a number of alternative implementations, and then you’ll be able to compare their memory occupancy and answer the previous questions with solid arguments. (Spoiler alert: I present the most compact version in chapter 4, and it requires just 4 KB for both scenarios, but it doesn’t comply with the established API.)

2.3. Time complexity

When measuring the memory footprint of a program, you can use bytes as the standard basic unit. If you ignore the low-level details I discussed in the previous section, as a rule of thumb, a given Java program will take the same amount of memory when run on all computers.

The situation for time measurements is more complicated. The same program will perform in vastly different ways on different computers. Rather than measuring actual running time, you can count the number of basic steps that the program performs. Roughly speaking, you can define as a basic step any operation that requires a constant amount of time. For example, you can consider any arithmetic or comparison operation a basic step.^[5]

⁵
The formal definition of basic step must be based on a formal model of computation, such as Turing machines. You may then define a basic step as any operation that requires a constant number of steps of a Turing machine.

Another issue is the fact that the same function can execute a different number of basic steps when given different inputs. For example, consider the connectTo method from listing 2.3. It takes two containers as inputs:

Its only explicit input is the parameter other of type Container.
Being an instance method, it also takes this as an implicit parameter, so the current container is also an effective input.

That method contains two for loops, whose lengths (that is, number of iterations) depend on the size of the two container groups being merged, which is a function of the inputs.

In such cases, you summarize with one or more numeric parameters what it is in the input that influences the running time of your algorithm. Usually, the summary involves measuring the size of the input in some way. If the number of basic steps of our algorithm varies even for same-sized inputs, we just consider the worst case—that is, the maximum number of steps performed on any input of a given size.

Going back to the connectTo method, as a first attempt you can consider two parameters: size1 and size2, the sizes of the two groups of containers you’re merging. Using these parameters, you can analyze the connectTo method as shown in the following listing.

Listing 2.5. Reference: The `connectTo` method (comments stripped)

I’m counting as one step anything that doesn’t involve a loop, because its running time will be essentially constant, and in particular independent of the parameters size1 and size2. I’m also sweeping a lot of detail under the rug when labeling the group.addAll line with “size2 steps.” In short, that estimate is the expected number of steps assuming the hashCode method spreads objects uniformly over the whole set of representable integers.

Note

For a deeper understanding of hash tables and their performance, refer to a book on data structures, such as the one I mention in the Further reading section at the end of this chapter.

According to this reasoning, the number of basic steps that connectTo performs is

6 + 2 * size2 + (size1 + size2) = 6 + size1 + 3 * size2

(*)

However, you should recognize that the number 6 in this expression is somewhat arbitrary. If you counted assembly lines instead of Java lines, you might get 6 thousand instead of 6, and 6 million steps if you counted the steps of a Turing machine. For the same reason, the 3 multiplier in front of size2 is essentially arbitrary. In other words, the constants 3 and 6 depend on the granularity you choose for the basic steps.

A more interesting way to count steps that elegantly sidesteps the granularity issue is to focus on only how quickly the number of steps grows when the size parameters grow. This is called the order of growth, and it’s the basic tenet of complexity theory, a branch of computer science. The order of growth frees you from the burden of establishing a specific granularity for the basic steps, thus providing performance estimates that are more abstract but easier to compare with one another. At the same time, the order of growth preserves the asymptotic behavior of our function, that is, the trend for large values of its parameter(s).

In practice, the most common way to indicate the order of growth is the so-called big-O notation. For example, the basic steps expression (*) in big-O notation becomes O(size1 + size2), effectively hiding all arbitrary additive and multiplicative constants. In so doing, it highlights the fact that the number of steps is linearly proportional to size1 and size2. More precisely, the big-O notation establishes an upper bound to the growth of a function. So, O(size1 + size2) asserts that our running time grows at most linearly with respect to size1 and size2.

The connectTo method is simple, always performing the same number of steps for the same values of size1 and size2. Other algorithms are less regular in that their performance depends on some feature of the input that the size parameter(s) don’t express. For example, searching for a specific value in an unordered array may find that value immediately (constant time) or may involve scanning the whole array before realizing the value isn’t actually there (linear time). In that case, complexity analysis suggests that you consider the input that requires the most steps to complete, aka the worst case. That’s why the standard performance estimate for algorithms is called worst-case asymptotic complexity. Summarizing, the (worst-case asymptotic) complexity of searching in an unordered array is O(n). Table 2.2 presents some common big-O bounds, their names, and examples of array algorithms matching that bound. For algorithms running on arrays, the parameter n refers to the size of the array.

Table 2.2. Common complexity bounds in big-O notation

Notation	Name	Example
O(1)	Constant time	Checking whether the first element in an array is zero
O(log n)	Logarithmic time	Binary search: the smart way to look for a specific value in a sorted array
O(n)	Linear time	Finding the maximum value in an unsorted array
O(n log n)	Quasilinear time	Sorting an array using merge sort
O(n²)	Quadratic time	Sorting an array using bubble sort

Pop Quiz 4

Given an unordered array of integers, what is the complexity of checking whether the array is a palindrome?

Before we delve a little deeper into the asymptotic notation, let’s further simplify the analysis of connectTo by switching from two size parameters to a single one. If you call n the total number of containers ever created, then size1 + size2 is at most equal to n (distinct groups are disjoint by definition). Because the upper bound O(size1+size2) holds for our function, so does O(n), which is greater than the first upper bound. In other words, the time that a connectTo operation requires grows at most linearly with the total number of containers around. This may seem like a brutal approximation, and it is. After all, size1 and size2 are likely to be much smaller than n. However, rough as it is, this type of upper bound will be accurate enough to distinguish the efficiency of the various implementations presented in the following chapters.

The formal definition of big-O notation

When someone says that an algorithm has complexity O(f(n)) for some function f, they mean that f(n) is an upper bound to the number of basic steps that the algorithm performs on any input of size n. This makes sense if we agree on how to measure the size of the input with a single parameter n.

More formally, you can apply the big-O notation to any function f(n), representing the number of steps an algorithm requires when run on an input of size n. Consider an algorithm and let g(n) be the actual number of “steps”—however they may be defined—performed by the algorithm on an input of size n. Then, writing that the algorithm has time complexity O(f(n)) means that two numbers m and c exist such that, for all n ≥ m

g(n) ≤ c · f(n)

In other words, for sufficiently large inputs, the actual number of steps is at most equal to a constant times the value of the f function.

Complexity theory includes several other notations, denoting lower bounds, simultaneous lower and upper bounds, and so on.

2.3.1. Time complexity of Reference

You can now precisely state the time complexity of all the methods from Reference. The getAmount method is a simple getter and takes constant time. Methods connectTo and addWater need to cycle over all containers in a group. Because a group can be as large as the whole set of all containers, their complexity in the worst case is linear with the total number n of containers. Table 2.3 summarizes these observations. In chapter 3, you’ll learn how to improve these time complexities.

Table 2.3. Time complexities for Reference, with n standing for the total number of containers

Method	Time complexity
getAmount	O(1)
connectTo	O(n)
addWater	O(n)

2.4. Applying what you learned

This section, and the similar sections in the following chapters, applies the notions developed in the chapter to different contexts. Because the whole book is based on the idea of using a single example to tie together a variety of topics, it’s particularly important to read and work through these applications. For this reason, I’ve framed them as exercises. Naturally, you should try to solve them on your own. If you don’t have the time or the inclination to do that, at least read the exercises and their solutions. I believe you’ll find that the solutions are carefully explained and sometimes add useful insight to the core chapter contents. Besides, several exercises throughout the chapters guide you in a behind-the-scenes exploration of various classes from the JDK and other libraries.

Exercise 1

1. What is the complexity of the following method?

public static int[][] identityMatrix(int n) {
   int[][] result = new int[n][n];
   for (int i=0; i<n; i++) {
      for (int j=0; j<n; j++) {
         if (i==j) {
            result[i][j] = 1;
         }
      }
   }
   return result;
}

2. Can you make it more efficient without changing its output?

3. If you were able to come up with a more efficient version, does that version have a lower complexity?

Exercise 2

The class java.util.LinkedList<T> realizes a doubly linked list of references to objects of type T. Check out its source code^[6] and estimate the size in bytes of a LinkedList with n elements (excluding the space occupied by the n objects).

⁶
As of this writing, the source code is available at http://mng.bz/KElg.

Exercise 3 (Mini-Project)

Implement the class User, representing a person in a social network, with the following functionalities:

Each user has a name. Provide a public constructor accepting that name.
Users can befriend each other using the following method:
```
public void befriend(User other)
```
Friendships are symmetric: a.befriend(b) is equivalent to b.befriend(a).
Clients can check whether two users are direct friends or indirect friends (friends of friends) using the following two methods:
```
public boolean isDirectFriendOf(User other)
public boolean isIndirectFriendOf(User other)
```

Summary

You can visualize the structure and behavior of software using static and dynamic diagrams, such as UML object diagrams and sequence diagrams.
An empty Java object takes 12 bytes of memory because of object headers.
Asymptotic complexity measures time efficiency in a hardware-independent way.
Big-O notation is the most common way to express an asymptotic upper bound to time complexity.

Answers to quizzes and exercises

Pop Quiz 1

Let’s say that a contact comprises a name and a phone number. You usually access a contact list by name, in alphabetical order. That’s a custom order based on the content of the object, so, despite its name (“list”), a contact list is better represented by a SortedSet, whose standard implementation is the class TreeSet.

In a real phone, a contact is a much more complex entity, including many attributes and being connected with different apps. As such, it’s likely to be stored in some sort of database (for example, Android uses SQLite).

Pop Quiz 2

Here’s a class diagram representing Java classes and methods:

Pop Quiz 3

An android.graphics.Rect object occupies 12 bytes for overhead and 4*4 bytes for its four integer fields, for a total of 28 bytes. As usual in this book, this estimate ignores padding issues, which are likely to bring the actual size up to the next multiple of 8, which is 32.

Pop Quiz 4

Checking whether an array of even length n is a palindrome means checking whether a[i] equals a[n-1-i] for each i from zero to n/2. That’s n/2 iterations, whose order of growth is in O(n). (The constant factor 1 is irrelevant to the asymptotic notation.)

Exercise 1

1. The complexity of the method is O(n²), that is, quadratic.

2. Here’s a more efficient version that avoids the nested loop and the if statement:

public static int[][] identityMatrix(int n) {
   int[][] result = new int[n][n];      1 The matrix is initialized with zeros. 
   for (int i=0; i<n; i++) {
      result[i][i] = 1;
   }
   return result;
}

3. The complexity of the new version is still quadratic because of the array allocation in the second line, which implicitly initializes all n² cells to zero.

Exercise 2

Here are the relevant lines from the source code of LinkedList:

public class LinkedList<E> extends AbstractSequentialList<E> ... {
   transient int size = 0;
   transient Node<E> first;
   transient Node<E> last;

   ...
   private static class Node<E> {
      E item;
      Node<E> next;
      Node<E> prev;
      ...
   }
}

A quick check reveals superclasses AbstractSequentialList, AbstractList, and Abstract Collection, in that order. Of those, only AbstractList contains an instance field, used to detect concurrent modifications to the list during an iteration:

protected transient int modCount = 0;

That said, a LinkedList with n elements occupies

12 bytes for overhead
3*4 bytes for the three fields size, first, and last
4 bytes for the inherited modCount field

In addition, for each element it occupies

12 bytes for object overhead
3*4 bytes for the three fields item, next, and prev

Summarizing, a LinkedList with n elements occupies 28 + n * 24 bytes.

Exercise 3

The specifications are somewhat similar to the container scenario, except that you need to distinguish direct from indirect connections (aka friendships). A possible solution is to store direct connections explicitly and compute indirect connections on demand. You can then start the class as follows:

public class User {
   private String name;
   private Set<User> directFriends = new HashSet<>();

   public User(String name) {
      this.name = name;
   }

   public void befriend(User other) {
      directFriends.add(other);
      other.directFriends.add(this);
   }

   public boolean isDirectFriendOf(User other) {
      return directFriends.contains(other);
   }

Checking indirect connections requires a visit to an (undirected) graph. The simplest such algorithm is the breadth-first search (BFS), which maintains two sets of nodes:

A frontier of nodes waiting to be visited
A set of already visited (aka closed) nodes

Here’s a possible implementation of a BFS:

   public boolean isIndirectFriendOf(User other) {
      Set<User> visited = new HashSet<>();
      Deque<User> frontier = new LinkedList<>();

      frontier.add(this);
      while (!frontier.isEmpty()) {
         User user = frontier.removeFirst();
         if (user.equals(other)) {
            return true;
         }
         if (visited.add(user)) {   1 If not visited
            frontier.addAll(user.directFriends);  2 addAll inserts at the tail end. 
          }
      }
      return false;
   }

Table of Contents for Chapter 2. Reference implementation

Create new playlist

Sign In

Sign Up

Chapter 2. Reference implementation

Figure 2.1. Choosing the right Java interface and class for a collection of items

Pop Quiz 1

2.1. The code [Reference]

Listing 2.1. Reference: Fields and constructor

2.1.1. Memory layout diagrams

Figure 2.2. The situation after executing the first three parts of UseCase. You have connected together containers a through c and poured water into a and d.

Figure 2.3. The memory layout of Reference after executing the first three parts of UseCase. To avoid clutter, I’ve pictured the references from the HashSets back to the containers as reaching the name of the container.

UML Class Diagram

Figure 2.4. UML class diagram for Reference (detailed version)

Figure 2.5. UML class diagram for Reference (abstract version)

Pop Quiz 2

UML Object Diagram

Figure 2.6. UML object diagram for Reference (abstract version)

2.1.2. The methods

Listing 2.2. Reference: The getAmount method

Listing 2.3. Reference: The connectTo method

Listing 2.4. Reference: The addWater method

2.2. Memory requirements

2.2.1. Memory requirements of Reference

Figure 2.7. The detailed memory footprint of a container in version Reference. The estimates for HashSet assume a perfectly sized table of buckets and a perfect hashing function, resulting in exactly one element in each bucket.

Pop Quiz 3

Table 2.1. Memory requirements of Reference in two conventional scenarios

2.3. Time complexity

Listing 2.5. Reference: The connectTo method (comments stripped)

Note

Table 2.2. Common complexity bounds in big-O notation

Pop Quiz 4

2.3.1. Time complexity of Reference

Table 2.3. Time complexities for Reference, with n standing for the total number of containers

2.4. Applying what you learned

Exercise 1

Exercise 2

Exercise 3 (Mini-Project)

Summary

Answers to quizzes and exercises

Pop Quiz 1

Pop Quiz 2

Pop Quiz 3

Pop Quiz 4

Exercise 1

Exercise 2

Exercise 3

Further reading

Table of Contents for
Chapter 2. Reference implementation

Figure 2.2. The situation after executing the first three parts of UseCase. You have connected together containers `a` through `c` and poured water into `a` and `d`.

Figure 2.3. The memory layout of Reference after executing the first three parts of UseCase. To avoid clutter, I’ve pictured the references from the `HashSet`s back to the containers as reaching the name of the container.

Listing 2.2. Reference: The `getAmount` method

Listing 2.3. Reference: The `connectTo` method

Listing 2.4. Reference: The `addWater` method

Figure 2.7. The detailed memory footprint of a container in version Reference. The estimates for `HashSet` assume a perfectly sized table of buckets and a perfect hashing function, resulting in exactly one element in each bucket.

Listing 2.5. Reference: The `connectTo` method (comments stripped)