Chapter 5. Collections

PHP has only one built-in collection type: array. It presents an interface that is a set of ordered key/value pairs. This interface allows it to serve the purpose of several different data structures that programs in most languages typically use: vectors, sets, and maps (also known as dictionaries).

Hack has several classes that provide specialized vector, set, and map functionality. They allow for better understanding by both the Hack typechecker and human readers of code.

There are seven collection classes in Hack:

Vector

A mutable, ordered sequence of values, indexed by integers. The indices are the integers between 0 and n–1, where n is the number of elements in the vector.

Map

A mutable, ordered set of unique keys, each of which maps to a value. The keys may be integers or strings, and the values can be of any type. Unlike the map types in many other programming languages, Hack Maps remember the order in which their values were inserted. Of all the collection classes, Map is the most similar to PHP arrays.

Set

A mutable, ordered set of unique values. The values may be integers or strings.

Pair

An immutable sequence of exactly two values, indexed by the integers 0 and 1. Pairs are a detail of the API to the other collection classes, and you generally shouldn’t create them yourself; use tuples instead (see “Hack’s Type System”).

ImmVector, ImmMap, and ImmSet

Immutable versions of Vector, Map, and Set, respectively.

Vector, Map, Set, and Pair represent the overwhelming majority of use cases for PHP arrays.

In this chapter, we’ll see why and how to use Hack collection classes.

Note

I’ll be using lowercase-v “vector” and capital-V Vector distinctly in this chapter, and similarly for “map,” “set,” and “pair.” I’ll use “vector” to refer to the general concept of an ordered sequence of values, common to many programming languages. I’ll use Vector to refer specifically to the class that Hack provides.

When I use the word “array” in this chapter, it specifically means the PHP/Hack data type, used with the array keyword.

Why Use Collections?

There is a single underlying reason to use collections instead of arrays: PHP arrays are extremely flexible, but in practice, applications use them in one of a small number of highly specific patterns: vectors, maps, and sets. Using the right type of collection instead makes life easier for both humans and computers.

For human readers of code, seeing the names of specific collection classes makes it clearer what their purpose is. This advantage becomes much more potent when combined with Hack’s type annotations: the purpose of a collection is made clear at every abstraction boundary it passes through. This prevents mistakes and makes development faster and easier.

For computers, the smaller a collection’s set of functionality is, the easier it is to understand the code around it. Arrays are particularly difficult for the Hack typechecker to understand, because they can be used in such a wide variety of ways. For example, if you’re using an array as a vector and you pass it to a function that expects a map-like array, that should be a type error, but the typechecker can’t tell when this happens: it’s not possible, in general, to tell how an array is being used.

Hack is gradually adding solutions to this problem—shapes (see “Array Shapes”) are part of this effort—but collections provide immediate relief.

There can be performance benefits to using specific collection classes too. As an example, arrays generally allocate more memory than they use, so that they don’t have to allocate more memory every time a value is added. However, some arrays are never modified, so this extra capacity is wasted; there’s no way for a programmer to express that an array is immutable. Hack collection classes do have this feature.

The higher-level reason to use collections is simply that collections are more in keeping with Hack’s general pro–static typing philosophical stance. The more you can express a program’s behavior through static types, the better, for both humans and computers. Collections are a wide-ranging, high-leverage way to do so.

Collections Have Reference Semantics

If you’re writing a project in Hack from the ground up, the Hack collection classes should be your first choice when you need collection functionality, for the reasons documented previously.

If you’re working with a significant amount of preexisting PHP code, though, converting it to use collection classes instead of arrays can be quite challenging. The reason is one major semantic difference between arrays and collections: arrays have value semantics, whereas collections have reference semantics.

These two concepts are represented by the two possible answers to this question: in the following example, does the last statement print 'original' or 'new'?

$var_one = array('original');
$var_two = $var_one;
$var_two[0] = 'new';
echo $var_one[0];

The answer is 'original', which is consistent with value semantics. When you assign the array to $var_two, the array is copied,[1] so modifications to $var_two are not reflected in $var_one (and vice versa).

Collections are the opposite; they have reference semantics, like all objects do in Hack and PHP. If $var_one in our example were a Vector instead, the last statement would print 'new'. The assignment $var_two = $var_one doesn’t copy the Vector, so the modification to $var_two is reflected in $var_one.
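As a minimal sketch of the Vector case, the snippet below mirrors the array example. The variable $var_three and the explicit copy through toVector() are additions for illustration; they show one way to get array-like behavior back when you need it.

```hack
<?hh
// Same shape as the array example, but with a Vector.
$var_one = Vector {'original'};
$var_two = $var_one;           // no copy: both variables refer to one object
$var_two[0] = 'new';
echo $var_one[0], "\n";        // prints 'new'

// To get array-like behavior, copy explicitly before modifying.
$var_three = $var_one->toVector();  // toVector() returns a new Vector
$var_three[0] = 'copy';
echo $var_one[0], "\n";        // still prints 'new'
```

The explicit copy makes visible a cost that array code pays implicitly.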

This may seem like a fairly minor difference at first, but it has far-reaching implications, and you need to be aware of it if you’re converting code that uses arrays to use collections instead. In typical code, the pseudo-copying of arrays (as in the preceding example) is ubiquitous: it happens any time you pass an array to a function, or return an array from a function.

Here’s an example of a situation in which you need to consider this difference:

function get_items(): array<string> {
  static $cache = null;
  if ($cache === null) {
    $cache = do_expensive_fetch();
  }
  return $cache;
}

function main(): void {
  $items = get_items();
  $items[] = some_special_item();

  foreach ($items as $item) {
    // ...
  }
}

main() is modifying the value returned from get_items(), which caches the result of do_expensive_fetch() in a static local variable. Because get_items() returns an array, this code is correct: main() is working on a separate copy of the array from the one stored in the static variable in get_items().

However, if this code is mechanically converted to use collections instead, so that do_expensive_fetch() and get_items() return Vector<string> instead, the code breaks. The Vector is never copied, so main()’s modification of the Vector will be visible to any other caller of get_items().

Note that this is an example of memoization; you need to be aware of this issue when using the special attribute __Memoize as well (see “Special Attributes”).

The first line of defense against this problem is immutability. get_items() should be returning an immutable Vector, capturing the contract that callers should not be modifying it. If they need to modify it, they should make a copy and modify that instead (which is what is implicitly happening in the array-based code).

This is how get_items() should be implemented using collections (we’ll see the meaning of the ConstVector type annotation in “Type Annotations for Collections”):

function get_items(): ConstVector<string> {
  static $cache = null;
  if ($cache === null) {
    $cache = do_expensive_fetch();
  }
  return $cache->immutable();
}

Every collection class has a method called immutable() that returns an immutable version of itself. This doesn’t copy the collection’s underlying storage in memory—in fact, it results in behavior very similar to PHP arrays’ copy-on-write—so it’s cheap. This way, if any caller of get_items() tries to modify the Vector it returns, an InvalidOperationException will be thrown, clearly showing you what needs to be changed.
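The sketch below shows what the caller's side might look like once get_items() returns an immutable Vector. The function names fetch_items_stub(), get_cached_items(), and build_item_list() are hypothetical stand-ins, and the fetched values are made up for the example.

```hack
<?hh
// Hypothetical stand-in for do_expensive_fetch(); the values are invented.
function fetch_items_stub(): Vector<string> {
  return Vector {'apple', 'banana'};
}

function get_cached_items(): ConstVector<string> {
  static $cache = null;
  if ($cache === null) {
    $cache = fetch_items_stub();
  }
  return $cache->immutable();
}

function build_item_list(): Vector<string> {
  // Copy explicitly, then modify the copy; the cached Vector is untouched.
  $items = get_cached_items()->toVector();
  $items[] = 'special item';
  return $items;
}
```

Each call to build_item_list() starts from the same two cached items, which is the behavior the array-based version provided implicitly.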

Using Collections

With HHVM, the collection classes can be used even in regular PHP files (i.e., non-Hack files). You have to prefix their class names with the HH namespace (e.g., HH\Vector), whereas in Hack files the namespace isn’t necessary.

Code that uses collections looks almost identical to code that uses arrays. Collections are built into the language and runtime, so they work seamlessly with many of the language constructs you already use with arrays—we’ll look at those in this section. Each collection class also has a full-featured object-oriented interface, the most important parts of which we’ll see here and in “Type Annotations for Collections”.

Literal Syntax

Hack adds special syntax for creating instances of collection classes, called collection literal syntax. It consists of the name of the class, followed by a brace-enclosed list of items. The items are separated by commas. In Map literals, each item is the key, followed by =>, followed by the value, just as in PHP array literal syntax:

$vector = Vector {'one', 'two', 'three'};
$map = Map {'one' => 1, 'two' => 2, 'three' => 3};
$set = Set {'one', 'two', 'three'};
$pair = Pair {'one', 'two'};

Collection literal syntax is allowed in any position where regular PHP array() syntax is allowed, including in the initializer expressions of object and class properties. This is the reason why it exists: even though collection literal syntax entails object creation (which usually isn’t allowed in these positions), it is legal anywhere array() is legal. For example:

class Pluralizer {
  private static Map<string, string> $cache = Map {};
}

Collection literals in this position are not allowed to contain any expression that is itself not allowed in this position. array() syntax has the same restriction. For example, this is not valid syntax, because function calls are not valid class property initializers:

class Pluralizer {
  // Syntax error
  private static Map<string, string> $cache =
    Map {'child' => fetch_plural_from_db('child')};
}

Note that although the collection classes are generic, there are no type arguments in literal syntax:

$vec = Vector<int> {1, 2, 3};  // Syntax error

Instead, the typechecker will silently track the types of the collection’s contents, and only check for errors when you pass the collection through a type annotation (e.g., by assigning it to a property with a type annotation). See “Unresolved Types, Revisited” for full details.
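As a rough illustration of where that check happens, the sketch below passes a literal through an annotated parameter. The functions sum_ints() and demo_sum() are hypothetical and exist only for this example.

```hack
<?hh
// Hypothetical helper: the parameter annotation is where the element type is checked.
function sum_ints(Vector<int> $numbers): int {
  $total = 0;
  foreach ($numbers as $n) {
    $total += $n;
  }
  return $total;
}

function demo_sum(): int {
  $numbers = Vector {1, 2, 3};  // no type argument in the literal
  // Passing $numbers here is the point at which its contents are checked
  // against Vector<int>. A Vector of strings would be reported as a type error.
  return sum_ints($numbers);
}

echo demo_sum(), "\n";  // prints 6
```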

Reading and Writing

The square-bracket syntax that you use with arrays is also what you use with Vector, Map, and Pair:

$vector = Vector {'zero', 'one', 'two'};
echo $vector[1];  // Prints 'one'

$map = Map {};
$map['zero'] = 0;

$pair = Pair {'first', 'second'};
echo $pair[0];  // Prints 'first'

If you try to read an element that doesn’t exist, or to set an element in a Vector that is beyond the Vector’s bounds, an OutOfBoundsException will be thrown. Accessing elements by reference (as in $ref = &$array[0] in regular PHP) is not allowed with collections; doing so results in a fatal error.
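A minimal sketch of the out-of-bounds behavior, using made-up values: writing past the end of a Vector throws, whereas appending with empty square brackets extends it.

```hack
<?hh
$vector = Vector {'zero', 'one'};

// Writing past the end throws; it does not grow the Vector as it would an array.
try {
  $vector[5] = 'five';
} catch (OutOfBoundsException $e) {
  echo "Index 5 is out of bounds\n";
}

// Appending is the supported way to extend a Vector.
$vector[] = 'two';
echo $vector->count(), "\n";  // prints 3
```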

You can’t use this syntax to modify Sets. You can use it to read from Sets, but you shouldn’t. The most common operation on a Set is to test whether a value is in it, and the square-bracket syntax is unsuitable for that: if the value is not in the Set, it will throw an OutOfBoundsException. For membership testing, use the contains() method (see “Type Annotations for Collections”) instead:

if ($the_list->contains($user_id)) {
  echo "You're on the list";
}

Arrays have a quirky behavior wherein keys that are strings containing the representation of an integer[2] are treated as the integer instead. For example:

$array = array('3' => 'three');
echo $array[3];  // Prints 'three'

$array = array(3 => 'three');
echo $array['3'];  // Prints 'three'

Hack collections do not do this. Map and Set treat the string "3" and the integer 3 as distinct keys, and if you use anything other than an integer to index into a Vector or Pair, an InvalidArgumentException will be thrown.
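The contrast can be sketched side by side. The keys and values below are arbitrary, chosen only to show that a Map keeps both keys while an array collapses them into one.

```hack
<?hh
// In a Map, '3' and 3 are two different keys.
$map = Map {};
$map['3'] = 'string key';
$map[3] = 'integer key';
echo $map->count(), "\n";  // prints 2
echo $map['3'], "\n";      // prints 'string key'
echo $map[3], "\n";        // prints 'integer key'

// In an array, the string '3' is converted to the integer 3,
// so the second assignment overwrites the first.
$array = array();
$array['3'] = 'string key';
$array[3] = 'integer key';
echo count($array), "\n";  // prints 1
```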

To test whether a key exists in a Map or an element exists in a Set, you can use the containsKey() and contains() methods, respectively:

$map = Map {'one' => 'un', 'two' => 'deux'};
if ($map->containsKey('two')) {
  echo "We know how to say 'two' in French!";
}

$set = Set {'one', 'two'};
if ($set->contains('one')) {
  echo "'one' is in the set";
}

You can also use isset and empty to test if a key or element exists, but you should always use containsKey() or contains() if possible. isset and empty aren’t allowed in Hack strict mode—see “isset, empty, and unset” for the reasons why. The only reason you may want to use them on collections is so that you can write code that accepts both arrays and collections seamlessly.

Like empty arrays, empty collections of any type evaluate to false when converted to bool. In particular, they’re treated as false in conditional statements like if and while, and in ternary expressions:

$vector = Vector {};
if ($vector) {
  // Code in here will not be executed
}

$description = ($vector ? (string)$vector : '[none]');

Iterating

You can iterate over collections with foreach:

$vector = Vector {'zero', 'one'};
foreach ($vector as $value) {
  echo $value;
}

$map = Map {'one' => 'un', 'two' => 'deux'};
foreach ($map as $eng => $fr) {
  echo $eng . ' in French is ' . $fr;
}

Adding or removing an item in a collection while iterating over it with foreach is not allowed; doing that will result in an InvalidOperationException being thrown.
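One way to work around this restriction, sketched here with made-up data, is to iterate over a copy and modify the original. The copy comes from toVector(); this is a suggestion rather than the only approach.

```hack
<?hh
$queue = Vector {1, 2, 3};

// Iterate over a snapshot so that appending to $queue is safe.
foreach ($queue->toVector() as $n) {
  if ($n % 2 === 1) {
    $queue[] = $n * 10;  // modifies the original, not the snapshot
  }
}

echo $queue->count(), "\n";  // prints 5
```

The snapshot costs a copy, so for large collections it may be preferable to gather the new values in a separate Vector and add them after the loop.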

foreach by reference, as in foreach ($vector as &$value), is also not allowed; doing that will result in a fatal error. You can approximate this behavior by adding the key or index as an iteration variable, as in foreach ($vector as $index => $value), and modifying the value that way:

// Old code with array
$array = array(0, 1, 2);
foreach ($array as &$value) {
  $value *= 10;
}

// Equivalent code with Vector
$vector = Vector {0, 1, 2};
foreach ($vector as $index => $value) {
  $vector[$index] = $value * 10;
}

Adding values

You can append values to Vectors, and add them to Sets, with the normal empty-square-bracket syntax. In the case of Sets, if the value already exists in the Set, there’s no effect:

$vector = Vector {'zero'};
$vector[] = 'one';
print_r($vector);  // Prints: "HH\Vector Object( [0] => zero, [1] => one )"

$set = Set {'eins'};
$set[] = 'eins';  // Value is already in $set; nothing happens
print_r($set);    // Prints: "HH\Set Object( eins )"

The same syntax works with Maps, but because you have to specify both a key and a value, the righthand side of the expression must be a Pair of key and value:

$map = Map {};
$map[] = Pair {'one', 'eins'};
print_r($map);  // Prints: "HH\Map Object( [one] => eins )"

You can also use the add() method on Vectors and Sets, passing the value to be added as the only argument. Map has the add() method, too; pass it a Pair of key and value.
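A short sketch of add() on each mutable class follows; the values are arbitrary. Because add() returns the collection itself, calls can be chained.

```hack
<?hh
$vector = Vector {};
$vector->add('zero')->add('one');  // add() returns the collection, so calls chain

$set = Set {};
$set->add('eins')->add('eins');    // the second add() has no effect

$map = Map {};
$map->add(Pair {'one', 'eins'});   // key first, then value

echo $vector->count(), ' ', $set->count(), ' ', $map['one'], "\n";  // prints "2 1 eins"
```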

Deleting values

To delete a value from a Vector, use the removeKey() method:

$vec = Vector {'first', 'second', 'third'};
$vec->removeKey(1);
print_r($vec);  // Prints: "HH\Vector Object( [0] => first, [1] => third )"

Note that the elements that are after the removed one are all shifted down by one index, so that the index 1 now holds the value 'third'. This is in line with vector semantics, which state that all indices between 0 and n–1, inclusive, are valid (where n is the number of elements in the vector).

The method to remove a key from a Map is also called removeKey(). To remove a value from a Set, use the method remove().
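For completeness, here is a brief sketch of the Map and Set removal methods, using made-up contents.

```hack
<?hh
$map = Map {'one' => 'un', 'two' => 'deux'};
$map->removeKey('one');
echo $map->count(), "\n";  // prints 1

$set = Set {'one', 'two'};
$set->remove('one');
$set->remove('missing');   // not in the Set: nothing happens
echo $set->count(), "\n";  // prints 1
```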

You can also delete items from Maps and Sets using the unset statement:

$map = Map {'one' => 'un', 'two' => 'deux'};
unset($map['one']);
print_r($map);  // Prints: "HH\Map Object( [two] => deux )"

However, again, you should generally use the methods instead, as unset isn’t allowed—for good reason—in strict mode. You can use unset if you need to write code that accepts both arrays and collections seamlessly.

unset does not work with Vectors. This is because the semantics of removing elements from Vectors don’t match the semantics of removing elements from arrays. Unsetting an element of an array (even one that’s being used like a vector) leaves a “hole,” where the array’s valid indices are not contiguous, thus breaking vector semantics:

$arr = array('zero', 'one', 'two');
unset($arr[1]);
print_r($arr);  // Prints: "Array( [0] => zero, [2] => two )"

Operators

Collections can be compared for equality with the == operator. This is how it works:

  1. If the two sides are not the same kind of collection (disregarding mutability), the result is false. For example, a Vector may compare equal to an ImmVector, but it will never compare equal to a Map.

  2. If the two sides are Vectors or ImmVectors, the result is true if and only if both sides contain the same number of values, and the values at each index compare equal using ==. For example:

    $vector = Vector {1, 2};
    $immvector = ImmVector {1, 2};
    $strings = Vector {'1', '2'};
    $wrong_order = Vector {2, 1};
    
    var_dump($vector == $immvector);    // true
    var_dump($vector == $strings);      // true, because 1 == '1', 2 == '2'
    var_dump($vector == $wrong_order);  // false

  3. If the two sides are Pairs, the result is true if and only if the values at each index compare equal using ==.

  4. If the two sides are Sets or ImmSets, the result is true if and only if both sides contain the same number of values, and every element in one side exists in the other side. Unlike with Vectors, these existence tests are done with === identity comparison. Order is irrelevant. For example:

    $set = Set {1, 2};
    $immset = ImmSet {1, 2};
    $strings = Set {'1', '2'};
    $wrong_order = Set {2, 1};
    
    var_dump($set == $immset);       // true
    var_dump($set == $strings);      // false
    var_dump($set == $wrong_order);  // true

  5. If the two sides are Maps or ImmMaps, the result is true if and only if both sides contain the same number of keys, every key in one side exists in the other side (using identity comparison), and identical keys map to equal values (using == comparison). Order is irrelevant. For example:

    $map = Map {10 => 20, 20 => 40};
    $string_keys = Map {'10' => 20, '20' => 40};
    $string_values = Map {10 => '20', 20 => '40'};
    
    var_dump($map == $string_keys);    // false
    var_dump($map == $string_values);  // true

Collections can be compared for identity with the === operator. This only evaluates to true if both sides of the operator are the same object. If they are distinct objects, === comparison will evaluate to false even if the two objects have the same contents:

$vector = Vector {1, 2};
$another_variable = $vector;
var_dump($vector === $another_variable);  // true

$other = Vector {1, 2};
var_dump($vector === $other);  // false

List assignment with a collection on the righthand side works just as if the collection were an array. List assignment is shorthand for indexing into the array or collection on the righthand side with integer keys, so this is the behavior for Maps and Sets (the internal ordering of the Map or Set doesn’t matter):

$vector = Vector {'one', 'two'};
list($one, $two) = $vector;

$map = Map {1 => 'one', 0 => 'zero'};
list($zero, $one) = $map;  // $zero is 'zero' and $one is 'one'

Immutable collections

Vector, Map, and Set have immutable equivalents: ImmVector, ImmMap, and ImmSet, respectively. (Pair is immutable and has no mutable equivalent.) They don’t implement any methods that modify their contents, and they can’t be modified through square-bracket syntax or unset; if you try to do so, an InvalidOperationException will be thrown. The contents of immutable collections are fixed when they’re created. They can be created with literal syntax—just use ImmVector, ImmMap, or ImmSet as the class name—or through their constructors or conversion from another collection (see “Concrete Collection Classes”).

You should generally use immutable collections whenever possible. If some data isn’t supposed to change, enforcing that contract closes off a possible source of bugs. It also encodes more information about the program’s behavior in the type system, which is always a good thing.
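The sketch below walks through the three ways of creating immutable collections mentioned above, and then shows a rejected modification. All values are invented for the example.

```hack
<?hh
// Three ways to obtain an immutable collection.
$days = ImmVector {'Sat', 'Sun'};              // literal syntax
$source = Vector {1, 2};
$frozen = $source->toImmVector();              // conversion from another collection
$letters = new ImmSet(array('a', 'b', 'a'));   // constructor; duplicates collapse

// Attempting to modify an immutable collection throws.
try {
  $days[] = 'Mon';
} catch (InvalidOperationException $e) {
  echo "ImmVector cannot be modified\n";
}

// To change the contents, make a mutable copy and modify that.
$mutable_copy = $days->toVector();
$mutable_copy[] = 'Mon';
echo $days->count(), ' ', $mutable_copy->count(), "\n";  // prints "2 3"
```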

Type Annotations for Collections

Most of the time, you shouldn’t use the collection class names themselves in type annotations. Hack provides a large set of interfaces that describe elements of a collection’s functionality, and you should generally use those in type annotations.

For example, if you’re writing a function that takes a set of values as an argument and doesn’t modify it, you should annotate the argument as ConstSet, an interface, rather than Set, the concrete class. This increases expressiveness, which helps the typechecker catch more mistakes: if you try to modify the set within the function, there will be a type error. It also makes the function’s contract clear to callers: it wants a set, and it won’t modify it.
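A minimal sketch of such a function follows. The name is_on_list() and its parameters are hypothetical; the point is that a ConstSet parameter accepts both Set and ImmSet, and gives the function body no way to modify either.

```hack
<?hh
// Hypothetical helper: reads from the set but cannot modify it.
function is_on_list(ConstSet<int> $the_list, int $user_id): bool {
  return $the_list->contains($user_id);
}

$mutable = Set {1, 2, 3};
$immutable = ImmSet {1, 2, 3};

var_dump(is_on_list($mutable, 2));    // bool(true)
var_dump(is_on_list($immutable, 7));  // bool(false)
```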

In this section, we’ll see the interfaces that you’re most likely to use. This will double as a natural way to present the object-oriented interfaces to the collection classes. If you just want to see the collection class APIs all in one, skip to “Concrete Collection Classes”; that section doesn’t have explanations for the methods, but many of them are self-explanatory, especially with type annotations.

Core Interfaces

The core collection interfaces are:

Traversable<T>

Anything that can be iterated over using foreach without a key is Traversable. Within such a foreach, the iteration variable will have type T. This is the only thing Traversable guarantees; it does not declare any methods.

The most important thing about Traversable is that regular PHP arrays are Traversable. This is unusual, because arrays are not objects and, in general, only objects can implement interfaces. Traversable is special-cased in the runtime to have this behavior.

In addition to arrays and collections, Traversable includes objects that implement Iterator.

Traversable can help bridge the gap between arrays and collections. If the only thing you do with a function argument is iterate over it using foreach without a key, irrespective of whether it’s an array, a collection, or something else, you should annotate it as Traversable.
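As a sketch of that advice, the hypothetical function total() below only iterates over its argument, so it is annotated as Traversable and works with arrays and collections alike.

```hack
<?hh
// Hypothetical helper: iteration without a key is all it needs.
function total(Traversable<int> $values): int {
  $sum = 0;
  foreach ($values as $value) {
    $sum += $value;
  }
  return $sum;
}

echo total(array(1, 2, 3)), "\n";            // prints 6
echo total(Vector {4, 5}), "\n";             // prints 9
echo total(Map {'a' => 1, 'b' => 2}), "\n";  // prints 3 (iterates over the values)
```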

Note that if you’re implementing your own class that you want to be usable with foreach, you should not make it implement Traversable. Use Iterable (described shortly) instead.

KeyedTraversable<Tk, Tv> extends Traversable<Tv>

KeyedTraversable is similar to Traversable, but additionally indicates that it’s valid to include a key in the foreach statement. Regular PHP arrays are KeyedTraversable. The following example shows the difference between Traversable and KeyedTraversable:

function notKeyed<T>(Traversable<T> $traversable): void {
  // Not valid
  foreach ($traversable as $key => $value) {
    // ...
  }
}

function keyed<Tk, Tv>(KeyedTraversable<Tk, Tv> $traversable): void {
  // Valid
  foreach ($traversable as $key => $value) {
    // $key is of type Tk
    // $value is of type Tv
  }
}

Container<T> extends Traversable<T>

Container is exactly like Traversable, except that it does not include objects that implement Iterator. In other words, it includes only arrays and instances of collection classes. The only thing you can do with a Container is to iterate over it with foreach.

KeyedContainer<Tk, Tv> extends KeyedTraversable<Tk, Tv>

Similarly, KeyedContainer is like KeyedTraversable, except that it is restricted to arrays and collection classes other than Set and ImmSet.

Indexish<Tk, Tv> extends KeyedTraversable<Tk, Tv>

Indexish signifies anything that can be indexed into using square-bracket syntax: $indexish[$key]. It declares no methods. Like Traversable and KeyedTraversable, it is a special interface that is “implemented” by arrays as well as collections and other objects that support this syntax.

IteratorAggregate<T> extends Traversable<T>

This interface is for objects that can produce an Iterator object to iterate over their contents. Unlike the interfaces described so far, it is not implemented by arrays. It’s very unlikely that you’ll ever use IteratorAggregate in type annotations; either Iterable or Traversable is probably more appropriate. The interface declares a single method:

  • getIterator(): Iterator<T> returns an iterator over the object’s contents. The Iterator interface is the one from standard PHP.

Iterable<T> extends IteratorAggregate<T>

This is where the real capabilities of collections begin to come in. The Iterable interface declares several methods:

  • toArray(): array converts the collection to an array. Note that the return value does not have a type argument: it’s simply array instead of array<T>.

  • toValuesArray(): array converts the collection to an array but discards the keys, replacing them with the integers 0 to n–1, in order.

  • toVector(): Vector<T> converts the collection to a Vector. This is very similar to toValuesArray(); if the collection has keys (i.e., is a Map), the keys will be discarded.

  • toImmVector(): ImmVector<T> converts the collection to an immutable Vector.

  • toSet(): Set<T> converts the collection to a Set, discarding the keys, if any.

  • toImmSet(): ImmSet<T> converts to an immutable Set.

  • values(): Iterable<T> returns an Iterable object yielding the collection’s values (discarding keys).

  • map<Tm>(function(T): Tm $callback): Iterable<Tm> returns an Iterable object yielding the collection’s values after they have been passed through the given function. It is much like the standard PHP array_map() function. Here’s an example that multiplies the elements of a Vector by 10:

    $nums = Vector {1, 2, 3};
    print_r($nums->map(function($x) { return $x * 10; }));

    This will output:

    HH\Vector Object
    (
        [0] => 10
        [1] => 20
        [2] => 30
    )

  • filter(function(T): bool $callback): Iterable<T> returns an Iterable object yielding the values from the collection that make the given function return true. Here’s an example of picking out even numbers from a Vector:

    $nums = Vector {1, 2, 3, 4};
    print_r($nums->filter(function($x) { return $x % 2 === 0; }));

    This will output:

    HH\Vector Object
    (
        [0] => 2
        [1] => 4
    )

  • zip<Tz>(Traversable<Tz> $traversable): Iterable<Pair<T, Tz>> returns an Iterable object that pairs up the values from this collection and the values from the passed-in Traversable. An example is the best way to explain it:

    $english = Vector {'one', 'two', 'three'};
    $french = Vector {'un', 'deux', 'trois'};
    print_r($english->zip($french));

    This will output:

    HH\Vector Object
    (
        [0] => HH\Pair Object
            (
                [0] => one
                [1] => un
            )

        [1] => HH\Pair Object
            (
                [0] => two
                [1] => deux
            )

        [2] => HH\Pair Object
            (
                [0] => three
                [1] => trois
            )

    )

    If the two collections have different counts, the resulting Iterable will have the smaller count.
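Because map() and filter() return collections themselves, they can be chained. The sketch below combines them with the conversion methods from the list above, using made-up numbers and words.

```hack
<?hh
$nums = Vector {1, 2, 3, 4, 5, 6};

// Keep the even numbers, then multiply each by 10.
$result = $nums
  ->filter(function($x) { return $x % 2 === 0; })
  ->map(function($x) { return $x * 10; });

$as_array = $result->toArray();      // array(20, 40, 60)

// Converting a Map to a Vector discards the keys.
$map = Map {'one' => 'un', 'two' => 'deux'};
$values = $map->toVector();          // Vector {'un', 'deux'}
$values_array = $map->toValuesArray();  // array('un', 'deux')
```

Neither call modifies $nums; each step produces a new collection.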

KeyedIterable<Tk, Tv> extends Iterable<Tv>

This is analogous to Iterable, but with the key’s type included. It adds some new methods and overrides some from Iterable with different return types. The new methods are listed first:

  • toKeysArray(): array returns an array of the Iterable’s keys.

  • toMap(): Map<Tk, Tv> returns the Iterable converted to a Map.

  • keys(): Iterable<Tk> returns an Iterable over this Iterable’s keys.

  • mapWithKey<Tm>(function(Tk, Tv): Tm $callback): KeyedIterable<Tk, Tm> is like map() but passes keys to the callback function as well as values.

  • filterWithKey(function(Tk, Tv): bool $callback): KeyedIterable<Tk, Tv> is like filter() but passes keys to the callback function as well as values.

  • getIterator(): KeyedIterator<Tk, Tv> is an override with a more specific return type.

  • map<Tm>(function(Tv): Tm $callback): KeyedIterable<Tk, Tm> is an override with a more specific return type.

  • filter(function(Tv): bool $callback): KeyedIterable<Tk, Tv> is an override with a more specific return type.

  • zip<Tz>(Traversable<Tz> $traversable): KeyedIterable<Tk, Pair<Tv, Tz>> is an override with a more specific return type.
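To make the key-aware methods concrete, here is a small sketch on a Map. The fruit names and prices are invented for the example.

```hack
<?hh
$prices = Map {'apple' => 3, 'pear' => 5, 'plum' => 2};

// mapWithKey() passes the key and then the value to the callback.
$labels = $prices->mapWithKey(
  function($name, $price) { return $name . ': ' . $price; }
);
echo $labels['pear'], "\n";  // prints 'pear: 5'

// filterWithKey() keeps the entries for which the callback returns true.
$cheap = $prices->filterWithKey(
  function($name, $price) { return $price < 4; }
);

$cheap_names = $cheap->toKeysArray();  // array('apple', 'plum')
```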

General Collection Interfaces

There are three core interfaces that declare the most basic collection functionality. You’ll essentially never use these in type annotations, as they’re too nonspecific to be useful that way, but we’ll look at them here to learn these core functions:

ConstCollection<T>

A read-only collection of values of type T. It says nothing about uniqueness of values, ordering, underlying implementation, or anything else.

Every concrete collection class implements this interface (indirectly). It may seem unsuitable for Map, because it only has one type parameter and Map needs two (one for keys and one for values), but Maps do implement ConstCollection: a Map with key type Tk and value type Tv implements ConstCollection<Pair<Tk, Tv>>.

This interface declares three methods:

  • count(): int returns the number of values in the collection.

  • isEmpty(): bool returns whether the collection is empty.

  • items(): Iterable<T> returns a value that can be iterated over using foreach, and will yield every value in the collection.

OutputCollection<T>

This interface declares two methods that allow adding values to the collection (every mutable collection class implements this):

  • add(T $value): this adds the given value to the collection and returns the collection itself.

  • addAll(?Traversable<T> $values): this iterates over the given Traversable and adds each resulting value to the collection. It returns the collection itself.

Collection<T> extends ConstCollection<T>, OutputCollection<T>

This interface declares no methods; it just serves to combine the read-only behavior of ConstCollection and the write-only behavior of OutputCollection.
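A brief sketch of these general methods on the concrete classes follows, with arbitrary values. Note that a Map, being a collection of Pairs, takes Pairs in addAll().

```hack
<?hh
$vector = Vector {};
$was_empty = $vector->isEmpty();             // true
$vector->addAll(array('a', 'b'))->add('c');  // both methods return the collection

$set = Set {1};
$set->addAll(Vector {1, 2, 3});              // 1 is already present

$map = Map {};
$map->addAll(Vector {Pair {'one', 1}, Pair {'two', 2}});

echo $vector->count(), ' ', $set->count(), ' ', $map->count(), "\n";  // prints "3 3 2"
```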

Specific Collection Interfaces

Now, at last, we’ll get into specific collection functionality. We’ll look at six collection interfaces and the methods they declare.[3] They’re meant to describe functionality independent of implementation. For now, there’s only one concrete implementation of each, but there may be others in the future; for example, one can imagine a linked list–based class that implements MutableVector.

All of these interfaces either directly or indirectly extend KeyedIterable, which declares several methods with KeyedIterable as their return type, such as map() and filter(). All of these interfaces override such methods with specific return types—for example, ConstVector<T> declares filter(function(T): bool $callback): ConstVector<T>. These overridden methods are omitted in the following list:

ConstSet<T> extends ConstCollection<T>, KeyedIterable<mixed, T>

This represents a read-only set of values of type T.[4] It declares only one method directly:

  • contains(T $value): bool returns whether the given value is in the set. The semantics are the same as === comparison: the result is true if and only if there is a value in the set that compares identical to $value using ===.

MutableSet<T> extends ConstSet<T>, Collection<T>

This represents a modifiable set of values of type T. It extends ConstSet and declares two methods directly:

  • clear(): this removes all values from the set, and returns the set.

  • remove(T $value): this removes the given value from the set (doing nothing if the value is not in the set), and returns the set. As with contains(), the semantics are the same as === comparison.

ConstVector<T> extends ConstCollection<T>, KeyedIterable<int, T>

This represents a read-only sequence of values of type T, indexed by integers. It declares three methods directly:

  • at(int $index): T returns the value at the given index, or throws an exception if the index is out of bounds.

  • containsKey(int $index): bool returns whether the given index is in bounds.

  • get(int $index): ?T returns the value at the given index, or null if the index is out of bounds.

MutableVector<T> extends ConstVector<T>, Collection<T>

This represents a modifiable sequence of values of type T. It extends ConstVector and adds these methods:

  • clear(): this removes all values from the vector.

  • removeKey(int $index): this removes the value at the given index. In line with vector semantics, the values at higher indices will all be shifted down by one, so that the indices remain contiguous.

  • set(int $index, T $value): this sets the given value at the given index, throwing an exception if the index is out of bounds. If you want to extend the vector, use add().

  • setAll(KeyedTraversable<int, T> $kt): this iterates over the given KeyedTraversable and calls set() with each key/value pair in it.
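Here is a minimal sketch of the difference between the read-only and modifiable vector interfaces. The helpers first_or_default() and drop_first() are invented for illustration, and the sketch catches the generic Exception class because the text above doesn't name the exact exception type:

```hack
<?hh

// Hypothetical helper: accepts any read-only vector, mutable or not.
function first_or_default(ConstVector<string> $names, string $default): string {
  // get() returns null when the index is out of bounds; at() would throw.
  $first = $names->get(0);
  return $first === null ? $default : $first;
}

// Hypothetical helper: it modifies the vector, so it asks for MutableVector.
function drop_first(MutableVector<string> $names): void {
  if ($names->containsKey(0)) {
    // Values at higher indices shift down by one.
    $names->removeKey(0);
  }
}

$names = Vector {'alice', 'bob'};
echo first_or_default($names, 'nobody'), "\n";        // alice
echo first_or_default(ImmVector {}, 'nobody'), "\n";  // nobody

drop_first($names);
echo $names->at(0), "\n";                             // bob

$names->set(0, 'carol');   // fine: index 0 exists
try {
  $names->set(5, 'dave');  // index 5 is out of bounds, so this throws
} catch (Exception $e) {
  $names->add('dave');     // add() is the way to extend the vector
}
echo $names->count(), "\n";                           // 2
```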

ConstMap<Tk, Tv> extends ConstCollection<Pair<Tk, Tv>>, KeyedIterable<Tk, Tv>

This represents a read-only mapping of keys of type Tk to values of type Tv. It declares methods that resemble those of ConstSet and ConstVector:

  • at(Tk $key): Tv returns the value for the given key, or throws an exception if the key isn’t in the map.

  • contains(Tk $key): bool returns whether the given key exists in the map.

  • containsKey(Tk $key): bool is the same as contains(). The duplication of methods is just a quirk of the inheritance hierarchy of these interfaces.

  • get(Tk $key): ?Tv returns the value for the given key, or null if the key isn’t in the map.

MutableMap<Tk, Tv> extends ConstMap<Tk, Tv>

This represents a modifiable mapping of keys to values. Again, the methods that it declares are a combination of the methods from MutableVector and MutableSet:

  • clear(): this removes all keys and values from the map.

  • remove(Tk $key): this removes the value at the given key.

  • removeKey(Tk $key): this is exactly the same as remove().

  • set(Tk $key, Tv $value): this sets the given value at the given key.

  • setAll(KeyedTraversable<Tk, Tv> $kt): this iterates over the given KeyedTraversable and calls set() with each key/value pair in it.
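The map interfaces can be exercised the same way. This is only a sketch: count_word() is a hypothetical helper, and the expected results follow from the method descriptions above.

```hack
<?hh

// Hypothetical helper: tallies a word in any modifiable map.
function count_word(MutableMap<string, int> $counts, string $word): void {
  // get() returns null for a missing key, so treat that as zero.
  $current = $counts->get($word);
  $counts->set($word, ($current === null ? 0 : $current) + 1);
}

$counts = Map {};
foreach (array('a', 'b', 'a') as $word) {
  count_word($counts, $word);
}

var_dump($counts->at('a'));           // int(2)
var_dump($counts->containsKey('c'));  // bool(false)
var_dump($counts->contains('b'));     // bool(true), same as containsKey()

$counts->remove('a');                 // removeKey('a') would do the same
var_dump($counts->get('a'));          // NULL
```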

Concrete Collection Classes

Finally, to bring all this together, we’ll look at the full type-annotated APIs to all the collection classes. Each one implements one of the six interfaces from the previous section, and adds a few more useful methods.

Only methods defined by the classes themselves, and not declared by any of the interfaces we just saw, are listed here:

ImmVector<T> implements ConstVector<T>
  • __construct(?Traversable<T> $values) creates a new ImmVector with the contents of the given Traversable.

  • linearSearch(T $value): int performs a linear search for the given value within the ImmVector and returns the index at which the value was found, or -1 if it wasn’t found.

  • __toString(): string just returns "ImmVector".

Vector<T> implements MutableVector<T>
  • __construct(?Traversable<T> $values) creates a new Vector with the contents of the given Traversable.

  • linearSearch(T $value): int performs a linear search for the given value within the Vector and returns the index at which the value was found, or -1 if it wasn’t found.

  • pop(): T removes the last value from the Vector and returns it.

  • reserve(int $size): void hints to the Vector that it should reallocate memory to hold the given number of values. The Vector may not do exactly that; this is just a hint.

  • resize(int $size, T $value): void changes the size of the Vector to the passed size. If the new size is smaller than the current size, values at the end of the Vector are removed. If the new size is larger, the new values are set to $value.

  • reverse(): void reverses the Vector in place.

  • shuffle(): void randomly rearranges the values in the Vector.

  • splice(int $offset, ?int $len = NULL): void removes $len values from the Vector, starting at $offset. If $len is not passed, it removes every value from $offset to the end of the Vector. This is similar to the built-in function array_splice().

  • __toString(): string just returns "Vector".
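To make the Vector-specific methods concrete, here is a short sketch that walks one Vector through several of them. The comments track the contents I'd expect after each call, based on the descriptions above:

```hack
<?hh

$v = Vector {'a', 'b', 'c', 'd'};

var_dump($v->linearSearch('c'));   // int(2)
var_dump($v->linearSearch('z'));   // int(-1)

var_dump($v->pop());               // string(1) "d"; $v is now a, b, c

$v->resize(5, '-');                // grows to a, b, c, -, -
$v->splice(3);                     // no $len: removes index 3 onward, leaving a, b, c
$v->splice(1, 1);                  // removes one value at index 1, leaving a, c

$v->reverse();                     // in place: c, a
echo implode(',', $v->toArray()), "\n";   // c,a
```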

ImmSet<T> implements ConstSet<T>
  • __construct(?Traversable<T> $values) creates a new ImmSet with the contents of the given Traversable.

  • fromArrays(...): ImmSet<T> is a static method that takes a variable number of arguments, which must all be arrays, and creates an ImmSet from all their contents.

  • fromItems(?Traversable<T> $items): ImmSet<T> is a static method that creates an ImmSet from the given Traversable.

  • __toString(): string just returns "ImmSet".

Set<T> implements MutableSet<T>
  • __construct(?Traversable<T> $values) creates a new Set with the contents of the given Traversable.

  • fromArrays(...): Set<T> is a static method that takes a variable number of arguments, which must all be arrays, and creates a Set from all their contents.

  • fromItems(?Traversable<T> $items): Set<T> is a static method that creates a Set from the given Traversable.

  • removeAll(?Traversable<T> $values): Set<T> removes all the values in the given Traversable from the set, and returns the set itself.

  • __toString(): string just returns "Set".
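A brief sketch of the Set factory methods and removeAll(), with made-up sample values:

```hack
<?hh

// fromArrays() merges the values of any number of arrays; duplicates collapse.
$tags = Set::fromArrays(array('php', 'hack'), array('hack', 'hhvm'));
var_dump($tags->count());            // int(3)

// fromItems() accepts any Traversable, including another collection.
$letters = Set::fromItems(Vector {'x', 'y', 'x'});
var_dump($letters->count());         // int(2)

// removeAll() removes every listed value and returns the set itself.
$tags->removeAll(Vector {'php', 'hhvm'});
var_dump($tags->contains('hack'));   // bool(true)
var_dump($tags->count());            // int(1)
```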

ImmMap<Tk, Tv> implements ConstMap<Tk, Tv>
  • __construct(?KeyedTraversable<Tk, Tv> $values) creates a new ImmMap with the contents of the given KeyedTraversable.

  • fromItems(?Traversable<Pair<Tk, Tv>> $items): ImmMap<Tk, Tv> is a static method that creates an ImmMap from the given Traversable.

  • __toString(): string just returns "ImmMap".

Map<Tk, Tv> implements MutableMap<Tk, Tv>
  • __construct(?KeyedTraversable<Tk, Tv> $values) creates a new Map with the contents of the given KeyedTraversable.

  • fromItems(?Traversable<Pair<Tk, Tv>> $items): Map<Tk, Tv> is a static method that creates a Map from the given Traversable.

  • __toString(): string just returns "Map".
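The difference between the map constructor and fromItems() is easiest to see side by side. The following is a sketch with invented sample data. Note that fromItems() is one of the few places where you would build Pairs by hand:

```hack
<?hh

// fromItems(): each Pair supplies one key (index 0) and one value (index 1).
$items = Vector {Pair {'one', 1}, Pair {'two', 2}};
$map = Map::fromItems($items);
var_dump($map->at('two'));         // int(2)

// The constructor takes keys and values directly from a KeyedTraversable,
// such as an array or another Map.
$frozen = new ImmMap(array('one' => 1, 'two' => 2));
var_dump($frozen->get('three'));   // NULL

echo (string)$map, "\n";           // Map
```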

Interoperating with Arrays

Like other Hack features, collections were designed with interoperability in mind. A codebase can be gradually converted from using arrays to using collections.

Conversion to Arrays

All Hack collections can be converted to arrays with a cast expression, or with the toArray() method:

$vector = Vector {'first', 'second'};
print_r((array)$vector);      // Prints: Array( [0] => first, [1] => second )
print_r($vector->toArray());  // Same

The conversions are straightforward:

  • Vectors and ImmVectors convert to arrays where the keys are the integer indices of the values, in the same order.

  • Maps and ImmMaps convert to arrays with the same key/value pairs, in the same order.

  • Sets and ImmSets convert to arrays with each key mapping to itself, in the same order.

  • Pairs convert to arrays with the keys 0 and 1 (integers) in that order, mapping to the corresponding values.
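As a quick check of these rules, the following sketch compares each conversion against the array I'd expect from the list above (the sample values are arbitrary):

```hack
<?hh

$set = Set {'a', 'b'};
// Each value becomes both key and value.
var_dump($set->toArray() === array('a' => 'a', 'b' => 'b'));        // bool(true)

$map = Map {'x' => 1, 'y' => 2};
// Same key/value pairs, same order.
var_dump((array)$map === array('x' => 1, 'y' => 2));                // bool(true)

$pair = Pair {'left', 'right'};
// Integer keys 0 and 1, in that order.
var_dump($pair->toArray() === array(0 => 'left', 1 => 'right'));    // bool(true)
```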

There is a small wrinkle in the case of integer-like string keys (see “Reading and Writing”) in Maps and Sets. If the Map or Set contains keys that conflict with each other in this way, an E_WARNING-level error will be raised. The conflicting keys will reduce to one integer key in the resulting array, and it will map to the last value under the conflicting keys:

<?php
$map = Map {10 => 'int', '10' => 'string'};
$array = (array)$map;
// Warning: Map::toArray() for a map containing both int(10) and string('10')
var_dump($array);  // Prints: array(1) { [10]=> string(6) "string" }

$set = Set {10, "10"};
$array = (array)$set;
// Warning: Set::toArray() for a map containing both int(10) and string('10')
var_dump($array);  // Prints: array(1) { [10]=> string(2) "10" }

Use with Built-In and User Functions

Hack has a lot of built-in functions that can take arrays as arguments. There are several different ways in which these have been adapted to work with collections.

The sort built-ins

Hack has a wide variety of functions that are used to sort arrays. All of these have been adapted to work with collections as well, but each one only works with certain types of collections.

  • Vectors only work with sort(), rsort(), and usort(). All the other sorting functions are concerned with keys, which doesn’t make sense for a Vector.

  • Maps and Sets only work with asort(), arsort(), ksort(), krsort(), usort(), uasort(), uksort(), natsort(), and natcasesort(). Note that for Sets, sorting by key is the same as sorting by value.

  • Immutable collections and Pairs aren’t supported because they’re immutable, and these functions sort in place. Make a mutable copy of the collection and sort that instead.
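A small sketch of these rules in practice, using arbitrary sample data. The mutable copy is made with the Vector constructor, since it accepts any Traversable:

```hack
<?hh

// sort() works on a Vector and sorts it in place.
$v = Vector {3, 1, 2};
sort($v);
echo implode(',', $v->toArray()), "\n";                 // 1,2,3

// ksort() reorders a Map by key, also in place.
$m = Map {'b' => 2, 'a' => 1};
ksort($m);
echo implode(',', array_keys($m->toArray())), "\n";     // a,b

// An ImmVector can't be sorted in place, so sort a mutable copy instead.
$imm = ImmVector {3, 1, 2};
$copy = new Vector($imm);
sort($copy);
echo implode(',', $copy->toArray()), "\n";              // 1,2,3
echo implode(',', $imm->toArray()), "\n";               // 3,1,2 (unchanged)
```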

Other built-ins

The remaining built-ins that deal with arrays take a variety of approaches. There are a few specific kinds to look at first:

  • Four built-ins that modify arrays have been adapted to work with collections:

    • array_pop()

    • array_push()

    • array_shift()

    • array_unshift()

    The rest have not. Note that array_push() and array_unshift() support only Vector and Set.

  • Built-ins that read or modify arrays’ internal pointers, such as current() and reset(), don’t work with collections at all, because collections don’t have an equivalent of Hack arrays’ internal pointers.

  • Debugging and introspection functions produce output for collections similar to what they produce for arrays. For example, this:

    var_dump(array(10, 20));
    var_dump(Vector {10, 20});

    produces:

    array(2) {
      [0]=>
      int(10)
      [1]=>
      int(20)
    }
    object(HH\Vector)#1 (2) {
      [0]=>
      int(10)
      [1]=>
      int(20)
    }

    The functions are:

    • debug_zval_dump()

    • print_r()

    • var_dump()

    • var_export()

  • serialize() can serialize collections, but the resulting serialized string can only be unserialized by HHVM. (Collections aren’t serialized the same way as other objects.)

The most common case among the remaining built-ins is that they have a parameter that must be an array and is not by-reference. Examples of this include count() and array_diff(). In cases like this, if you pass a collection as that parameter, it will be automatically converted to an array,5 with no warning or error.
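For instance, assuming the automatic conversion works as just described, a sketch with count() and array_diff() might look like this (the sample values are arbitrary):

```hack
<?hh

$v = Vector {1, 2, 3, 4};

// count() accepts the Vector directly.
var_dump(count($v));             // int(4)

// array_diff() treats the Vector as array(1, 2, 3, 4) and returns a plain
// array; the Vector itself is left untouched.
$diff = array_diff($v, array(2, 4));
var_dump(is_array($diff));       // bool(true)
var_dump(array_values($diff));   // the values 1 and 3
var_dump($v->count());           // int(4)
```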

The last, and trickiest, category of built-ins consists of the ones that adapt their behavior based on the types of the arguments they’re passed. apc_store() is an example: if the first argument is a string, a single value is stored in the Alternative PHP Cache (APC); but if it’s an array, all the key/value mappings in the array are stored in APC. In general, built-ins like these do not support collections. The lone exception in HHVM 3.6 is implode().

Non-built-in functions

Non-built-in functions with an array typehint will implicitly convert passed-in collections to arrays, but there will be an E_NOTICE-level error when doing so. The rationale for this behavior is that this code is likely under your control, so you can modify it to have a collection typehint, or Indexish, or Traversable, or whatever is appropriate. However, it may not be under your control (e.g., it could be in a third-party library), so making this a hard error like a fatal or an exception is too strict. For example, this code:

function examine(array $items) {
  if (is_array($items)) {
    echo "It's an array!";
  }
}

examine(Vector {1, 2, 3});

produces the following output:

Notice: Argument 1 to examine() must be of type array, HH\Vector given;
argument 1 was implicitly cast to array
It's an array!

By contrast, if you pass an array to a user function that expects a collection, no implicit conversion will happen, and the typehint will fail.
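One way to sidestep both problems, when the function only needs to iterate, is a Traversable typehint. The following is a sketch with a made-up total() function, not a rewrite of any particular library code:

```hack
<?hh

// Hypothetical function: Traversable accepts arrays and collections alike,
// so no implicit conversion (and no notice) is needed.
function total(Traversable<int> $items): int {
  $sum = 0;
  foreach ($items as $item) {
    $sum += $item;
  }
  return $sum;
}

var_dump(total(array(1, 2, 3)));     // int(6)
var_dump(total(Vector {1, 2, 3}));   // int(6)
var_dump(total(Set {1, 2, 3}));      // int(6)
```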

1 It is not actually copied in memory at that point, either in standard PHP or in HHVM; instead, it is only copied when it is modified. This is called copy-on-write. You may have heard statements like “PHP arrays are copy-on-write,” which is true but describes implementation rather than semantics. Well, sort of. Copy-on-write should be an implementation detail—it behaves as if the array were copied at the time of the assignment—but it’s not quite. There are some obscure corner cases where the copy-on-write is detectable, although those cases are arguably bugs in the language.

2 This is actually not the same logic as is used when converting strings to integers. The string must be the decimal representation of an integer between –2⁶³ and 2⁶³ – 1 inclusive, with no leading or trailing whitespace or leading zeros. This “feature” is very bad for performance: on every array lookup, which is one of the most common operations in any PHP or Hack program, the key has to be checked for these conditions. There are some possible micro-optimizations, but it still incurs a noticeable performance cost.

3 This section is not telling the whole story. There are actually six other interfaces in the picture, called SetAccess, ConstSetAccess, and similar. I’m not going into all the details of those because they’re not used in type annotations and aren’t essential to using collections.

4 You may wonder why this interface extends KeyedIterable<mixed, T> instead of KeyedIterable<T, T>. The reason is a subtle problem with the type of map(). KeyedIterable<T, T> would declare a map<Tm>() function that returned KeyedIterable<T, Tm>. Then, ConstSet<T> would override it with a version that returned ConstSet<Tm>. The problem is that these are not compatible: in KeyedIterable<T, Tm>, the key and value types may be different, but in ConstSet<Tm>, they cannot be different. Making the key type mixed is slightly inelegant, and this may change in the future with additional typechecker functionality.

5 For efficiency, some of these built-ins have been adapted to use the collection directly, without converting it to an array, but the effect is exactly the same.
