PHP has only one built-in collection type: array. It presents an interface that is a set of ordered key/value pairs. This interface allows it to serve the purpose of several different data structures that programs in most languages typically use: vectors, sets, and maps (also known as dictionaries).
Hack has several classes that provide specialized vector, set, and map functionality. They allow for better understanding by both the Hack typechecker and human readers of code.
There are seven collection classes in Hack:
Vector
A mutable, ordered sequence of values, indexed by integers. The indices are the integers between 0 and n–1, where n is the number of elements in the vector.
Map
A mutable, ordered set of unique keys, each of which maps to a value. The keys may be integers or strings, and the values can be of any type. Unlike the map types in many other programming languages, Hack Maps remember the order in which their values were inserted. Of all the collection classes, Map is the most similar to PHP arrays.
Set
A mutable, ordered set of unique values. The values may be integers or strings.
Pair
An immutable sequence of exactly two values, indexed by the integers 0 and 1. Pairs are a detail of the API to the other collection classes, and you generally shouldn’t create them yourself; use tuples instead (see “Hack’s Type System”).
ImmVector, ImmMap, and ImmSet
Immutable versions of Vector, Map, and Set, respectively.
Vector, Map, Set, and Pair represent the overwhelming majority of use cases for PHP arrays.
In this chapter, we’ll see why and how to use Hack collection classes.
I’ll be using lowercase-v “vector” and capital-V Vector distinctly in this chapter, and similarly for “map,” “set,” and “pair.” I’ll use “vector” to refer to the general concept of an ordered sequence of values, common to many programming languages. I’ll use Vector to refer specifically to the class that Hack provides.
When I use the word “array” in this chapter, it specifically means the PHP/Hack data type, used with the array keyword.
There is a single underlying reason to use collections instead of arrays: PHP arrays are extremely flexible, but in practice, applications use them in one of a small number of highly specific patterns: vectors, maps, and sets. Using the right type of collection instead makes life easier for both humans and computers.
For human readers of code, seeing the names of specific collection classes makes it clearer what their purpose is. This advantage becomes much more potent when combined with Hack’s type annotations: the purpose of a collection is made clear at every abstraction boundary it passes through. This prevents mistakes and makes development faster and easier.
For computers, the smaller a collection’s set of functionality is, the easier it is to understand the code around it. Arrays are particularly difficult for the Hack typechecker to understand, because they can be used in such a wide variety of ways. For example, if you’re using an array as a vector and you pass it to a function that expects a map-like array, that should be a type error, but the typechecker can’t tell when this happens: it’s not possible, in general, to tell how an array is being used.
Hack is gradually adding solutions to this problem—shapes (see “Array Shapes”) are part of this effort—but collections provide immediate relief.
There can be performance benefits to using specific collection classes too. As an example, arrays generally allocate more memory than they use, so that they don’t have to allocate more memory every time a value is added. However, some arrays are never modified, so this extra capacity is wasted; there’s no way for a programmer to express that an array is immutable. Hack collection classes do have this feature.
The higher-level reason to use collections is simply that collections are more in keeping with Hack’s general pro–static typing philosophical stance. The more you can express a program’s behavior through static types, the better, for both humans and computers. Collections are a wide-ranging, high-leverage way to do so.
If you’re writing a project in Hack from the ground up, the Hack collection classes should be your first choice when you need collection functionality, for the reasons documented previously.
If you’re working with a significant amount of preexisting PHP code, though, converting it to use collection classes instead of arrays can be quite challenging. The reason is one major semantic difference between arrays and collections: arrays have value semantics, whereas collections have reference semantics.
These two concepts are represented by the two possible answers to this question: in the following example, does the last statement print 'original' or 'new'?

    $var_one = array('original');
    $var_two = $var_one;
    $var_two[0] = 'new';
    echo $var_one[0];

The answer is 'original', which is consistent with value semantics. When you assign the array to $var_two, the array is copied,1 so modifications to $var_two are not reflected in $var_one (and vice versa).
Collections are the opposite; they have reference semantics, like all objects do in Hack and PHP. If $var_one in our example were a Vector instead, the last statement would print 'new'. The assignment $var_two = $var_one doesn’t copy the Vector, so the modification to $var_two is reflected in $var_one.
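As a minimal sketch of the Vector case just described, here are the same four statements with a Vector literal in place of the array. The variable names simply mirror the array example; nothing else is assumed.

```hack
<?hh

// Same statements as the array example, but starting from a Vector.
$var_one = Vector {'original'};
$var_two = $var_one;  // No copy: both variables refer to the same Vector object
$var_two[0] = 'new';  // Modifies the one shared Vector
echo $var_one[0];     // Prints 'new'
```

Both variables hold a reference to a single object, so there is no way to tell from $var_one alone that the write happened through $var_two.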
This may seem like a fairly minor difference at first, but it has far-reaching implications, and you need to be aware of it if you’re converting code that uses arrays to use collections instead. In typical code, the pseudo-copying of arrays (as in the preceding example) is ubiquitous: it happens any time you pass an array to a function, or return an array from a function.
Here’s an example of a situation in which you need to consider this difference:

    function get_items(): array<string> {
      static $cache = null;
      if ($cache === null) {
        $cache = do_expensive_fetch();
      }
      return $cache;
    }

    function main(): void {
      $items = get_items();
      $items[] = some_special_item();
      foreach ($items as $item) {
        // ...
      }
    }

main() is modifying the value returned from get_items(), which caches the result of do_expensive_fetch() in a static local variable. Because get_items() returns an array, this code is correct: main() is working on a separate copy of the array from the one stored in the static variable in get_items().
However, if this code is mechanically converted to use collections instead, so that do_expensive_fetch() and get_items() return Vector<string> instead, the code breaks. The Vector is never copied, so main()’s modification of the Vector will be visible to any other caller of get_items().
Note that this is an example of memoization; you need to be aware of this issue when using the special attribute __Memoize as well (see “Special Attributes”).
The first line of defense against this problem is immutability. get_items() should be returning an immutable Vector, capturing the contract that callers should not be modifying it. If they need to modify it, they should make a copy and modify that instead (which is what is implicitly happening in the array-based code).
This is how get_items() should be implemented using collections (we’ll see the meaning of the ConstVector type annotation in “Type Annotations for Collections”):

    function get_items(): ConstVector<string> {
      static $cache = null;
      if ($cache === null) {
        $cache = do_expensive_fetch();
      }
      return $cache->immutable();
    }

Every collection class has a method called immutable() that returns an immutable version of itself. This doesn’t copy the collection’s underlying storage in memory (in fact, it results in behavior very similar to PHP arrays’ copy-on-write), so it’s cheap. This way, if any caller of get_items() tries to modify the Vector it returns, an InvalidOperationException will be thrown, clearly showing you what needs to be changed.
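On the calling side, "make a copy and modify that" might look like the following sketch. It assumes that toVector() (listed with the Iterable methods later in this chapter) returns a new, mutable Vector that is independent of the cached one. The literal values are made-up stand-ins for do_expensive_fetch() and some_special_item().

```hack
<?hh

function get_items(): ConstVector<string> {
  static $cache = null;
  if ($cache === null) {
    // Stand-in for do_expensive_fetch(); the values are invented.
    $cache = Vector {'first', 'second'};
  }
  return $cache->immutable();
}

function main(): void {
  // Copy into a fresh, mutable Vector before modifying anything.
  $items = get_items()->toVector();
  $items[] = 'special'; // Stand-in for some_special_item()

  echo $items->count();      // 3
  echo get_items()->count(); // 2: the cached Vector is untouched
}

main();
```

The explicit toVector() call plays the role that the implicit array copy played in the original code, and it makes the cost of the copy visible at the call site.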
With HHVM, the collection classes can be used even in regular PHP files (i.e., non-Hack files). You have to prefix their class names with the HH namespace (e.g., HH\Vector), whereas in Hack files the namespace isn’t necessary.
Code that uses collections looks almost identical to code that uses arrays. Collections are built into the language and runtime, so they work seamlessly with many of the language constructs you already use with arrays—we’ll look at those in this section. Each collection class also has a full-featured object-oriented interface, the most important parts of which we’ll see here and in “Type Annotations for Collections”.
Hack adds special syntax for creating instances of collection classes, called collection literal syntax. It consists of the name of the class, followed by a brace-enclosed list of items. The items are separated by commas. In Map literals, each item is the key, followed by =>, followed by the value, just as in PHP array literal syntax:

    $vector = Vector {'one', 'two', 'three'};
    $map = Map {'one' => 1, 'two' => 2, 'three' => 3};
    $set = Set {'one', 'two', 'three'};
    $pair = Pair {'one', 'two'};

Collection literal syntax is allowed in any position where regular PHP array() syntax is allowed, including in the initializer expressions of object and class properties. This is the reason why it exists: even though collection literal syntax entails object creation (which usually isn’t allowed in these positions), it is legal anywhere array() is legal. For example:

    class Pluralizer {
      private static Map<string, string> $cache = Map {};
    }

Collection literals in this position are not allowed to contain any expression that is itself not allowed in this position. array() syntax has the same restriction. For example, this is not valid syntax, because function calls are not valid class property initializers:

    class Pluralizer {
      // Syntax error
      private static Map<string, string> $cache =
        Map {'child' => fetch_plural_from_db('child')};
    }

Note that although the collection classes are generic, there are no type arguments in literal syntax:

    $vec = Vector<int> {1, 2, 3}; // Syntax error

Instead, the typechecker will silently track the types of the collection’s contents, and only check for errors when you pass the collection through a type annotation (e.g., by assigning it to a property with a type annotation). See “Unresolved Types, Revisited” for full details.
The square-bracket syntax that you use with arrays is also what you use with Vector, Map, and Pair:

    $vector = Vector {'zero', 'one', 'two'};
    echo $vector[1]; // Prints 'one'

    $map = Map {};
    $map['zero'] = 0;

    $pair = Pair {'first', 'second'};
    echo $pair[0]; // Prints 'first'

If you try to read an element that doesn’t exist, or to set an element in a Vector that is beyond the Vector’s bounds, an OutOfBoundsException will be thrown. Accessing elements by reference (as in $ref = &$array[0] in regular PHP) is not allowed with collections; doing so results in a fatal error.
You can’t use this syntax to modify Sets. You can use it to read from Sets, but you shouldn’t. The most common operation on a Set is to test whether a value is in it, and the square-bracket syntax is unsuitable for that: if the value is not in the Set, it will throw an OutOfBoundsException. For membership testing, use the contains() method (see “Type Annotations for Collections”) instead:

    if ($the_list->contains($user_id)) {
      echo "You're on the list";
    }

Arrays have a quirky behavior wherein keys that are strings containing the representation of an integer2 are treated as the integer instead. For example:

    $array = array('3' => 'three');
    echo $array[3]; // Prints 'three'

    $array = array(3 => 'three');
    echo $array['3']; // Prints 'three'

Hack collections do not do this. Map and Set treat the string "3" and the integer 3 as distinct keys, and if you use anything other than an integer to index into a Vector or Pair, an InvalidArgumentException will be thrown.
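To make the contrast with arrays concrete, here is a small sketch of the behavior described above. It uses only the square-bracket syntax and the count(), containsKey(), and contains() methods covered in this chapter; the key and value strings are arbitrary.

```hack
<?hh

$map = Map {};
$map['3'] = 'string key';
$map[3] = 'integer key';

echo $map->count();               // 2: the string "3" and the integer 3 are separate keys
var_dump($map->containsKey('3')); // bool(true)
var_dump($map->containsKey(3));   // bool(true)

$set = Set {3};
var_dump($set->contains('3'));    // bool(false): only the integer 3 is in the Set
```

With an array, the two assignments would have written to the same slot, and the second would have overwritten the first.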
To test whether a key exists in a Map or an element exists in a Set, you can use the containsKey() and contains() methods, respectively:

    $map = Map {'one' => 'un', 'two' => 'deux'};
    if ($map->containsKey('two')) {
      echo "We know how to say 'two' in French!";
    }

    $set = Set {'one', 'two'};
    if ($set->contains('one')) {
      echo "'one' is in the set";
    }

You can also use isset and empty to test if a key or element exists, but you should always use containsKey() or contains() if possible. isset and empty aren’t allowed in Hack strict mode; see “isset, empty, and unset” for the reasons why. The only reason you may want to use them on collections is so that you can write code that accepts both arrays and collections seamlessly.
Like empty arrays, empty collections of any type evaluate to false when converted to bool. In particular, they’re treated as false in conditional statements like if and while, and in ternary expressions:

    $vector = Vector {};
    if ($vector) {
      // Code in here will not be executed
    }
    $description = ($vector ? (string)$vector : '[none]');

You can iterate over collections with foreach:

    $vector = Vector {'zero', 'one'};
    foreach ($vector as $value) {
      echo $value;
    }

    $map = Map {'one' => 'un', 'two' => 'deux'};
    foreach ($map as $eng => $fr) {
      echo $eng . ' in French is ' . $fr;
    }

Adding or removing an item in a collection while iterating over it with foreach is not allowed; doing that will result in an InvalidOperationException being thrown.
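One way to work within this restriction is to split the work into two passes: record what needs to change during the loop, then apply the changes afterward. The following is only a sketch of that idea, with invented names and data; it relies on removeKey(), which is described later in this chapter.

```hack
<?hh

$scores = Map {'alice' => 90, 'bob' => 40, 'carol' => 75};

// First pass: only read from the Map, and remember which keys to drop.
$failing = Vector {};
foreach ($scores as $name => $score) {
  if ($score < 50) {
    $failing[] = $name; // Appending to a different collection is fine
  }
}

// Second pass: iterate over the separate Vector, so removing
// from $scores no longer happens during its own iteration.
foreach ($failing as $name) {
  $scores->removeKey($name);
}

echo $scores->count(); // 2
```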
foreach by reference, as in foreach ($vector as &$value), is also not allowed; doing that will result in a fatal error. You can approximate this behavior by adding the key or index as an iteration variable, as in foreach ($vector as $index => $value), and modifying the value that way:

    // Old code with array
    $array = array(0, 1, 2);
    foreach ($array as &$value) {
      $value *= 10;
    }

    // Equivalent code with Vector
    $vector = Vector {0, 1, 2};
    foreach ($vector as $index => $value) {
      $vector[$index] = $value * 10;
    }

You can append values to Vectors, and add them to Sets, with the normal empty-square-bracket syntax. In the case of Sets, if the value already exists in the Set, there’s no effect:

    $vector = Vector {'zero'};
    $vector[] = 'one';
    print_r($vector); // Prints: "HH\Vector Object( [0] => zero, [1] => one )"

    $set = Set {'eins'};
    $set[] = 'eins'; // Value is already in $set; nothing happens
    print_r($set); // Prints: "HH\Set Object( eins )"

The same syntax works with Maps, but because you have to specify both a key and a value, the righthand side of the expression must be a Pair of key and value:

    $map = Map {};
    $map[] = Pair {'one', 'eins'};
    print_r($map); // Prints: "HH\Map Object( [one] => eins )"

You can also use the add() method on Vectors and Sets, passing the value to be added as the only argument. Map has the add() method, too; pass it a Pair of key and value.
To delete a value from a Vector, use the removeKey() method:

    $vec = Vector {'first', 'second', 'third'};
    $vec->removeKey(1);
    print_r($vec); // Prints: "HH\Vector Object( [0] => first, [1] => third )"

Note that the elements that are after the removed one are all shifted down by one index, so that the index 1 now holds the value 'third'. This is in line with vector semantics, which state that all indices between 0 and n–1, inclusive, are valid (where n is the number of elements in the vector).
The method to remove a key from a Map is also called removeKey(). To remove a value from a Set, use the method remove().
You can also delete items from Maps and Sets using the unset statement:

    $map = Map {'one' => 'un', 'two' => 'deux'};
    unset($map['one']);
    print_r($map); // Prints: "HH\Map Object( [two] => deux )"

However, again, you should generally use the methods instead, as unset isn’t allowed (for good reason) in strict mode. You can use unset if you need to write code that accepts both arrays and collections seamlessly.
unset does not work with Vectors. This is because the semantics of removing elements from Vectors don’t match the semantics of removing elements from arrays. Unsetting an element of an array (even one that’s being used like a vector) leaves a “hole,” where the array’s valid indices are not contiguous, thus breaking vector semantics:

    $arr = array('zero', 'one', 'two');
    unset($arr[1]);
    print_r($arr); // Prints: "Array( [0] => zero, [2] => two )"

Collections can be compared for equality with the == operator. This is how it works:
If the two sides are not the same kind of collection (disregarding mutability), the result is false. For example, a Vector may compare equal to an ImmVector, but it will never compare equal to a Map.
If the two sides are Vectors or ImmVectors, the result is true if and only if both sides contain the same number of values, and the values at each index compare equal using ==. For example:

    $vector = Vector {1, 2};
    $immvector = ImmVector {1, 2};
    $strings = Vector {'1', '2'};
    $wrong_order = Vector {2, 1};

    var_dump($vector == $immvector);   // true
    var_dump($vector == $strings);     // true, because 1 == '1', 2 == '2'
    var_dump($vector == $wrong_order); // false

If the two sides are Pairs, the result is true if and only if the values at each index compare equal using ==.
If the two sides are Sets or ImmSets, the result is true if and only if both sides contain the same number of values, and every element in one side exists in the other side. Unlike with Vectors, these existence tests are done with === identity comparison. Order is irrelevant. For example:

    $set = Set {1, 2};
    $immset = ImmSet {1, 2};
    $strings = Set {'1', '2'};
    $wrong_order = Set {2, 1};

    var_dump($set == $immset);      // true
    var_dump($set == $strings);     // false
    var_dump($set == $wrong_order); // true

If the two sides are Maps or ImmMaps, the result is true if and only if both sides contain the same number of keys, every key in one side exists in the other side (using identity comparison), and identical keys map to equal values (using == comparison). Order is irrelevant. For example:

    $map = Map {10 => 20, 20 => 40};
    $string_keys = Map {'10' => 20, '20' => 40};
    $string_values = Map {10 => '20', 20 => '40'};

    var_dump($map == $string_keys);   // false
    var_dump($map == $string_values); // true

Collections can be compared for identity with the === operator. This only evaluates to true if both sides of the operator are the same object. If they are distinct objects, === comparison will evaluate to false even if the two objects have the same contents:

    $vector = Vector {1, 2};
    $another_variable = $vector;
    var_dump($vector === $another_variable); // true

    $other = Vector {1, 2};
    var_dump($vector === $other); // false

List assignment with a collection on the righthand side works just as if the collection were an array. List assignment is shorthand for indexing into the array or collection on the righthand side with integer keys, so this is the behavior for Maps and Sets (the internal ordering of the Map or Set doesn’t matter):

    $vector = Vector {'one', 'two'};
    list($one, $two) = $vector;

    $map = Map {1 => 'one', 0 => 'zero'};
    list($zero, $one) = $map;
    // $zero is 'zero' and $one is 'one'

Vector, Map, and Set have immutable equivalents: ImmVector, ImmMap, and ImmSet, respectively. (Pair is immutable and has no mutable equivalent.) They don’t implement any methods that modify their contents, and they can’t be modified through square-bracket syntax or unset; if you try to do so, an InvalidOperationException will be thrown. The contents of immutable collections are fixed when they’re created. They can be created with literal syntax (just use ImmVector, ImmMap, or ImmSet as the class name) or through their constructors or conversion from another collection (see “Concrete Collection Classes”).
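The following sketch shows an immutable collection created with literal syntax, and what happens on an attempted write. It assumes, as stated above, that the failure surfaces as a catchable InvalidOperationException at runtime; the typechecker would normally flag the write before the code ever runs.

```hack
<?hh

$primes = ImmVector {2, 3, 5};
echo $primes[1]; // Prints 3: reading works exactly as with Vector

try {
  $primes[] = 7; // Attempted modification of an immutable collection
} catch (InvalidOperationException $e) {
  echo 'ImmVector cannot be modified';
}
```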
You should generally use immutable collections whenever possible. If some data isn’t supposed to change, enforcing that contract closes off a possible source of bugs. It also encodes more information about the program’s behavior in the type system, which is always a good thing.
Most of the time, you shouldn’t use the collection class names themselves in type annotations. Hack provides a large set of interfaces that describe elements of a collection’s functionality, and you should generally use those in type annotations.
For example, if you’re writing a function that takes a set of values as an argument and doesn’t modify it, you should annotate the argument as ConstSet, an interface, rather than Set, the concrete class. This increases expressiveness, which helps the typechecker catch more mistakes: if you try to modify the set within the function, there will be a type error. It also makes the function’s contract clear to callers: it wants a set, and it won’t modify it.
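A function annotated this way might look like the sketch below. The function name and the values are hypothetical; ConstSet and contains() are described later in this section.

```hack
<?hh

// Hypothetical helper: it reads from the set but never modifies it,
// so it asks for ConstSet rather than the concrete Set class.
function is_on_list(ConstSet<int> $the_list, int $user_id): bool {
  return $the_list->contains($user_id);
}

var_dump(is_on_list(Set {10, 20}, 20));    // bool(true)
var_dump(is_on_list(ImmSet {10, 20}, 30)); // bool(false)
```

Because the parameter is a ConstSet, callers can pass either a Set or an ImmSet without converting between them.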
In this section, we’ll see the interfaces that you’re most likely to use. This will double as a natural way to present the object-oriented interfaces to the collection classes. If you just want to see the collection class APIs all in one, skip to “Concrete Collection Classes”; that section doesn’t have explanations for the methods, but many of them are self-explanatory, especially with type annotations.
The core collection interfaces are:
Traversable<T>
Anything that can be iterated over using foreach without a key is Traversable. Within such a foreach, the iteration variable will have type T. This is the only thing Traversable guarantees; it does not declare any methods.
The most important thing about Traversable is that regular PHP arrays are Traversable. This is unusual, because arrays are not objects and, in general, only objects can implement interfaces. Traversable is special-cased in the runtime to have this behavior.
In addition to arrays and collections, Traversable includes objects that implement Iterator.
Traversable can help bridge the gap between arrays and collections. If the only thing you do with a function argument is iterate over it using foreach without a key, irrespective of whether it’s an array, a collection, or something else, you should annotate it as Traversable.
Note that if you’re implementing your own class that you want to be usable with foreach, you should not make it implement Traversable. Use Iterable (described shortly) instead.
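As a rough illustration of the bridging role described above, here is a hypothetical helper that does nothing but iterate, and so accepts arrays and collections alike:

```hack
<?hh

// Hypothetical helper: foreach without a key is the only thing it does
// with its argument, so Traversable is a sufficient annotation.
function sum_all(Traversable<int> $numbers): int {
  $total = 0;
  foreach ($numbers as $number) {
    $total += $number;
  }
  return $total;
}

echo sum_all(array(1, 2, 3));   // 6
echo sum_all(Vector {1, 2, 3}); // 6
echo sum_all(Set {1, 2, 3});    // 6
```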
KeyedTraversable<Tk, Tv> extends Traversable<Tv>
KeyedTraversable is similar to Traversable, but additionally indicates that it’s valid to include a key in the foreach statement. Regular PHP arrays are KeyedTraversable. The following example shows the difference between Traversable and KeyedTraversable:

    function notKeyed<T>(Traversable<T> $traversable): void {
      // Not valid
      foreach ($traversable as $key => $value) {
        // ...
      }
    }

    function keyed<Tk, Tv>(KeyedTraversable<Tk, Tv> $traversable): void {
      // Valid
      foreach ($traversable as $key => $value) {
        // $key is of type Tk
        // $value is of type Tv
      }
    }

Container<T> extends Traversable<T>
Container is exactly like Traversable, except that it does not include objects that implement Iterator. In other words, it includes only arrays and instances of collection classes. The only thing you can do with a Container is to iterate over it with foreach.
KeyedContainer<Tk, Tv> extends KeyedTraversable<Tk, Tv>
Similarly, KeyedContainer is like KeyedTraversable, except that it is restricted to arrays and collection classes other than Set and ImmSet.
Indexish<Tk, Tv> extends KeyedTraversable<Tk, Tv>
Indexish signifies anything that can be indexed into using square-bracket syntax: $indexish[$key]. It declares no methods. Like Traversable and KeyedTraversable, it is a special interface that is “implemented” by arrays as well as collections and other objects that support this syntax.
IteratorAggregate<T> extends Traversable<T>
This interface is for objects that can produce an Iterator object to iterate over their contents. Unlike the previous three interfaces, it is not implemented by arrays. It’s very unlikely that you’ll ever use IteratorAggregate in type annotations; either Iterable or Traversable is probably more appropriate. The interface declares a single method:
  getIterator(): Iterator<T>
  Returns an Iterator object that iterates over the object’s contents.
Iterable<T> extends IteratorAggregate<T>
This is where the real capabilities of collections begin to come in. The Iterable interface declares several methods:
  toArray(): array
  Converts the collection to an array. Note that the return value does not have a type argument: it’s simply array instead of array<T>.
  toValuesArray(): array
  Converts the collection to an array but discards the keys, replacing them with the integers 0 to n–1, in order.
  toVector(): Vector<T>
  Converts the collection to a Vector. This is very similar to toValuesArray(); if the collection has keys (i.e., is a Map), the keys will be discarded.
  toImmVector(): ImmVector<T>
  Converts to an immutable Vector.
  toSet(): Set<T>
  Converts the collection to a Set, discarding the keys, if any.
  toImmSet(): ImmSet<T>
  Converts to an immutable Set.
  values(): Iterable<T>
  Returns an Iterable object yielding the collection’s values (discarding keys).
  map<Tm>(function(T): Tm $callback): Iterable<Tm>
  Returns an Iterable object yielding the collection’s values after they have been passed through the given function. It is much like the standard PHP array_map() function. Here’s an example that multiplies the elements of a Vector by 10:

    $nums = Vector {1, 2, 3};
    print_r($nums->map(function($x) { return $x * 10; }));

This will output:

    HH\Vector Object ( [0] => 10 [1] => 20 [2] => 30 )

  filter(function(T): bool $callback): Iterable<T>
  Returns an Iterable object yielding the values from the collection that make the given function return true. Here’s an example of picking out even numbers from a Vector:

    $nums = Vector {1, 2, 3, 4};
    print_r($nums->filter(function($x) { return $x % 2 === 0; }));

This will output:

    HH\Vector Object ( [0] => 2 [1] => 4 )

  zip<Tz>(Traversable<Tz> $traversable): Iterable<Pair<T, Tz>>
  Returns an Iterable object that pairs up the values from this collection and the values from the passed-in Traversable. An example is the best way to explain it:

    $english = Vector {'one', 'two', 'three'};
    $french = Vector {'un', 'deux', 'trois'};
    print_r($english->zip($french));

This will output:

    HH\Vector Object ( [0] => HH\Pair Object ( [0] => one [1] => un ) [1] => HH\Pair Object ( [0] => two [1] => deux ) [2] => HH\Pair Object ( [0] => three [1] => trois ) )

If the two collections have different counts, the resulting Iterable will have the smaller count.
KeyedIterable<Tk, Tv> extends Iterable<Tv>
This is analogous to Iterable, but with the key’s type included. It adds some new methods and overrides some from Iterable with different return types. The new methods are listed first:
  toKeysArray(): array
  Returns an array of the Iterable’s keys.
  toMap(): Map<Tk, Tv>
  Returns the Iterable converted to a Map.
  keys(): Iterable<Tk>
  Returns an Iterable over this Iterable’s keys.
  mapWithKey<Tm>(function(Tk, Tv): Tm $callback): KeyedIterable<Tk, Tm>
  Like map(), but passes keys to the callback function as well as values.
  filterWithKey(function(Tk, Tv): bool $callback): KeyedIterable<Tk, Tv>
  Like filter(), but passes keys to the callback function as well as values.
  getIterator(): KeyedIterator<Tk, Tv>
  An override with a more specific return type.
  map<Tm>(function(Tv): Tm $callback): KeyedIterable<Tk, Tm>
  An override with a more specific return type.
  filter(function(Tv): bool $callback): KeyedIterable<Tk, Tv>
  An override with a more specific return type.
  zip<Tz>(Traversable<Tz> $traversable): KeyedIterable<Tk, Pair<Tv, Tz>>
  An override with a more specific return type.
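The keyed variants are easiest to see on a Map. The sketch below uses invented data and assumes that Map’s own versions of mapWithKey() and filterWithKey() return a Map that keeps the original keys, in line with the more specific return types mentioned above.

```hack
<?hh

$prices = Map {'apple' => 3, 'pear' => 5};

// The callback receives the key first, then the value.
$labels = $prices->mapWithKey(function($name, $price) {
  return $name . ': ' . $price;
});
echo $labels['pear']; // Prints 'pear: 5'

// Keep only the entries for which the callback returns true.
$cheap = $prices->filterWithKey(function($name, $price) {
  return $price < 4;
});
echo $cheap->count(); // 1
```

Neither call modifies $prices; each returns a new collection.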
There are three core interfaces that declare the most basic collection functionality. You’ll essentially never use these in type annotations, as they’re too nonspecific to be useful that way, but we’ll look at them here to learn these core functions:
ConstCollection<T>
A read-only collection of values of type T. It says nothing about uniqueness of values, ordering, underlying implementation, or anything.
Every concrete collection class implements this interface (indirectly). It may seem unsuitable for Map, because it only has one type parameter and Map needs two (one for keys and one for values), but Maps do implement ConstCollection: a Map with key type Tk and value type Tv implements ConstCollection<Pair<Tk, Tv>>.
This interface declares three methods:
  count(): int
  Returns the number of values in the collection.
  isEmpty(): bool
  Returns whether the collection is empty.
  items(): Iterable<T>
  Returns a value that can be iterated over using foreach, and will yield every value in the collection.
OutputCollection<T>
This interface declares two methods that allow adding values to the collection (every mutable collection class implements this):
  add(T $value): this
  Adds the given value to the collection and returns the collection itself.
  addAll(?Traversable<T> $values): this
  Iterates over the given Traversable and adds each resulting value to the collection. It returns the collection itself.
Collection<T> extends ConstCollection<T>, OutputCollection<T>
This interface declares no methods; it just serves to combine the read-only behavior of ConstCollection and the write-only behavior of OutputCollection.
Now, at last, we’ll get into specific collection functionality. We’ll look at six collection interfaces and the methods they declare.3 They’re meant to describe functionality independent of implementation. For now, there’s only one concrete implementation of each, but there may be others in the future; for example, one can imagine a linked list–based class that implements MutableVector.
All of these interfaces either directly or indirectly extend KeyedIterable, which declares several methods with KeyedIterable as their return type, such as map() and filter(). All of these interfaces override such methods with specific return types; for example, ConstVector<T> declares filter(function(T): bool $callback): ConstVector<T>. These overridden methods are omitted in the following list:
ConstSet<T> extends ConstCollection<T>, KeyedIterable<mixed, T>
This represents a read-only set of values of type T.4 It declares only one method directly:
  contains(T $value): bool
  Returns whether the given value is in the set. The semantics are the same as === comparison: the result is true if and only if there is a value in the set that compares identical to $value using ===.
MutableSet<T> extends ConstSet<T>, Collection<T>
This represents a modifiable set of values of type T. It extends ConstSet and declares two methods directly:
  clear(): this
  Removes all values from the set, and returns the set.
  remove(T $value): this
  Removes the given value from the set (doing nothing if the value is not in the set), and returns the set. As with contains(), the semantics are the same as === comparison.
ConstVector<T> extends ConstCollection<T>, KeyedIterable<int, T>
This represents a read-only sequence of values of type T, indexed by integers. It declares three methods directly:
  at(int $index): T
  Returns the value at the given index, or throws an exception if the index is out of bounds.
  containsKey(int $index): bool
  Returns whether the given index is in bounds.
  get(int $index): ?T
  Returns the value at the given index, or null if the index is out of bounds.
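The difference between the three methods shows up only for an index that is out of bounds, as in the sketch below. It assumes that at() throws the same OutOfBoundsException that square-bracket reads throw; the text above only says that it throws an exception.

```hack
<?hh

$vector = Vector {'zero', 'one'};

var_dump($vector->containsKey(1)); // bool(true)
var_dump($vector->containsKey(2)); // bool(false)
var_dump($vector->get(2));         // NULL: get() signals a miss with null

try {
  $vector->at(2); // at() signals a miss by throwing
} catch (OutOfBoundsException $e) {
  echo 'index 2 is out of bounds';
}
```

get() suits cases where a missing index is an expected outcome, provided null is not itself a legitimate value in the vector; at() suits cases where a missing index indicates a bug.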
MutableVector<T> extends ConstVector<T>, Collection<T>
This represents a modifiable sequence of values of type T. It extends ConstVector and adds these methods:
  clear(): this
  Removes all values from the vector.
  removeKey(int $index): this
  Removes the value at the given index. In line with vector semantics, the values at higher indices will all be shifted down by one, so that the indices remain contiguous.
  set(int $index, T $value): this
  Sets the given value at the given index, throwing an exception if the index is out of bounds. If you want to extend the vector, use add().
  setAll(KeyedTraversable<int, T> $kt): this
  Iterates over the given KeyedTraversable and calls set() with each key/value pair in it.
ConstMap<Tk, Tv> extends ConstCollection<Pair<Tk, Tv>>, KeyedIterable<Tk, Tv>
This represents a read-only mapping of keys of type Tk
to values of type Tv
. It declares methods that resemble those of ConstSet
and ConstVector
:
at(Tk $key): Tv
returns the value for the given key, or throws an exception if the key isn’t in the map.
contains(Tk $key): bool
returns whether the given key exists in the map.
containsKey(Tk $key): bool
is the same as contains()
. The duplication of methods is just a quirk of the inheritance hierarchy of these interfaces.
get(Tk $key): ?Tv
returns the value for the given key, or null
if the key isn’t in the map.
MutableMap<Tk, Tv> extends ConstMap<Tk, Tv>
This represents a modifiable mapping of keys to values. Again, the methods that it declares are a combination of the methods from MutableVector
and MutableSet
:
clear(): this
removes all keys and values from the map.
remove(Tk $key): this
removes the value at the given key.
removeKey(Tk $key): this
is exactly the same as remove()
.
set(Tk $key, Tv $value): this
sets the given value at the given key.
setAll(KeyedTraversable<Tk, Tv> $kt): this
iterates over the given KeyedTraversable
and calls set()
with each key/value pair in it.
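As a rough illustration of how these interface methods behave in practice, here is a minimal sketch. The variable names and sample values are made up for the example, and it catches the base Exception class rather than assuming a specific exception type for an out-of-bounds at() call.

```hack
<?hh
// Illustrative sketch only; names and values are hypothetical.

$vector = Vector {'a', 'b'};
$first = $vector->at(0);              // 'a'
$missing = $vector->get(5);           // null: out of bounds, but no exception
$inBounds = $vector->containsKey(1);  // true

// at() throws instead of returning null when the index is out of bounds.
try {
  $vector->at(5);
  $threw = false;
} catch (Exception $e) {
  $threw = true;
}

$map = Map {'x' => 1};
$map->set('y', 2)->set('z', 3);  // set() returns the map, so calls can chain
$map->remove('x');               // removeKey('x') would do the same thing
$hasY = $map->contains('y');     // true; containsKey('y') is equivalent

$set = Set {10, 20};
$set->remove(30);                // 30 is not in the set, so nothing happens
$hasTen = $set->contains(10);    // true
```

The choice between at() and get() comes down to whether a missing key is an error in your program (use at()) or an expected case you want to handle with a null check (use get()).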
Finally, to bring all this together, we’ll look at the full type-annotated APIs to all the collection classes. Each one implements one of the six interfaces from the previous section, and adds a few more useful methods.
Only methods defined by the classes themselves, and not declared by any of the interfaces we just saw, are listed here:
ImmVector<T> implements ConstVector<T>
__construct(?Traversable<T> $values)
creates a new ImmVector
with the contents of the given Traversable
.
linearSearch(T $value): int
performs a linear search for the given value within the ImmVector
and returns the index at which the value was found, or -1
if it wasn’t found.
__toString(): string
just returns "ImmVector"
.
Vector<T> implements MutableVector<T>
__construct(?Traversable<T> $values)
creates a new Vector
with the contents of the given Traversable
.
linearSearch(T $value): int
performs a linear search for the given value within the Vector
and returns the index at which the value was found, or -1
if it wasn’t found.
pop(): T
removes the last value from the Vector
and returns it.
reserve(int $size): void
hints to the Vector
that it should reallocate memory to hold the given number of values. The Vector
may not do exactly that; this is just a hint.
resize(int $size, T $value): void
changes the size of the Vector
to the passed size. If the new size is smaller than the current size, values at the end of the Vector
are removed. If the new size is larger, the new values are set to $value
.
reverse(): void
reverses the Vector
in place.
shuffle(): void
randomly rearranges the values in the Vector
.
splice(int $offset, ?int $len = NULL): void
removes $len
values from the Vector
, starting at $offset
. If $len
is not passed, it removes every value from $offset
to the end of the Vector
. This is similar to the built-in function array_splice()
.
__toString(): string
just returns "Vector"
.
ImmSet<T> implements ConstSet<T>
__construct(?Traversable<T> $values)
creates a new ImmSet
with the contents of the given Traversable
.
fromArrays(...): ImmSet<T>
is a static
method that takes a variable number of arguments, which must all be arrays, and creates an ImmSet
from all their contents.
fromItems(?Traversable<T> $items): ImmSet<T>
is a static method that creates an ImmSet
from the given Traversable
.
__toString(): string
just returns "ImmSet"
.
Set<T> implements MutableSet<T>
__construct(?Traversable<T> $values)
creates a new Set
with the contents of the given Traversable
.
fromArrays(...): Set<T>
is a static method that takes a variable number of arguments, which must all be arrays, and creates a Set
from all their contents.
fromItems(?Traversable<T> $items): Set<T>
is a static method that creates a Set
from the given Traversable
.
removeAll(?Traversable<T> $values): Set<T>
removes all the values in the given Traversable
from the set, and returns the set itself.
__toString(): string
just returns "Set"
.
ImmMap<Tk, Tv> implements ConstMap<Tk, Tv>
__construct(?KeyedTraversable<Tk, Tv> $values)
creates a new ImmMap
with the contents of the given KeyedTraversable
.
fromItems(?Traversable<Pair<Tk, Tv>> $items): ImmMap<Tk, Tv>
is a static method that creates an ImmMap
from the given Traversable
.
__toString(): string
just returns "ImmMap"
.
Map<Tk, Tv> implements MutableMap<Tk, Tv>
Like other Hack features, collections were designed with interoperability in mind. A codebase can be gradually converted from using arrays to using collections.
All Hack collections can be converted to arrays with a cast expression, or with the toArray()
method:
$vector = Vector {'first', 'second'};
print_r((array)$vector);
// Prints: Array( [0] => first, [1] => second )
print_r($vector->toArray());
// Same
The conversions are straightforward:
Vector
s and ImmVector
s convert to arrays where the keys are the integer indices of the values, in the same order.
Map
s and ImmMap
s convert to arrays with the same key/value pairs, in the same order.
Set
s and ImmSet
s convert to arrays with each key mapping to itself, in the same order.
Pair
s convert to arrays with the keys 0 and 1 (integers) in that order, mapping to the corresponding values.
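The conversion rules above can be sketched for the remaining collection types as follows. The sample values are hypothetical, and the comments show the resulting arrays as described by those rules.

```hack
<?hh
// Illustrative sketch only; names and values are hypothetical.

$map = Map {'a' => 1, 'b' => 2};
$set = Set {'x', 'y'};
$pair = Pair {'left', 'right'};

$fromMap = $map->toArray();    // array('a' => 1, 'b' => 2)
$fromSet = (array)$set;        // array('x' => 'x', 'y' => 'y')
$fromPair = $pair->toArray();  // array(0 => 'left', 1 => 'right')
```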
There is a small wrinkle in the case of integer-like string keys (see “Reading and Writing”) in Map
s and Set
s. If the Map
or Set
contains keys that conflict with each other in this way, an E_WARNING
-level error will be raised. The conflicting keys will reduce to one integer key in the resulting array, and it will map to the last value under the conflicting keys:
<?php
$map = Map {10 => 'int', '10' => 'string'};
$array = (array)$map;
// Warning: Map::toArray() for a map containing both int(10) and string('10')
var_dump($array);
// Prints: array(1) { [10]=> string(6) "string" }
$set = Set {10, "10"};
$array = (array)$set;
// Warning: Set::toArray() for a set containing both int(10) and string('10')
var_dump($array);
// Prints: array(1) { [10]=> string(2) "10" }
Hack has a lot of built-in functions that can take arrays as arguments. There are several different ways in which these have been adapted to work with collections.
Hack has a wide variety of functions that are used to sort arrays. All of these have been adapted to work with collections as well, but each one only works with certain types of collections.
Vector
s only work with sort()
, rsort()
, and usort()
. All the other sorting functions are concerned with keys, which doesn’t make sense for a Vector
.
Map
s and Set
s only work with asort()
, arsort()
, ksort()
, krsort()
, usort()
, uasort()
, uksort()
, natsort()
, and natcasesort()
. Note that for Set
s, sorting by key is the same as sorting by value.
Immutable collections and Pair
s aren’t supported because they’re immutable, and these functions sort in place. Make a mutable copy of the collection and sort that instead.
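A short sketch of these rules follows, using made-up sample data. For the immutable case it builds the mutable copy with the Vector constructor described earlier, which accepts any Traversable; other ways of copying would work equally well.

```hack
<?hh
// Illustrative sketch only; names and values are hypothetical.

$vector = Vector {3, 1, 2};
sort($vector);                   // sorts in place: Vector {1, 2, 3}

$map = Map {'b' => 2, 'a' => 1};
ksort($map);                     // sorts in place by key: 'a' first, then 'b'

// Immutable collections can't be sorted in place, so sort a mutable copy.
$frozen = ImmVector {3, 1, 2};
$copy = new Vector($frozen);
sort($copy);
$sorted = new ImmVector($copy);  // ImmVector {1, 2, 3}
```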
The remaining built-ins that deal with arrays take a variety of approaches. There are a few specific kinds to look at first:
Four built-ins that modify arrays have been adapted to work with collections:
array_pop()
array_push()
array_shift()
array_unshift()
The rest have not. Note that array_push()
and array_unshift()
support only Vector
and Set
.
Built-ins that read or modify arrays’ internal pointers, such as current()
and reset()
, don’t work with collections at all, because collections don’t have an equivalent of Hack arrays’ internal pointers.
Debugging and introspection functions produce output for collections similar to what they produce for arrays. For example, this:
var_dump(array(10, 20));
var_dump(Vector {10, 20});
produces:
array(2) { [0]=> int(10) [1]=> int(20) }
object(HH\Vector)#1 (2) { [0]=> int(10) [1]=> int(20) }
The functions that behave this way are debug_zval_dump(), print_r(), var_dump(), and var_export().
serialize()
can serialize collections, but the resulting serialized string can only be unserialized by HHVM. (Collections aren’t serialized the same way as other objects.)
The most common case among the remaining built-ins is that they have a parameter that must be an array and is not by-reference. Examples of this include count()
and array_diff()
. In cases like this, if you pass a collection as that parameter, it will be automatically converted to an array,5 with no warning or error.
The last, and trickiest, category of built-ins consists of the ones that adapt their behavior based on the types of the arguments they’re passed. apc_store()
is an example: if the first argument is a string, a single value is stored in the Alternative PHP Cache (APC); but if it’s an array, all the key/value mappings in the array are stored in APC. In general, built-ins like these do not support collections. The lone exception in HHVM 3.6 is implode()
.
Non-built-in functions with an array
typehint will implicitly convert passed-in collections to arrays, but there will be an E_NOTICE
-level error when doing so. The rationale for this behavior is that this code is likely under your control, so you can modify it to have a collection typehint, or Indexish
, or Traversable
, or whatever is appropriate. However, it may not be under your control (e.g., it could be in a third-party library), so making this a hard error like a fatal or an exception is too strict. For example, this code:
function examine(array $items) {
  if (is_array($items)) {
    echo "It's an array!";
  }
}
examine(Vector {1, 2, 3});
produces the following output:
Notice: Argument 1 to examine() must be of type array, HH\Vector given; argument 1 was implicitly cast to array
It's an array!
By contrast, if you pass an array to a user function that expects a collection, no implicit conversion will happen, and the typehint will fail.
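If the function is under your control, one way to avoid the notice is to loosen the typehint. The following is a minimal sketch, assuming the function only needs to iterate over its argument; count_items() is a hypothetical name chosen for the example.

```hack
<?hh
// Illustrative sketch only; count_items() is a hypothetical function.

// Traversable accepts both arrays and collections, so no implicit
// conversion (and no notice) is needed for either kind of argument.
function count_items(Traversable<int> $items): int {
  $n = 0;
  foreach ($items as $item) {
    $n++;
  }
  return $n;
}

$fromArray = count_items(array(1, 2, 3));
$fromVector = count_items(Vector {1, 2, 3});
```

If the function needs keys or random access rather than plain iteration, a narrower hint such as Indexish or a specific collection interface would be the better fit.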
1 It is not actually copied in memory at that point, either in standard PHP or in HHVM; instead, it is only copied when it is modified. This is called copy-on-write. You may have heard statements like “PHP arrays are copy-on-write,” which is true but describes implementation rather than semantics. Well, sort of. Copy-on-write should be an implementation detail—it behaves as if the array were copied at the time of the assignment—but it’s not quite. There are some obscure corner cases where the copy-on-write is detectable, although those cases are arguably bugs in the language.
2 This is actually not the same logic as is used when converting strings to integers. The string must be the decimal representation of an integer between –2^63 and 2^63 – 1 inclusive, with no leading or trailing whitespace or leading zeros. This “feature” is very bad for performance: on every array lookup, which is one of the most common operations in any PHP or Hack program, the key has to be checked for these conditions. There are some possible micro-optimizations, but it still incurs a noticeable performance cost.
3 This section is not telling the whole story. There are actually six other interfaces in the picture, called SetAccess
, ConstSetAccess
, and similar. I’m not going into all the details of those because they’re not used in type annotations and aren’t essential to using collections.
4 You may wonder why this interface extends KeyedIterable<mixed, T>
instead of KeyedIterable<T, T>
. The reason is a subtle problem with the type of map()
. KeyedIterable<T, T>
would declare a map<Tm>()
function that returned KeyedIterable<T, Tm>
. Then, ConstSet<T>
would override it with a version that returned ConstSet<Tm>
. The problem is that these are not compatible: in KeyedIterable<T, Tm>
, the key and value types may be different, but in ConstSet<Tm>
, they cannot be different. Making the key type mixed
is slightly inelegant, and this may change in the future with additional typechecker functionality.
5 For efficiency, some of these built-ins have been adapted to use the collection directly, without converting it to an array, but the effect is exactly the same.