Chapter 10. Hack Tools

A programming language’s features are only part of what makes it good. To be useful, a language needs to have a good tooling ecosystem around it: editor and IDE support, debuggers, analysis and linting tools, etc. The Hack typechecker is built on a powerful static analysis platform that can support many of these uses.

The standard HHVM/Hack installation ships with several tools for inspecting code, as well as for migrating code from PHP to Hack and transpiling Hack code to PHP. This chapter is about those tools.

Inspecting the Codebase

The core of the Hack typechecker’s infrastructure is a server that remembers a set of facts about the codebase. Checking for type errors with hh_client is but one way of querying this set of facts. This section describes other options available to hh_client to query data:

--search

Use this flag to perform a fuzzy search for a given symbol name. Pass a single argument after the flag as the string to search for. Note that this will search built-in symbols as well:

$ hh_client --search wrap
File "/home/oyamauchi/hack/test.php", line 58, characters 7-13: Wrapper, class

The search is very responsive: the typechecker server indexes the codebase and doesn’t need to read any source files to do the search.

There are several related flags that can be used to restrict the kinds of symbols that will be returned: --search-class, --search-function, --search-constant, and --search-typedef (which searches type aliases). Each of these is used the same way as plain --search and returns output in the same format.

--type-at-pos

Use this flag to ask the typechecker what it thinks the type of an expression is. Pass a filename, line number, and column number on the command line, separated by colons, to inspect the expression at that position:

$ cat test.php
<?hh // strict

function reversed_digits(int $x): string {
  return strrev((string)$x);
}

function main(): void {
  $f = fun('reversed_digits');
  echo $f(123);
}

# Get type of $x within reversed_digits
$ hh_client --type-at-pos test.php:4:25
int

# Get type of result of string cast
$ hh_client --type-at-pos test.php:4:17
string

# Get type of $f in main()
$ hh_client --type-at-pos test.php:8:3
(function(int $x): string)

The type given is for the innermost expression at the given position. For example, if you query at the character a in the expression $a + $b, the result will be the type of $a, not of $a + $b. In this case, if you want the type of the whole expression, you have to query at the character +.

Note that the output of --type-at-pos may not be a valid type annotation; it’s purely for informational purposes. Most notably, for values of the special “unannotated” type (see “Code Without Annotations”), --type-at-pos outputs _ (a single underscore).

--find-refs and --find-class-refs

Use --find-refs to search for references to a given function or method, and --find-class-refs to search for references to a given class. Pass the name of the class, function, or method to search for as the single argument after the flag:

$ cat test.php
<?hh // strict

class C {}

class D extends C {}

function main(): void {
  $c = new C();
}

$ hh_client --find-class-refs C
File "/home/oyamauchi/hack/test.php", line 8, characters 12-12: C::__construct
File "/home/oyamauchi/hack/test.php", line 5, characters 17-17: C
2 total results

--inheritance-ancestors and --inheritance-children

Use these flags to print all the ancestors or descendants of a given class, respectively. Despite the name, --inheritance-children really does print all descendants, not just direct children:

$ cat test.php
<?hh // strict

class GrandparentClass {}

class ParentClass extends GrandparentClass {}

class ChildOne extends ParentClass {}

class ChildTwo extends ParentClass {}

$ hh_client --inheritance-ancestors ChildOne
File "/home/oyamauchi/hack/test.php", line 7, characters 7-14: ChildOne
    inherited from File "/home/oyamauchi/hack/test.php", line 5, characters 7-17: ParentClass
File "/home/oyamauchi/hack/test.php", line 7, characters 7-14: ChildOne
    inherited from File "/home/oyamauchi/hack/test.php", line 3, characters 7-22: GrandparentClass

$ hh_client --inheritance-children GrandparentClass
File "/home/oyamauchi/hack/test.php", line 3, characters 7-22: GrandparentClass
    inherited by File "/home/oyamauchi/hack/test.php", line 9, characters 7-14: ChildTwo
File "/home/oyamauchi/hack/test.php", line 3, characters 7-22: GrandparentClass
    inherited by File "/home/oyamauchi/hack/test.php", line 7, characters 7-14: ChildOne
File "/home/oyamauchi/hack/test.php", line 3, characters 7-22: GrandparentClass
    inherited by File "/home/oyamauchi/hack/test.php", line 5, characters 7-17: ParentClass

Scripting Support

The typechecker client can produce the output for any of its commands in JSON, which lets you easily integrate it with other tools: editors, IDEs, code linters, refactoring tools, etc. Just add the flag --json to any hh_client command line, before all other arguments:

$ cat test.php
<?hh // strict

function main(): void {
  $var = 1 + "3";
}

$ hh_client --json
{
  "passed": false,
  "errors": [
    {
      "message": [
        {
          "descr": "Typing error",
          "path": "/home/oyamauchi/hack/test.php",
          "line": 4,
          "start": 14,
          "end": 16,
          "code": 4110
        },
        {
          "descr": "This is a num (int/float) because this is used in an arithmetic operation",
          "path": "/home/oyamauchi/hack/test.php",
          "line": 4,
          "start": 14,
          "end": 16,
          "code": 4110
        },
        {
          "descr": "It is incompatible with a string",
          "path": "/home/oyamauchi/hack/test.php",
          "line": 4,
          "start": 14,
          "end": 16,
          "code": 4110
        }
      ]
    }
  ],
  "version": "0939324e1252832cf6f65c51ff2cb811dad307ba Mar  8 2015 23:44:12"
}

The output shown here has been formatted for legibility; hh_client’s JSON output has no extraneous whitespace.
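As an illustration of the kind of integration this enables, here is a short PHP sketch that turns the JSON into compiler-style path:line:start-end lines, a shape many editors know how to parse. It relies only on the fields visible in the example above (errors, and each message's descr, path, line, start, and end). The function name format_hh_errors is hypothetical, and real output may carry fields this sketch simply ignores.

```php
<?php
// Hypothetical helper: reformat `hh_client --json` output as one
// compiler-style line per message. Uses only fields shown in the example.
function format_hh_errors($json) {
  $result = json_decode($json, true);
  if (!is_array($result)) {
    throw new RuntimeException('Input is not a JSON object');
  }
  $lines = array();
  $errors = isset($result['errors']) ? $result['errors'] : array();
  foreach ($errors as $error) {
    foreach ($error['message'] as $i => $msg) {
      // The first message is the headline; later ones explain it, so indent
      // them to keep each error's messages visually grouped.
      $indent = ($i === 0) ? '' : '  ';
      $lines[] = sprintf('%s%s:%d:%d-%d: %s', $indent, $msg['path'],
        $msg['line'], $msg['start'], $msg['end'], $msg['descr']);
    }
  }
  return $lines;
}
```

In a standalone script, you might pass it stream_get_contents(STDIN), echo each returned line, and pipe hh_client --json into that script.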

Migrating PHP to Hack

Hack’s creators know better than most how difficult it is to do an en-masse conversion of a large codebase. When Hack was first conceived, Facebook had a PHP codebase of tens of millions of lines, being worked on simultaneously by hundreds of engineers.

The benefits of Hack are compounded when most of a codebase is in Hack. For Facebook, this meant that some way to automatically migrate large swaths of code was essentially a hard requirement for Hack to gain any traction. The codebase was too large, and changed too quickly, for a manual approach to be workable.

As a result, the standard HHVM/Hack installation includes several tools for automated migration of PHP code to Hack.

The Hackificator

The first step in converting a PHP codebase to Hack is to run the Hackificator, which performs an initial broad-strokes conversion. It scans a directory for PHP files and performs two steps on each of them:

  1. It makes some simple, mechanical changes to preempt Hack errors. For example, typehinted parameters with null default values are changed to make the typehints nullable. That is, function f(int $x = null)—valid in PHP, a type error in Hack—would be changed to function f(?int $x = null).

  2. It changes the opening <?php tag to <?hh, with the strictest mode that doesn’t introduce any typechecker errors. This will usually be partial or decl mode.

The Hackificator doesn’t touch anything else. Its purpose is to do the minimum possible to make code visible to the Hack typechecker.

Before running the Hackificator, there must be no typechecker errors in any files that are already Hack. That is, running hh_client must output No errors!. The Hackificator will refuse to run if there are errors.

Top-down or bottom-up migration

An important point to note is that the Hackificator processes files one at a time, in undefined order. The result of the run can therefore be different depending on the order in which it ends up processing files.

To illustrate this, let’s take a reduced version of a fairly common situation. In one PHP file, we have an abstract superclass. Scattered across many other PHP files are concrete subclasses—tens or even hundreds of them. In this example, we’ll just look at one.

Suppose we have files WorkItem.php:

<?php

abstract class WorkItem {
  abstract public function doWork();
}

and AckermannWorkItem.php:

<?php

class AckermannWorkItem extends WorkItem {
  public function doWork() {
    $this->running = true;
    // ...
  }
}

The first thing to note is that if we turn both files into partial-mode Hack files, there will be errors: the concrete subclass is using a property that isn’t declared. Therefore, the best we can do is to have one file in partial mode, with the other either in decl mode or in PHP.

If the Hackificator processes WorkItem.php first, it will put that file in partial mode. Because the subclass is still in PHP, it’s invisible to the typechecker, and WorkItem.php by itself has no errors in partial mode.1 Then, when the Hackificator processes AckermannWorkItem.php, it can only put the file in decl mode: because the superclass is in Hack, the typechecker can analyze the whole hierarchy and determine that the property running isn’t declared, which is an error in any mode other than decl.

If the Hackificator processes AckermannWorkItem.php first, it will put that file in partial mode. Its superclass is still in PHP, so the superclass is invisible to the typechecker, which assumes that the property running is declared there and doesn’t report an error. Then, when the Hackificator tries converting WorkItem.php to Hack, undeclared property errors appear in AckermannWorkItem.php, because its superclass is now visible to the typechecker. The Hackificator has to revert WorkItem.php to PHP; once it has processed AckermannWorkItem.php, it can’t go back and move that file to decl mode (which would silence the error).

The first pattern, migrating the superclass first, is a top-down migration to Hack. The advantage of this is that any new subclasses can start off in Hack and get the benefit of thorough typechecking with knowledge of their superclass, even while other subclasses have yet to be migrated. The fully typechecked portion of the hierarchy steadily, linearly increases from 0% to 100% as the migration proceeds.

The second pattern, migrating the superclass after all of its subclasses are in Hack, is a bottom-up migration. The advantage of this is that it gets more code into Hack sooner. However, the typechecker is handicapped in the subclasses, because it has no knowledge of their superclass. Much of the hierarchy is checked with this handicap from the beginning of the migration, with almost none of it checked without handicap until the very end.

Because the Hackificator visits files in arbitrary order, and a hierarchy typically has one superclass but many subclasses, it is far more likely to encounter a subclass first and therefore to produce a bottom-up conversion. If you want to ensure a top-down conversion, convert the superclass manually before running the Hackificator.

Neither pattern is strictly better than the other, and you can use both within the same codebase, on different class hierarchies. We’re discussing them here mostly so that you know what to expect when using the Hackificator, and to help you make a considered choice.

Upgrading typechecker modes

There’s another conversion the Hackificator can do, which is to inspect Hack files (but not PHP files) and upgrade them to the strictest mode that doesn’t cause typechecker errors. Activate this with the command-line flag -upgrade (single hyphen).

This will often come in useful because the Hackificator’s default behavior will almost never produce a strict-mode file. This is because strict mode requires all return types to be annotated, but Hack’s return type annotation syntax is illegal in PHP (in all 5.x versions and earlier).

It can be useful to combine hackificator -upgrade with hh_server --convert, described in the next section. That tool adds annotations, which may get a partial-mode file into a state where it can be upgraded to strict mode cleanly.

Inferring and Adding Type Annotations

Adding type annotations is trickier, and requires a fair bit more manual work. The typechecker includes a mode in which it tries to infer the types of unannotated values, by working backward from annotated and known types, and annotates the inferred types in the code.

It’s important to note that this process isn’t perfect. The inferred type annotations are guaranteed not to cause typechecker errors, but they may turn out to be wrong at runtime. Because of that, all of the added type annotations are soft, so that they’ll cause warnings instead of fatal errors at runtime.

To deal with the resulting proliferation of soft typehints, two other tools complement this one: one that reads a log file and removes the soft typehints that produced warnings in the log, and another that hardens all the soft typehints in a file.

Adding annotations

The tool to add annotations only works on Hack files (any mode). It’s part of the typechecker server, and you invoke it as follows:

$ hh_server --convert my_project my_project

After the --convert flag, there are two arguments: first, the directory in which to actually make modifications; and second, the top-level directory of the project. The separation of the two allows you to restrict the modifications to a subset of the project, which helps keep the work in manageably sized chunks when dealing with a large codebase. The two arguments are allowed to be the same, and it’s best if they are: the more code the tool can work with at once, the more effective it can be.

This inference process is considerably slower than the one the typechecker uses for Hack files, because it’s not function-local. For example, when processing a function with unannotated parameters, it will find that function’s callsites to see what arguments are passed. If it finds consistent argument types, it will add the appropriate annotations.
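The rule in the last sentence can be modeled in miniature. In the hypothetical PHP sketch below, something else has already collected the type seen at each callsite for a single parameter; the function proposes a soft annotation only when every callsite agrees. The function name and its input format are assumptions made for illustration, and the real analysis in hh_server is far more involved.

```php
<?php
// Hypothetical model of the "consistent callsite types" rule; not hh_server
// code. $observed lists the argument type seen at each callsite for one
// parameter, e.g. array('int', 'int').
function infer_soft_annotation(array $observed) {
  if (count($observed) === 0) {
    return null; // No callsites found: nothing to infer from.
  }
  $distinct = array_unique($observed);
  if (count($distinct) !== 1) {
    return null; // Callsites disagree, so leave the parameter unannotated.
  }
  // Added annotations are soft (@-prefixed), so a wrong guess only warns.
  return '@' . reset($distinct);
}
```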

Removing incorrect annotations

Once these annotations are added, try them out. Running tests is the best starting point. The added annotations don’t change any behavior except for warnings, so they shouldn’t cause tests to fail, but running tests is a convenient way to run the code. In addition, run any command-line scripts you can; if your project is a web app, start up a web server and visit some pages. The aim here is to exercise as much of your code as possible.

While doing this, you have to capture error messages. If you’re running scripts or tests from the command line, the error messages go to standard out. You can just redirect standard out to a file:

$ hhvm testfile.php > errors.log

This will capture everything from standard out, including output from the script, but that’s not a problem. The annotation-removal tool uses regular expressions to search for very specific error messages, so the script’s output shouldn’t interfere.

If you’re running HHVM as a server, error messages again go to standard out by default. You can use a configuration option to have error messages written to a file instead:

$ hhvm -m server -d hhvm.log.file=errors.log

After running your code, if any soft type annotations failed, you’ll see error messages in the log that look like the following example—these are what the annotation-removal tool looks for:

Warning: Argument 1 to f() must be of type @int, string given in /home/oyamauchi/hack/testfile.php on line 5
Warning: Value returned from function f() must be of type @int, string given in /home/oyamauchi/hack/testfile.php on line 6

It’s important to note that the annotation-removal tool extracts the file path from the error message and looks for the file at exactly that path. If the file path in the logs is relative, the tool will resolve it relative to its current working directory.
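To make this concrete, here is a rough PHP approximation of what such a tool has to do with each warning: match the message, pull out the file path and line number, and resolve a relative path against the current working directory. The regular expression is an assumption written against the two sample warnings shown above; it is not the pattern hack_remove_soft_types actually uses, and the function name is hypothetical.

```php
<?php
// Hypothetical approximation; not the real hack_remove_soft_types pattern.
// Finds soft-typehint warnings like the samples above and returns the file
// and line each one points at, ignoring any other output in the log.
function find_soft_type_failures($log, $cwd) {
  $pattern = '/^Warning: (?:Argument \d+ to|Value returned from function) '
    . '\S+\(\) must be of type @\S+, \S+ given in (\S+) on line (\d+)$/m';
  preg_match_all($pattern, $log, $matches, PREG_SET_ORDER);
  $failures = array();
  foreach ($matches as $m) {
    $path = $m[1];
    // Relative paths are resolved against the working directory, which is
    // why you must run the tool from the right place.
    if ($path[0] !== '/') {
      $path = rtrim($cwd, '/') . '/' . $path;
    }
    $failures[] = array('file' => $path, 'line' => (int)$m[2]);
  }
  return $failures;
}
```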

HHVM outputs absolute file paths in error logs by default. This can be a problem if, for example, you gather logs from one machine and do the annotation removal on another machine with your project’s source at a different path. To deal with this, you can strip the path to the project root from the log messages using a tool like sed (the full usage of which is beyond the scope of this book):

$ sed -e 's!/home/oyamauchi/hack/!!g' < errors.log > errors-relative-paths.log

Finally, with a suitable log file, removing the incorrect annotations is very simple. If the error log has relative paths, make sure you’re in the right working directory. Then, use the command hack_remove_soft_types:

$ hack_remove_soft_types --delete-from-log errors.log

Hardening annotations

When you’re confident that the remaining annotations are correct, you can make all the remaining annotations hard. This is also done with hack_remove_soft_types:

$ hack_remove_soft_types --harden lib/core.hh

The tool only accepts a single file as an argument for now. If you want to apply the operation to all the files in a directory, you can use the find utility. This example applies it to every file whose name ends with .hh in the directory lib and all of its subdirectories, recursively:

$ find lib -type f -name '*.hh' -exec hack_remove_soft_types --harden '{}' ';'

Transpiling Hack to PHP

HHVM is currently the only execution engine that supports Hack. This means that anyone who can’t make the switch to HHVM can’t run Hack code. If you’re the author of a PHP library, this probably seems like a good reason not to migrate your code to Hack—there would be no sense in migrating when doing so would shut out many of your potential users.

The Hack transpiler was developed by the Hack team to assuage these concerns. The transpiler is a tool that automatically converts Hack code into equivalent PHP code. The purpose isn’t to convert a Hack codebase to PHP so that you can develop it in PHP. Rather, the transpiler is meant to be used as a build step: you develop in Hack, and transpile to PHP as the final step before packaging. You ship two versions of your code: the original Hack version, for people who use HHVM and Hack; and the transpiled PHP version, for people who don’t.

The transpiler ships with HHVM, and you run it with the command h2tp. Give it the path to your Hack codebase, and a path where it can put the resulting PHP code. It will inspect any file with the extension .php or .hh. Any other files will be copied to the destination directory unmodified:

$ ls -a my_project
.  ..  .hhconfig  main.hh

$ h2tp my_project my_project_transpiled
The Conversion was successful

$ ls -a my_project_transpiled
.  ..  .hhconfig  main.php

The output PHP code is not meant to be edited. All comments are stripped, and formatting isn’t guaranteed to be preserved. The code isn’t needlessly obfuscated, though, so it shouldn’t be hard to understand a stack trace from the PHP code.

Once the PHP code has been generated, there is one more setup step. The generated code will make use of Hacklib, a collection of support functions and classes that are used by the transpiled code. Hacklib comes as part of the Hack/HHVM installation and is installed, by default, at path /usr/share/hhvm/hack/hacklib.

First, copy Hacklib into the directory containing your project’s transpiled PHP code:

$ cd my_project_transpiled

$ cp -r /usr/share/hhvm/hack/hacklib .

Second, add a line of code that will be executed before any of the generated files are loaded (via include, require, etc.). Put the path to Hacklib’s main file in the global variable HACKLIB_ROOT. For example, if the Hacklib code was copied to the top-level directory of the project:

$GLOBALS['HACKLIB_ROOT'] = __DIR__ . '/hacklib/hacklib.php';

Conversions

This section won’t go into full detail about all of the conversions that the transpiler does, but will explain enough to give you an idea of what to expect in the generated PHP code.

It’s important to note that the transpiled PHP code will run less efficiently than the original Hack code, even on the same execution engine. As we’ll see in this section, some common Hack constructs have to be replaced with less efficient PHP constructs—for example, some equality comparisons have to be replaced with function calls.

The transpiler will try to convert all Hack files, and won’t touch PHP files. It determines what language a file is in by its opening tag—<?hh or <?php—not by its file extension.

Here are the most important things the transpiler does:

  • All type annotations are removed. This also means that type aliases can simply be deleted, as type annotations are the only place where they’re used.

  • Collection literals (see “Literal Syntax”) are replaced with new expressions, where supported. The collection classes can still be used in PHP.

  • Lambda syntax (see “Lambda Expressions”) is replaced with regular closure syntax. The typechecker finds which variables need to be captured from the enclosing scope and generates the appropriate use list.

  • Enums (see “Enums”) are converted into classes, with the enum members as class constants. The special enum functions are provided by a trait from Hacklib.

  • Shapes (see “Array Shapes”) and tuples are replaced with arrays.

  • Attributes (see “Attributes”) are removed, except __Memoize, which is not supported; see the next section.

  • Trait requirements (see “Trait and Interface Requirements”) are removed.

  • Constructors with promoted arguments (see “Constructor Parameter Promotion”) are unfolded to declare the necessary properties and assign to them in the constructor’s body.

  • The nullsafe method call operator (see “Nullsafe Method Call Operator”) is simulated using a Hacklib class with the magic method __call().

  • Because the collection classes’ behavior in casting and equality comparisons isn’t special-cased in PHP like it is in Hack, some instances of those constructs have to be modified. For example, here is Hack code that relies on empty collections evaluating to false when cast to booleans:

    function average(Vector<num> $nums): num {
      if (!$nums) {
        throw new InvalidArgumentException(
          "Can't average an empty vector"
        );
      }
    
      // ...
    }

    To get equivalent behavior in PHP, the transpiler will use a helper function from Hacklib:

    function average($nums) {
      if (!hacklib_cast_as_boolean($nums)) {
        throw new InvalidArgumentException(
          "Can't average an empty vector"
        );
      }
    
      // ...
    }
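The helpers that the converted code leans on are ordinary PHP. As a rough illustration, here are hypothetical approximations of three of them: the boolean cast from the example above, a __call()-based stand-in for the nullsafe operator, and a trait supplying enum functions. All class, trait, and function names are made up, the cast assumes any Countable object should be treated as a collection, and Hacklib’s real implementations may differ.

```php
<?php
// Hypothetical approximations of three Hacklib-style helpers.
// None of these names come from Hacklib itself.

// 1. Boolean cast. PHP treats every object as true, even an empty
//    collection, so this sketch special-cases Countable objects.
function cast_as_boolean_sketch($value) {
  if ($value instanceof Countable) {
    return count($value) > 0;
  }
  return (bool)$value;
}

// 2. Nullsafe method call. `$obj?->method()` could become
//    `nullsafe_sketch($obj)->method()`: a null receiver is swapped for an
//    object whose __call() returns null for any method name.
class NullsafeSketchObj {
  public function __call($name, $args) {
    return null;
  }
}

function nullsafe_sketch($obj) {
  static $nullObj = null;
  if ($nullObj === null) {
    $nullObj = new NullsafeSketchObj();
  }
  return ($obj === null) ? $nullObj : $obj;
}

// 3. Enum functions. A converted enum is a class of constants, and a trait
//    supplies lookups modeled on Hack's getValues() and isValid().
trait EnumSketch {
  public static function getValues() {
    $reflection = new ReflectionClass(get_called_class());
    return $reflection->getConstants();
  }

  public static function isValid($value) {
    return in_array($value, static::getValues(), true);
  }
}

// Hack source: enum Color: int { Red = 0; Green = 1; }
class Color {
  use EnumSketch;
  const Red = 0;
  const Green = 1;
}
```

Because the nullsafe stand-in is an ordinary method call, PHP evaluates the method’s arguments before __call() runs, even when the receiver is null.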

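Two of the purely syntactic rewrites in the list, constructor parameter promotion and lambda expressions, are easiest to see together. In the sketch below, a small Hack class appears as comments, followed by hand-written PHP in the spirit of the transpiler’s output. It is an illustration assembled for this chapter, not actual h2tp output, and the class name Scaler is made up.

```php
<?php
// Hack source (hypothetical example; shown as comments, not valid PHP):
//   class Scaler {
//     public function __construct(private int $factor) {}
//     public function scaleAll(array<int> $xs): array<int> {
//       $factor = $this->factor;
//       return array_map($x ==> $x * $factor, $xs);
//     }
//   }

// Hand-written PHP equivalent: annotations removed, promotion unfolded,
// and the lambda turned into a closure with an explicit use list.
class Scaler {
  // Promoted parameter becomes a declared property...
  private $factor;

  public function __construct($factor) {
    // ...plus an assignment in the constructor body.
    $this->factor = $factor;
  }

  public function scaleAll($xs) {
    $factor = $this->factor;
    // The lambda captured $factor implicitly; PHP needs `use ($factor)`.
    return array_map(function ($x) use ($factor) {
      return $x * $factor;
    }, $xs);
  }
}
```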
Unsupported Features

There are several Hack features that the transpiler can’t convert to PHP. If it encounters any of these, the transpiler will give up on the entire file. It will never partially convert a file, or produce PHP code that doesn’t behave the same as the original Hack code.

The PHP code that the transpiler generates to simulate Hack features is compatible with PHP versions 5.4 and later. However, if you use features from a later version of PHP, such as generators (introduced in PHP 5.5), the transpiler will not touch those, and the output will only run on PHP 5.5 and later.

Here are the features that the transpiler doesn’t support:

  • Async functions (see Chapter 6). Running async functions requires extensive support from the runtime, and it’s not possible to simulate this in pure PHP in a reasonable way. It’s possible to convert async functions by simply removing the async and await keywords; this would produce correct results, but with no parallelism. The transpiler may start doing that in the future.

  • The __Memoize special attribute (see “Special Attributes”). Unlike other attributes, which are simply removed, __Memoize will cause a conversion failure. This attribute requires runtime support, and is tricky to simulate in pure PHP. The memoization pattern is easy to implement manually, though, as a workaround.

  • Traits that implement interfaces (see “Trait and Interface Requirements”).

  • Collection literals as initial values for non-static properties (see “Literal Syntax”). This is because a collection literal has to be converted to a new expression in PHP, and those aren’t allowed as property initializers. The restriction only applies to non-static properties because the initializers for static properties can simply be moved outside the class.
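As noted under __Memoize, the memoization pattern is straightforward to write by hand. A minimal sketch, assuming the arguments are values serialize() can handle (such as ints, strings, and arrays of them), caches results in a static array keyed by the serialized argument list. The function names and the call counter are hypothetical and exist only to show the cache at work.

```php
<?php
// Counts real computations so the cache's effect is observable
// (for demonstration only).
$computeCalls = 0;

// Hypothetical original, unmemoized function body.
function compute_lookup($id) {
  global $computeCalls;
  $computeCalls++;
  return str_repeat('x', $id);
}

// Hand-written replacement for a function that would be marked __Memoize.
// Results are cached for the rest of the request or script run.
function memoized_lookup($id) {
  static $cache = array();
  // Build a string key from the full argument list; this assumes every
  // argument is serializable (ints, strings, arrays of them, and so on).
  $key = serialize(func_get_args());
  // array_key_exists (not isset) so that cached null results are honored.
  if (!array_key_exists($key, $cache)) {
    $cache[$key] = compute_lookup($id);
  }
  return $cache[$key];
}
```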

1 In strict mode, of course, there is an error: doWork() has no return type annotation.
