Metrics

Now, let's play with the metrics functionality, as follows:

  1. First, we'll tag our current version to give it a name, so that we can navigate the history and understand what is inside each commit (metrics are always tracked, but the command-line interface will show them either for tagged commits or for branches):
git tag -m leaders -a "basic-features-and-leaders"
  2. Next, for the sake of comparison, let's test our model without the leaders feature: temporarily remove the corresponding columns from the list of features we defined in the code (see the sketch after this list for what such an edit could look like), and reproduce the model:
dvc repro
  3. Once the new model is done, we commit the changes and tag the new commit:
git commit -m "same model with no leader features";
git tag -m no-leaders -a "basic-features"
  4. Feel free to push all of the changes to the remote; note that pushing the tags requires the --tags flag. None of this is required for DVC, though.
  5. Finally, let's check in the code with the random forest model. Swap the random forest model into the script (see the sketch after this list) and run DVC again, as follows:
dvc repro
git commit -am "random forest"
git tag -m rf -a "random-forest"
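
To make steps 2 and 5 concrete, here is a minimal sketch of what the relevant part of the training script could look like. The script name, data path, column names, and the cross-validated accuracy metric are assumptions for illustration only, not the book's exact code:

# train.py (hypothetical): fit a model on a configurable feature list
# and write the accuracy metric to the file tracked by DVC.
import json

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Step 2: temporarily comment out (or remove) the leader-related columns.
FEATURES = [
    'allies_infantry', 'axis_infantry', 'allies_tanks', 'axis_tanks',
    'allies_leaders', 'axis_leaders',  # the "leaders" features
]

df = pd.read_csv('data/battles.csv')   # assumed input from an earlier pipeline stage
X, y = df[FEATURES], df['result']      # assumed target column

# Step 5: swap the previous model for a random forest.
model = RandomForestClassifier(n_estimators=100, random_state=42)
accuracy = cross_val_score(model, X, y, cv=4, scoring='accuracy').mean()

# Write the metric file that dvc metrics show reads.
with open('data/metrics.json', 'w') as f:
    json.dump({'accuracy': accuracy}, f)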

Now that the metrics for the two models are cached, we can use DVC to show the changes (in the output that follows, working tree refers to our current branch and its latest commit).

  1. The following command asks DVC to show the metrics (that is, the file we specified as a metric) across all of the tagged commits:
dvc metrics show -T -x accuracy
>>>
working tree:
data/metrics.json: [0.5965367965367965]
basic-features:
data/metrics.json: [0.5488095238095239]
basic-features-and-leaders:
data/metrics.json: [0.5959415584415585]
random-forest:
data/metrics.json: [0.5965367965367965]
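
Each of these values is read from the data/metrics.json file stored at the corresponding commit. Assuming the training script writes the metric under an accuracy key, as in the sketch above (an assumption, not the book's exact code), the file in the current working tree would contain something like the following:

{"accuracy": 0.5965367965367965}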

As you can see, this command allows us to check changes in accuracy across all of the tagged commits. Using a combination of Git and DVC, we can always switch to any of those commits and pull the correct version of both the code and the data.
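
For example, to go back to the version without the leaders feature and bring the matching data files into the workspace, we can check out the tag with Git and then let DVC sync the data files for that revision:

git checkout basic-features
dvc checkout

The other tags work the same way; checking out our working branch again brings everything back to the latest state.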

According to this list, the leaders feature added a substantial performance gain to the model (accuracy rose from roughly 0.55 to 0.60), and switching to the random forest model added a little more on top of that. The best part is that we can continue working on our model, keeping track of the metrics for the next iterations as well. All of the data, code, and metrics are properly stored and easy to get back to.

It is hard to overestimate the importance of proper tracking and version control for experimentation and reproducibility, both in academia and in industry. This level of transparency allows you to showcase your improvements and makes communication and collaboration a breeze. Now, let's review what we learned in this chapter.
