Discovering insights from Formula 1 race results

In this section, we will be making use of historic race results from Formula 1. You can download this data through a publicly available ZIP file containing multiple CSV files representing drivers, races, and the results of each race since 1950. We will be loading CSV files into Einstein Analytics and using Einstein Discovery to explore insights and predictions in relation to the data.

This section does not give click-by-click instructions for the Einstein Analytics and Discovery user interfaces and tools. If you plan to try out the Formula 1 motor racing scenario described in this part of the chapter, first complete the Einstein Discovery Trail: https://trailhead.salesforce.com/content/learn/modules/wave_exploration_smart_data_discovery_basics.

In the following steps, we are going to upload several CSV files and then create a combined dataset from which we can then create predictions and insights:

Obtain a Salesforce Analytics trial org through this website: https://developer.salesforce.com/promotions/orgs/analytics-de. We will not be using Salesforce DX for this chapter.
Navigate to https://ergast.com/mrd/db/, download the f1db_csv.zip file, and unzip it into a folder of your choice.
We will be using the following CSV files: results.csv, status.csv, races.csv, driver.csv, constructors.csv, and circuits.csv.
Open each of these files in your favorite text editor and perform the following steps:
1. Observe the column names described on http://ergast.com/schemas/f1db_schema.txt and then add them as a comma-delimited list at the top of each file. For example, for the results.csv file, add the following:
  
  resultId,raceId,driverId,constructorId,number,grid,position,positionText,positionOrder,poi.
2. Einstein Discovery does not support the CSV \N null escape sequence. Find and replace all occurrences of \N with an empty string, taking care not to replace the letter N on its own.
3. Save the preceding changes to the file.
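The file preparation described above can also be scripted rather than done by hand in a text editor. The following is a minimal Python sketch and is not part of the sample code for this chapter: the function names prepare_csv_text and prepare_csv_file are hypothetical, and the four-column header in the example is shortened for illustration (the real column list comes from the schema file). It assumes the null marker appears as a whole field, \N, and blanks only those fields so that ordinary text containing the letter N is left alone.

```python
import csv
import io


def prepare_csv_text(raw_text, column_names):
    """Return CSV text with a header row added and \\N null markers blanked."""
    reader = csv.reader(io.StringIO(raw_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Step 1: add the column names as the first line of the file.
    writer.writerow(column_names)
    for row in reader:
        # Step 2: blank a field only when the whole field is the \N marker.
        writer.writerow(["" if field == r"\N" else field for field in row])
    return out.getvalue()


def prepare_csv_file(path, column_names):
    """Rewrite the file at path in place (step 3: save the changes)."""
    with open(path, newline="", encoding="utf-8") as source:
        raw_text = source.read()
    with open(path, "w", newline="", encoding="utf-8") as target:
        target.write(prepare_csv_text(raw_text, column_names))


# Illustrative data only: a shortened, made-up slice of a results file.
sample = "1,18,1,\\N\n2,18,2,5\n"
cleaned = prepare_csv_text(sample, ["resultId", "raceId", "driverId", "position"])
```

Run prepare_csv_file only once per file, since a second run would add a second header row.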
Launch Analytics Studio from the App Launcher in your Salesforce Analytics trial org created earlier. Click Create and select Dataset, followed by CSV as the data source:
1. Repeat the preceding process to upload each of the 6 CSV files listed in step 3. The order is not important.
2. When prompted for the Data Schema File, select the applicable .json file from the /analytics folder in the sample code for this chapter.
3. For example, when uploading results.csv, ensure that you also select results.json for the Data Schema File. This JSON file ensures that columns in the CSV file are correctly identified by Einstein Discovery as being influencer fields or outcome fields.
4. The following shows the datasets representing the CSV files uploaded:

The six datasets that have been created are all related through IDs and must be joined into a brand new dataset that will be used to build insights and predictions. Once again, launch Analytics Studio from the App Launcher in your Salesforce Analytics trial org, and then click Create and select Dataset, followed by Your Dataset:
1. You create recipes to join datasets together. When prompted for Select the base data for your recipe, click on Results.
2. When prompted, enter resultsfull for the Recipe Name and click Next.
3. Click the Add Data button to add the status, races, driver, constructors, and circuits datasets you created in step 5:
  1. When adding the status dataset, enter the Lookup Keys as results.status and status.statusid. Select the status column.
  2. When adding the races dataset, enter the Lookup Keys as results.raceId and races.raceId. Select the columns year, name, circuitId, and round.
  3. When adding the driver dataset, enter the Lookup Keys as results.driverId and driver.driverId. Select the forename, surname, and nationality columns.
  4. When adding the constructors dataset, enter the Lookup Keys as results.constructorId and constructors.constructorId. Select the nationality and name columns.
  5. When adding the circuits dataset, enter the Lookup Keys as results.circuitId and circuits.circuitId. Select the country, name, and location columns.
4. Click Create Dataset to create the new dataset. This may take a few moments to process and you will be notified on screen when it is complete.
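To make the role of the lookup keys concrete, here is a small Python sketch of the same idea applied to in-memory rows. It is an approximation for illustration, not how the recipe is implemented: it assumes that adding data behaves like a lookup that keeps every base row and copies the selected columns from the first matching row. The add_lookup helper, the dotted column prefixes, and all of the sample values are made up. Note how the circuits lookup relies on the circuitId column brought in from the races dataset.

```python
def add_lookup(base_rows, lookup_rows, base_key, lookup_key, columns, prefix):
    """Copy the chosen columns from the matching lookup row onto each base row."""
    index = {}
    for row in lookup_rows:
        # Keep the first row seen for each key value.
        index.setdefault(row[lookup_key], row)
    joined = []
    for row in base_rows:
        match = index.get(row.get(base_key))
        merged = dict(row)
        for column in columns:
            # Unmatched base rows are kept, with empty (None) lookup columns.
            merged[f"{prefix}.{column}"] = match[column] if match else None
        joined.append(merged)
    return joined


# Made-up sample rows; the real datasets come from the uploaded CSV files.
results = [
    {"resultId": "1", "raceId": "10", "driverId": "1", "points": "25"},
    {"resultId": "2", "raceId": "10", "driverId": "2", "points": "18"},
    {"resultId": "3", "raceId": "11", "driverId": "9", "points": "0"},
]
races = [
    {"raceId": "10", "year": "2019", "name": "Race A", "circuitId": "3"},
    {"raceId": "11", "year": "2019", "name": "Race B", "circuitId": "4"},
]
circuits = [
    {"circuitId": "3", "name": "Circuit X", "country": "Country X"},
    {"circuitId": "4", "name": "Circuit Y", "country": "Country Y"},
]
drivers = [
    {"driverId": "1", "surname": "Driver A"},
    {"driverId": "2", "surname": "Driver B"},
]

full = add_lookup(results, races, "raceId", "raceId", ["year", "name", "circuitId"], "races")
full = add_lookup(full, circuits, "races.circuitId", "circuitId", ["name", "country"], "circuits")
full = add_lookup(full, drivers, "driverId", "driverId", ["surname"], "driver")
```

The third result row refers to a driver that is missing from the sample driver rows, so it stays in the output with an empty driver.surname, which is the behavior to watch for when a lookup key is mistyped.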

If you want to know more about adding data to recipes, you can review the Add More Data in a Recipe Salesforce help topic here: https://help.salesforce.com/articleView?id=bi_integrate_data_prep_recipe_add_data.htm&type=5.

The following screenshot shows what the resultsfull dataset should look like:

You will now see the resultsfull dataset in the list of datasets, along with the six you uploaded from the CSV files in the earlier steps. In the following steps, we will create a Story, which is the term used to define a collection of observations and predictions in relation to the data:

Locate the resultsfull dataset and select Create Story from its action drop-down menu. The following screenshot shows the default assumption that you want to ask Einstein Discovery how best to maximize the points your drivers earn in races:

Click Data Options and select surname, constru.name, circuit.name, status, and races.name as the fields to drive your story. Einstein will use these fields to determine reasons why points vary; for example, how combinations of the driver, circuit, or team make a difference.
Click Create Story to begin processing the data.

Once the story completes, you will see that Einstein has some recommendations on how to improve the accuracy of the story. Click the Recommended Updates button. You will see that Einstein reports that position explains 40.4% of the variation in points. Select Ignore position (Recommended). Ignoring this field is recommended because the position in which a driver finishes the race is a well-known influencer on the points they obtain. Removing it allows less obvious influencers, such as the team, driver, and location, to be considered.
Click Create New Story to restart the process after following the preceding recommendation. Note that Einstein always keeps your past stories.
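The idea of one field explaining a share of the variation in an outcome can be illustrated with a small Python sketch. The text does not state how Einstein Discovery computes its percentage, so this uses a common stand-in, the correlation ratio (between-group sum of squares divided by total sum of squares). The explained_variation function and the toy data are assumptions for illustration only.

```python
from collections import defaultdict


def explained_variation(groups, values):
    """Share of the variance in values accounted for by group membership."""
    overall_mean = sum(values) / len(values)
    total_ss = sum((value - overall_mean) ** 2 for value in values)
    if total_ss == 0:
        return 0.0
    by_group = defaultdict(list)
    for group, value in zip(groups, values):
        by_group[group].append(value)
    # Between-group sum of squares: group size times squared distance
    # of the group mean from the overall mean.
    between_ss = sum(
        len(members) * (sum(members) / len(members) - overall_mean) ** 2
        for members in by_group.values()
    )
    return between_ss / total_ss


# Toy data: points are fully determined by finishing position here.
positions = [1, 1, 2, 2, 3, 3]
teams = ["A", "B", "A", "B", "A", "B"]
points = [25, 25, 18, 18, 15, 15]

share_from_position = explained_variation(positions, points)
share_from_team = explained_variation(teams, points)
```

In this toy data, position accounts for essentially all of the variation in points and team for none of it. A field that is this closely tied to the outcome tells you little that is new, which is why it is set aside in favor of the other fields.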

The following screenshot shows one of the first observations discovered from the story:

Racing drivers Hamilton, Vettel, and Verstappen are among the top drivers in the sport at the time of publication of this book. Einstein Discovery has correctly recognized that they have scored the most points when driving for the Mercedes and Red Bull racing teams. As a fan of the sport, I can confirm that this deduction is correct!

The following screenshot shows comparative observations that can be made:

The preceding observation is also accurate: it confirms that Mercedes has been outperforming Ferrari for the last several years (at the time of publication), during which Hamilton, Rosberg, and Bottas have been driving for the Mercedes team.

Both Einstein Prediction Builder and Einstein Discovery are powerful tools for customers wishing to apply AI to your application's data using their own historic data. However, there are times when you need to add more AI to specific user experiences, or simply want more control over the data that is used to drive the AI algorithms, especially when your customers initially have very little data of their own. The programmatic APIs of Einstein Platform Services help provide a more custom AI experience packaged into your application through your Apex code calling specific APIs, which we will see in the next section.
