We can now summarize what we have learned about the missing values:
- There are 107 useless rows that need to be filtered out.
- There are 44 rows with 26 or 27 missing values. These rows seem useless, so we are going to filter them out.
- The heart rate column contains the majority of the missing values. Since we expect this column to carry important information that helps distinguish between different sport activities, we are not going to ignore it. Instead, we are going to impute the missing values using one of the following strategies:
  - Mean resting heart rate based on medical research
  - Mean heart rate computed from the available data
- The missing values in the remaining columns follow a pattern: they are strictly linked to a sensor. We replace all of these missing values with 0.0.
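
The steps summarized above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not the pipeline used in the text: rows are assumed to be lists of floats with missing values encoded as NaN, and the names `clean`, `hr_index`, `max_missing`, and `hr_fill` are hypothetical. The threshold of 25 follows from the rows with 26 or 27 missing values being dropped. The criterion for the 107 useless rows is not restated in this summary, so it is left out. The mean is computed over the rows that survive filtering, which is one possible reading of "available data".

```python
import math


def count_missing(row):
    """Return the number of NaN entries in a row."""
    return sum(1 for value in row if math.isnan(value))


def clean(rows, hr_index, max_missing=25, hr_fill=None):
    """Filter and impute rows (hypothetical helper, see assumptions above).

    rows        -- list of rows, each a list of floats (NaN = missing)
    hr_index    -- assumed position of the heart rate column
    max_missing -- rows with more missing values than this are dropped
    hr_fill     -- fixed heart rate to impute (e.g. a resting heart rate
                   taken from medical research); if None, the mean of the
                   observed heart rates is used instead
    """
    # Step 1: drop rows with too many missing values (copy the rest).
    kept = [list(row) for row in rows if count_missing(row) <= max_missing]

    # Step 2: choose the heart rate fill value.
    if hr_fill is None:
        observed = [r[hr_index] for r in kept if not math.isnan(r[hr_index])]
        if not observed:
            raise ValueError("no observed heart rate values to average")
        hr_fill = sum(observed) / len(observed)

    # Step 3: impute heart rate with hr_fill and every other column with 0.0.
    for row in kept:
        for i, value in enumerate(row):
            if math.isnan(value):
                row[i] = hr_fill if i == hr_index else 0.0
    return kept
```

Calling `clean(rows, hr_index=1)` applies the mean-based strategy, while passing an explicit `hr_fill` applies the fixed-value strategy. The concrete resting heart rate is not given here, so any number passed as `hr_fill` is the caller's assumption.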