Preparing data

Now that we have chosen a machine learning model, we can prepare the data. In this project, I generated random values for temperature and humidity. You can see this data in the following graph:

We store the data in a CSV file with three columns: Temperature, Humidity, and Watering. You can see the data in the following screenshot. The Watering column holds the target decision for each row:

Save the data into a CSV file, for instance, Temp-Hum-Water.csv.
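As a rough illustration of this step, the following Python sketch generates random temperature and humidity values and writes them to Temp-Hum-Water.csv. The value ranges, the "yes"/"no" labels, and the rule that decides the Watering value are assumptions made for this example; they are not the values used for the data shown in the screenshot.

```python
import csv
import random


def make_rows(n, seed=0):
    """Generate n rows of (temperature, humidity, watering)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Assumed ranges: temperature in degrees Celsius, humidity in percent.
        temperature = round(rng.uniform(15.0, 40.0), 1)
        humidity = round(rng.uniform(20.0, 90.0), 1)
        # Hypothetical labeling rule: water when it is hot and dry.
        watering = "yes" if temperature > 30.0 and humidity < 50.0 else "no"
        rows.append((temperature, humidity, watering))
    return rows


def write_csv(path, rows):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Temperature", "Humidity", "Watering"])
        writer.writerows(rows)


rows = make_rows(100)
write_csv("Temp-Hum-Water.csv", rows)
```

In a real project, you would replace the labeling rule with decisions taken from your own observations.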

We also need to create a data schema. The schema file should be named <data-file_name>.schema. In our case, we create a file named Temp-Hum-Water.csv.schema. You can write the following JSON for our schema:

{
  "version": "1.0",
  "targetAttributeName": "Watering",
  "dataFormat": "CSV",
  "dataFileContainsHeader": true,
  "attributes": [
    {
      "attributeName": "Temperature",
      "attributeType": "NUMERIC"
    },
    {
      "attributeName": "Humidity",
      "attributeType": "NUMERIC"
    },
    {
      "attributeName": "Watering",
      "attributeType": "CATEGORICAL"
    }
  ]
}
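If you prefer to generate the schema file instead of typing it by hand, a minimal Python sketch could look like the following. The build_schema helper is a hypothetical name introduced for this example. It writes the attribute types in upper case (NUMERIC, CATEGORICAL), which is how Amazon Machine Learning spells them as far as I know.

```python
import json


def build_schema(target, numeric, categorical):
    """Build a schema dictionary from lists of column names (hypothetical helper)."""
    attributes = [
        {"attributeName": name, "attributeType": "NUMERIC"} for name in numeric
    ]
    attributes += [
        {"attributeName": name, "attributeType": "CATEGORICAL"}
        for name in categorical
    ]
    return {
        "version": "1.0",
        "targetAttributeName": target,
        "dataFormat": "CSV",
        "dataFileContainsHeader": True,
        "attributes": attributes,
    }


schema = build_schema("Watering", ["Temperature", "Humidity"], ["Watering"])

# The schema file name is the data file name plus the .schema suffix.
with open("Temp-Hum-Water.csv.schema", "w") as f:
    json.dump(schema, f, indent=2)
```

Generating the file this way keeps the column names in one place, so the schema and the CSV header are less likely to drift apart.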

The next step is to upload the data and schema files to Amazon S3. Currently, Amazon Machine Learning can work with data from Amazon S3 and Amazon Redshift. For this demo, we use Amazon S3 to store the data and schema files.

Now you can upload the Temp-Hum-Water.csv and Temp-Hum-Water.csv.schema files to Amazon S3. You can see my data in the following screenshot:
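The upload can also be scripted rather than done through the web console. The following Python sketch assumes the boto3 library and an S3 client created with boto3.client("s3"). The upload_dataset helper, the bucket name, and the key prefix are hypothetical and only serve as an illustration.

```python
import os


def upload_dataset(s3_client, bucket, data_path="Temp-Hum-Water.csv", prefix=""):
    """Upload the CSV file and its .schema file; return the S3 keys used."""
    keys = []
    for path in (data_path, data_path + ".schema"):
        # Store each file under the optional prefix, using its base name as the key.
        key = prefix + os.path.basename(path)
        s3_client.upload_file(path, bucket, key)
        keys.append(key)
    return keys


# Example usage (needs boto3 and configured AWS credentials;
# "my-ml-bucket" is a placeholder bucket name):
#   import boto3
#   upload_dataset(boto3.client("s3"), "my-ml-bucket")
```

Passing the client in as a parameter keeps the function easy to try out with a stand-in object before you run it against a real bucket.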

We will build a machine learning model with Amazon Machine Learning in the next section.
