The ETL blueprint

AWS Glue allows you to implement an extract, transform, and load (ETL) process from start to finish. ETL is a common pattern for processing large amounts of data. In the coming sections, we'll step through an example using sample weather data.

For a high-level view of what we are about to build, study the following diagram:

ETL process showing components and high-level data flows

As you can see in the blueprint diagram, our data source will be a filesystem that holds the weather data files to be processed. In this case, the filesystem is your local machine.

You can download the sample data files from the following public S3 bucket: s3://weather-data-inbox/raw/.

This is a JSON dataset that's curated from Open Weather. If the preceding S3 location is unavailable, you can always find your own dataset by searching for JSON open weather data.
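If you prefer to script the download, the following is a minimal sketch in Python using boto3. It assumes the bucket allows anonymous (unsigned) listing and reads, which the text does not confirm; if it does not, drop the unsigned configuration and use your own AWS credentials. The function names and the weather-data destination folder are hypothetical, chosen only for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/prefix/' into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def local_path_for(key, prefix, dest_dir):
    """Map an object key under `prefix` to a file path under `dest_dir`."""
    relative = key[len(prefix):] if key.startswith(prefix) else key
    return Path(dest_dir) / relative


def download_prefix(uri, dest_dir):
    """Download every object under the S3 URI into dest_dir."""
    # Imported here so the helpers above work without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    bucket, prefix = split_s3_uri(uri)
    # Assumption: the bucket is publicly readable, so requests are unsigned.
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    paginator = s3.get_paginator("list_objects_v2")
    downloaded = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                # Skip zero-byte "folder" placeholder objects.
                continue
            target = local_path_for(key, prefix, dest_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            s3.download_file(bucket, key, str(target))
            downloaded.append(target)
    return downloaded
```

Calling download_prefix("s3://weather-data-inbox/raw/", "weather-data") would then copy the files into a local weather-data folder, preserving any subfolder layout under raw/.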

Once you have downloaded the dataset, you need to place the files in a location that has been allocated for storing our raw (pre-transformed and pre-curated) data. S3 is a great storage system for data objects, so let's move on to the next section to find out the best way to get the data into our raw bucket.
