Questions

Given a data processing workload that extracts data from an on-premises ERP system (source) and loads it into S3 (target), what is the LEAST cost-effective type of data processing, assuming business analysts don't require real-time analysis because they access the financial reports only once a day?

A) Batch
B) Micro-batch
C) Streaming

One of the situations where Glue custom classifiers are required is when data sources contain nested data structures. When data structures are flat on the source, the default classifiers are sufficient for standard data ingestion processing.

A) True
B) False

When is it necessary to run a crawler?

A) When the schema of the crawled dataset has changed
B) When data was added to a previously crawled dataset but the schema hasn't changed
C) When the dataset was moved to a different S3 bucket
D) Both A and B

Which file format is best to use to optimize query performance in Athena?

A) CSV
B) JSON
C) Parquet
D) DOCX

When authoring a custom Glue script, which native Glue classes can be used to help with the transformation tasks that need to be applied on a dataset? (Choose three)

A) ApplyMapping
B) Relationalize
C) Deduplicate
D) ResolveChoice

To run Glue workloads efficiently from a cost and performance perspective, which of the following variables need to be evaluated? (Choose two)

A) The maximum number of CPUs allocated to a job
B) The number of parameters a job requires
C) The maximum number of DPUs allocated to a job
D) How long the job takes to run, which is highly impacted by the volume of the data to be processed
E) The definition of the micro-batch threshold in records or megabytes
