Questions

  1. Given a data processing workload that extracts data from an on-premises ERP (source) and loads it into S3 (target), what is the LEAST cost-effective type of data processing, assuming business analysts don't require real-time analysis because they access the financial reports only once a day?

A) Batch
B) Micro-batch
C) Streaming

  2. One situation where Glue custom classifiers are required is when data sources contain nested data structures. When the data structures in the source are flat, the default classifiers are sufficient for standard data ingestion processing.

A) True
B) False

  3. When is it necessary to run a crawler?

A) When the schema of the crawled dataset has changed
B) When data was added to a previously crawled dataset but the schema hasn't changed
C) When the dataset was moved to a different S3 bucket
D) Both A and B

  4. Which file format is best for optimizing query performance in Athena?

A) CSV
B) JSON
C) Parquet
D) DOCX

  5. When authoring a custom Glue script, which native Glue classes can be used to help with the transformation tasks that need to be applied to a dataset? (Choose three)

A) ApplyMapping
B) Relationalize
C) Deduplicate
D) ResolveChoice

  6. To run Glue workloads efficiently from a cost and performance perspective, which of the following variables need to be evaluated? (Choose two)

A) The maximum number of CPUs allocated to a job
B) The number of parameters a job requires
C) The maximum number of DPUs allocated to a job
D) How long the job takes to run, which is highly impacted by the volume of the data to be processed
E) The definition of the micro-batch threshold in records or megabytes
