A Machine Learning Specialist must build out a process to query a dataset on Amazon S3 using Amazon Athena. The dataset contains more than 800,000 records stored as plaintext CSV files. Each record contains 200 columns and is approximately 1.5 MB in size. Most queries will span 5 to 10 columns only. How should the Machine Learning Specialist transform the dataset to minimize query runtime?



Admin Staff asked 6 months ago

A Machine Learning Specialist must build out a process to query a dataset on Amazon S3 using Amazon Athena. The dataset contains more than 800,000 records stored as plaintext CSV files. Each record contains 200 columns and is approximately 1.5 MB in size. Most queries will span 5 to 10 columns only.
How should the Machine Learning Specialist transform the dataset to minimize query runtime?

A. Convert the records to Apache Parquet format.

B. Convert the records to JSON format.

C. Convert the records to GZIP CSV format.

D. Convert the records to XML format.








 

Suggested Answer: A

Community Answer: A

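Apache Parquet is a columnar format, so Athena reads only the 5 to 10 columns a query references instead of all 200 columns of every record. Parquet files are also compressed, which further reduces the amount of data scanned by Amazon Athena and the storage used in your S3 bucket. Less data scanned means shorter query runtime and a lower AWS bill. GZIP CSV (option C) shrinks the files but remains row-oriented, so every column of each row must still be read; JSON and XML are row-oriented and more verbose than CSV.
Compression codecs supported by Athena include GZIP, LZO, Snappy (commonly paired with Parquet) and ZLIB.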
Reference:
https://www.cloudforecast.io/blog/using-parquet-on-athena-to-save-money-on-aws/
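
To make the column-pruning argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes all 200 columns are the same width, ignores compression entirely, and treats 1.5 MB as 1,500,000 bytes; real savings will differ. The helper names, table names and S3 path are hypothetical, chosen only for illustration. The second helper builds an Athena CTAS (CREATE TABLE AS SELECT) statement, which is one common way to rewrite a CSV-backed table as Parquet.

```python
def estimate_scan_bytes(num_records, record_bytes, total_columns,
                        queried_columns, columnar):
    """Rough bytes read by one full-table query.

    Assumes equal-width columns and no compression (illustrative only).
    """
    total = num_records * record_bytes
    if not columnar:
        # Row-oriented formats (CSV, JSON, XML) must read every column.
        return total
    # A columnar format reads only the columns the query references.
    return total * queried_columns // total_columns


def build_ctas(source_table, target_table, external_location):
    """Build an Athena CTAS statement that writes the result as Parquet.

    Table names and the S3 location are supplied by the caller.
    """
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (format = 'PARQUET', external_location = '{external_location}')\n"
        f"AS SELECT * FROM {source_table}"
    )


# Figures from the question: 800,000 records, about 1.5 MB each, 200 columns,
# queries touching at most 10 columns.
NUM_RECORDS = 800_000
RECORD_BYTES = 1_500_000

csv_bytes = estimate_scan_bytes(NUM_RECORDS, RECORD_BYTES, 200, 10, columnar=False)
parquet_bytes = estimate_scan_bytes(NUM_RECORDS, RECORD_BYTES, 200, 10, columnar=True)

print(f"Row-oriented scan: {csv_bytes / 1e9:.0f} GB")
print(f"Columnar scan:     {parquet_bytes / 1e9:.0f} GB")

# Hypothetical table names and bucket path.
print(build_ctas("records_csv", "records_parquet", "s3://example-bucket/records-parquet/"))
```

Under these assumptions a query over 10 of the 200 columns reads about 5% of the data that a row-oriented scan would, before any compression gain is counted.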


This question is from the MLS-C01 AWS Certified Machine Learning – Specialty exam, which leads to the AWS Certified Machine Learning – Specialty certificate.


Disclaimers:
The website is not related to, affiliated with, endorsed or authorized by Amazon.
Trademarks, certification & product names are used for reference only and belong to Amazon.
The website does not contain actual questions and answers from Amazon's Certification Exam.
