A company has developed an Apache Hive script to batch process data stared in Amazon S3. The script needs to run once every day and store the output in Amazon S3. The company tested the script, and it completes within 30 minutes on a small local three-node cluster. Which solution is the MOST cost-effective for scheduling and executing the script?

Questions › Category: DAS-C01 › A company has developed an Apache Hive script to batch process data stared in Amazon S3. The script needs to run once every day and store the output in Amazon S3. The company tested the script, and it completes within 30 minutes on a small local three-node cluster. Which solution is the MOST cost-effective for scheduling and executing the script?

0 Vote Up Vote Down

Admin Staff asked 7 months ago

A company has developed an Apache Hive script to batch process data stared in Amazon S3. The script needs to run once every day and store the output in
Amazon S3. The company tested the script, and it completes within 30 minutes on a small local three-node cluster.
Which solution is the MOST cost-effective for scheduling and executing the script?

A. Create an AWS Lambda function to spin up an Amazon EMR cluster with a Hive execution step. Set KeepJobFlowAliveWhenNoSteps to false and disable the termination protection flag. Use Amazon CloudWatch Events to schedule the Lambda function to run daily.

B. Use the AWS Management Console to spin up an Amazon EMR cluster with Python Hue. Hive, and Apache Oozie. Set the termination protection flag to true and use Spot Instances for the core nodes of the cluster. Configure an Oozie workflow in the cluster to invoke the Hive script daily.

C. Create an AWS Glue job with the Hive script to perform the batch operation. Configure the job to run once a day using a time-based schedule.

D. Use AWS Lambda layers and load the Hive runtime to AWS Lambda and copy the Hive script. Schedule the Lambda function to run daily by creating a workflow using AWS Step Functions.








 

Suggested Answer: C

Community Answer: A




This question is in DAS-C01 AWS Certified Data Analytics – Specialty Exam
For getting AWS Certified Data Analytics – Specialty Certificate



Disclaimers:
The website is not related to, affiliated with, endorsed or authorized by Amazon.
Trademarks, certification & product names are used for reference only and belong to Amazon.
The website does not contain actual questions and answers from Amazon's Certification Exam.

Once a month, a company receives a 100 MB .csv file compressed with gzip. The file contains 50,000 property listing records and is stored in Amazon S3 Glacier. The company needs its data analyst to query a subset of the data for a specific vendor. What is the most cost-effective solution?

101 Practice Test Free

101-500 Practice Test Free

102-500 Practice Test Free

Recommended

DP-100 Practice Test Free

XK0-005 Practice Test Free

XK0-004 Practice Test Free

SY0-701 Practice Test Free

SY0-601 Practice Test Free

SY0-501 Practice Test Free

Welcome Back!

Create New Account!

Retrieve your password