A city has been collecting data on its public bicycle share program for the past three years. The 5PB dataset currently resides on Amazon S3. The data contains the following datapoints:
✑ Bicycle origination points
✑ Bicycle destination points
✑ Mileage between the points
✑ Number of bicycle slots available at the station (which is variable based on the station location)
✑ Number of slots available and taken at a given time
The program has received additional funds to increase the number of bicycle stations available. All data is regularly archived to Amazon Glacier.
The new bicycle stations must be located to provide the most riders access to bicycles.
How should this task be performed?

A. Move the data from Amazon S3 into Amazon EBS-backed volumes and use an EC2-based Hadoop cluster with Spot Instances to run a Spark job that performs a stochastic gradient descent optimization.

B. Use the Amazon Redshift COPY command to move the data from Amazon S3 into Redshift and perform a SQL query that outputs the most popular bicycle stations.

C. Persist the data on Amazon S3 and use a transient EMR cluster with spot instances to run a Spark streaming job that will move the data into Amazon Kinesis.

D. Keep the data on Amazon S3 and use an Amazon EMR-based Hadoop cluster with spot instances to run a Spark job that performs a stochastic gradient descent optimization over EMRFS.

Suggested Answer: B
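As an illustrative sketch of option B: the COPY command bulk-loads the S3 dataset into Redshift, after which a simple aggregation query ranks stations by ride volume. The table name, column names, bucket path, and IAM role ARN below are all hypothetical placeholders, not details from the question.

```sql
-- Hypothetical schema for the trip data; column names are assumptions.
CREATE TABLE trips (
    origin_station_id INTEGER,
    dest_station_id   INTEGER,
    mileage           DECIMAL(6,2),
    started_at        TIMESTAMP
);

-- Bulk-load the S3 dataset into Redshift.
-- Bucket/prefix and IAM role ARN are placeholders.
COPY trips
FROM 's3://example-bikeshare-bucket/trips/'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole'
FORMAT AS CSV;

-- Rank stations by ride volume to surface the most popular origins,
-- which informs where new stations would reach the most riders.
SELECT origin_station_id, COUNT(*) AS rides
FROM trips
GROUP BY origin_station_id
ORDER BY rides DESC
LIMIT 20;
```

A similar GROUP BY over `dest_station_id` would surface popular destinations; combining both views gives a fuller picture of demand around candidate station sites.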
This question is from the BDS-C00 AWS Certified Big Data – Specialty exam, used in preparation for the AWS Certified Big Data – Specialty certificate.



Disclaimers:
The website is not related to, affiliated with, endorsed by, or authorized by Amazon.
Trademarks, certification & product names are used for reference only and belong to Amazon.
The website does not contain actual questions and answers from Amazon's Certification Exam.