A new algorithm has been written in Python to identify SPAM e-mails. The algorithm analyzes the free text contained within a sample set of 1 million e-mails stored on Amazon S3. The algorithm must be scaled across a production dataset of 5 PB, which also resides in Amazon S3 storage. Which AWS service strategy is best for this use case?



Admin Staff asked 7 months ago


A. Copy the data into Amazon ElastiCache to perform text analysis on the in-memory data and export the results of the model into Amazon Machine Learning.

B. Use Amazon EMR to parallelize the text analysis tasks across the cluster using a streaming program step.

C. Use Amazon Elasticsearch Service to store the text and then use the Python Elasticsearch Client to run analysis against the text index.

D. Initiate a Python job from AWS Data Pipeline to run directly against the Amazon S3 text files.

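Suggested Answer: B

Amazon EMR can run the existing Python code with little modification as a Hadoop Streaming step, reading the e-mails directly from Amazon S3 and spreading the work across the cluster's nodes, which is what a 5 PB dataset requires. ElastiCache (A) cannot practically hold 5 PB in memory, Amazon Elasticsearch Service (C) would require indexing the entire dataset before any analysis could run, and a single Python job launched from AWS Data Pipeline (D) does not parallelize the work.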



Reference: https://aws.amazon.com/blogs/database/indexing-metadata-in-amazon-elasticsearch-service-using-aws-lambda-and-python/
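
To illustrate what the "streaming program step" in option B involves: Hadoop Streaming pipes each input split to an arbitrary executable on stdin and collects whatever it writes to stdout, so a Python script can serve as the mapper. The sketch below is only an assumption about how such a mapper might look. The `classify` function and the `SPAM_MARKERS` keyword list are hypothetical placeholders for the real algorithm, which the question does not describe, and the script assumes one e-mail body per input line.

```python
import sys

# Hypothetical keyword list standing in for the real spam model,
# which the question does not describe.
SPAM_MARKERS = ("free money", "act now", "winner")


def classify(text):
    """Return 'spam' or 'ham' for one e-mail body (placeholder logic)."""
    lowered = text.lower()
    return "spam" if any(marker in lowered for marker in SPAM_MARKERS) else "ham"


def run_mapper(lines, out):
    """Write one tab-separated 'label<TAB>1' record per non-empty input line."""
    for line in lines:
        text = line.strip()
        if text:
            out.write(f"{classify(text)}\t1\n")


if __name__ == "__main__":
    # Hadoop Streaming feeds the input split on stdin and reads stdout.
    run_mapper(sys.stdin, sys.stdout)
```

A script like this would be supplied as the mapper of the streaming step, with the input and output locations pointing at Amazon S3, and a reducer would then sum the counts per label. Because each mapper only sees its own split, adding nodes to the cluster increases throughput without changes to the Python code.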


This question is from the BDS-C00 AWS Certified Big Data – Specialty exam, which leads to the AWS Certified Big Data – Specialty certification.



Disclaimers:
The website is not related to, affiliated with, endorsed or authorized by Amazon.
Trademarks, certification & product names are used for reference only and belong to Amazon.
The website does not contain actual questions and answers from Amazon's Certification Exam.
