A new algorithm has been written in Python to identify SPAM e-mails. The algorithm analyzes the free text of a sample set of 1 million e-mails stored in Amazon S3. It must now be scaled to a production dataset of 5 PB, which also resides in Amazon S3. Which AWS service strategy is best for this use case?

A. Copy the data into Amazon ElastiCache to perform text analysis on the in-memory data, and export the results of the model into Amazon Machine Learning.
B. Use Amazon EMR to parallelize the text analysis tasks across the cluster using a streaming program step.
C. Use Amazon Elasticsearch Service to store the text, and then use the Python Elasticsearch Client to run analysis against the text index.
D. Initiate a Python job from AWS Data Pipeline to run directly against the Amazon S3 text files.

Suggested Answer: C
Reference: https://aws.amazon.com/blogs/database/indexing-metadata-in-amazon-elasticsearch-service-using-aws-lambda-and-python/

This question is from the BDS-C00 exam for the AWS Certified Big Data – Specialty certification.

Disclaimer: The website is not related to, affiliated with, endorsed, or authorized by Amazon. Trademarks, certification names, and product names are used for reference only and belong to Amazon. The website does not contain actual questions and answers from Amazon's certification exam.
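For context on Option B: a "streaming program step" refers to Hadoop Streaming, where the mapper and reducer are ordinary scripts that read lines from stdin and write tab-separated key/value lines to stdout, and the framework sorts mapper output by key before the reducer sees it. The sketch below only illustrates that contract. The file name `spam_streaming.py`, the keyword list, and the `is_spam` heuristic are all hypothetical stand-ins for the question's unspecified algorithm, and it assumes each e-mail has already been flattened to a single line of text.

```python
import re
import sys
from itertools import groupby
from typing import Iterable, Iterator, Sequence

# HYPOTHETICAL stand-in for the real SPAM algorithm: a naive keyword heuristic.
SPAM_TERMS = {"winner", "free", "prize", "urgent"}
THRESHOLD = 2


def is_spam(text: str) -> bool:
    """Flag text containing at least THRESHOLD distinct terms from SPAM_TERMS."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & SPAM_TERMS) >= THRESHOLD


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'label<TAB>1' for each input line (assumes one e-mail per line)."""
    for line in lines:
        label = "spam" if is_spam(line) else "ham"
        yield f"{label}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per label; relies on input already being sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def main(argv: Sequence[str]) -> int:
    """Run the requested stage over stdin, writing results to stdout."""
    stages = {"map": mapper, "reduce": reducer}
    if len(argv) != 2 or argv[1] not in stages:
        print("usage: spam_streaming.py map|reduce", file=sys.stderr)
        return 2
    for out in stages[argv[1]](sys.stdin):
        print(out)
    return 0


# Only run a stage when one is named on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Locally, the same pipeline can be approximated with `cat sample.txt | python spam_streaming.py map | sort | python spam_streaming.py reduce`, where `sort` plays the role of the shuffle/sort phase that the cluster performs between the two stages.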