An organization currently runs a large Hadoop environment in its data center and is building an alternative Hadoop environment on AWS using Amazon EMR. It generates around 20 TB of data per month. Also on a monthly basis, files need to be grouped and copied to Amazon S3 for use by the Amazon EMR environment, and the data must be copied to multiple S3 buckets across several AWS accounts. A 10 Gbps AWS Direct Connect link is set up between the data center and AWS, and the network team has agreed to allocate 50% of the Direct Connect bandwidth to the data transfer. The transfer cannot take more than two days.

What would be the MOST efficient approach to transfer the data to AWS on a monthly basis?

A. Use an offline copy method, such as an AWS Snowball device, to copy and transfer the data to Amazon S3.
B. Configure a multipart upload to Amazon S3 using the AWS SDK for Java to transfer the data over AWS Direct Connect.
C. Use the Amazon S3 Transfer Acceleration capability to transfer the data to Amazon S3 over AWS Direct Connect.
D. Set up the S3DistCp tool in the on-premises Hadoop environment to transfer the data to Amazon S3 over AWS Direct Connect.

Suggested Answer: B

This question is from the BDS-C00 AWS Certified Big Data – Specialty exam.
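A quick feasibility check explains why an online transfer is viable here: 50% of the 10 Gbps link is 5 Gbps, and 20 TB is roughly 1.6 x 10^14 bits, so the transfer takes about 1.6 x 10^14 / 5 x 10^9 = 32,000 seconds, or around 9 hours, comfortably inside the two-day window. The sketch below illustrates the idea behind the suggested answer using the AWS SDK for Java v1's TransferManager, which performs multipart uploads automatically for large files; the bucket name, key, and file path are placeholders, not values from the question.

```java
import java.io.File;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import com.amazonaws.services.s3.transfer.Upload;

public class MonthlyS3Upload {
    public static void main(String[] args) throws InterruptedException {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // TransferManager splits large objects into parts and uploads them
        // in parallel (multipart upload), which makes better use of the
        // available Direct Connect bandwidth than a single upload stream.
        TransferManager tm = TransferManagerBuilder.standard()
                .withS3Client(s3)
                .withMultipartUploadThreshold(100L * 1024 * 1024) // multipart for files > 100 MB
                .build();

        // Bucket name, key, and local path are hypothetical placeholders.
        Upload upload = tm.upload("example-target-bucket",
                "monthly/2024-01/part-00000.gz",
                new File("/data/export/part-00000.gz"));

        upload.waitForCompletion(); // blocks until the multipart upload finishes
        tm.shutdownNow();
    }
}
```

Note that copying into buckets owned by other AWS accounts would additionally require appropriate bucket policies or cross-account IAM roles; that setup is outside the scope of this sketch.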