HOTSPOT -
You develop a dataset named DBTBL1 by using Azure Databricks. DBTBL1 contains the following columns:
- SensorTypeID
- GeographyRegionID
- Year
- Month
- Day
- Hour
- Minute
- Temperature
- WindSpeed
- Other

You need to store the data to support daily incremental load pipelines that vary for each GeographyRegionID. The solution must minimize storage costs.

How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Suggested Answer:

Box 1: .partitionBy
partitionBy() writes the data into a separate directory per distinct value of the partition columns, so each daily, per-region load touches only its own partitions.
Incorrect options for Box 1:
- .format: format() selects the output data source ("parquet", "csv", "txt", "json", "jdbc", "orc", "avro", etc.); it does not control how the data is laid out for incremental loads.
- .bucketBy: bucketBy(numBuckets, col, col, ..., colN) takes the number of buckets and the names of the columns to bucket by, using Hive's bucketing scheme on a filesystem; it spreads rows across a fixed number of buckets rather than isolating daily, per-region slices.

Box 2: ("Year", "Month", "Day", "GeographyRegionID")
Specify the columns on which to partition: the date columns first, followed by the GeographyRegionID column. With this layout, each daily incremental load for a region writes only to that day's and region's partition directory, avoiding rewrites of unrelated data and minimizing storage costs.

Box 3: .saveAsTable("/DBTBL1")
saveAsTable() takes the name of the table to save to.

Reference:
https://www.oreilly.com/library/view/learning-spark-2nd/9781492050032/ch04.html
https://docs.microsoft.com/en-us/azure/databricks/delta/delta-batch

This question is from the DP-203 Data Engineering on Microsoft Azure exam, toward the Microsoft Certified: Azure Data Engineer Associate certification.
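For context, a minimal PySpark sketch of the completed write is shown below. The source path, the .format("delta") choice, and the .mode("append") call are assumptions for illustration (the question shows only the three answer boxes); note also that in Spark, saveAsTable() expects a table identifier such as "DBTBL1", even though the exam's answer area renders it as "/DBTBL1".

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Assumption: df holds the DBTBL1 data; the source path below is
# hypothetical and used only to make the sketch self-contained.
df = spark.read.format("parquet").load("/mnt/raw/dbtbl1")

(
    df.write
      .format("delta")                                        # assumed output format
      .partitionBy("Year", "Month", "Day", "GeographyRegionID")  # Boxes 1 and 2
      .mode("append")                                         # assumed: append each daily load
      .saveAsTable("DBTBL1")                                  # Box 3 (exam shows "/DBTBL1")
)
```

Partitioning by date first, then region, means a given day's load for one GeographyRegionID lands in a single partition directory such as Year=2024/Month=1/Day=15/GeographyRegionID=3, so the pipeline never rewrites other days or regions.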