You need to develop a pipeline for processing data. The pipeline must meet the following requirements:
✑ Scale resources up and down to reduce cost.
✑ Use an in-memory data processing engine to speed up ETL and machine learning operations.
✑ Use streaming capabilities.
✑ Provide the ability to code in SQL, Python, Scala, and R.
✑ Integrate workspace collaboration with Git.

What should you use?

A. HDInsight Spark Cluster
B. Azure Stream Analytics
C. HDInsight Hadoop Cluster
D. Azure SQL Data Warehouse
E. HDInsight Kafka Cluster
F. HDInsight Storm Cluster

Suggested Answer: A

Apache Spark is an open-source, parallel-processing framework that supports in-memory processing to boost the performance of big-data analytics applications. HDInsight is a managed Hadoop service; use it to deploy and manage Hadoop clusters in Azure. For batch processing, you can use Spark, Hive, Hive LLAP, or MapReduce.
Languages: R, Python, Java, Scala, SQL
You can create an HDInsight Spark cluster using an Azure Resource Manager template; the template is available on GitHub.

References:
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing

This question is from the DP-200 exam (Microsoft Certified: Azure Data Engineer Associate).
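To make the choice of answer A concrete, here is a minimal PySpark sketch showing two of the key requirements in one program: in-memory DataFrame ETL and Structured Streaming. The storage paths, schema, and column names (`events.csv`, `category`, `amount`) are hypothetical placeholders for illustration only; they are not part of the question or the Microsoft reference.

```python
# Minimal sketch, assuming a Spark session on an HDInsight Spark cluster
# and CSV data in the cluster's attached Blob storage (hypothetical paths).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hdinsight-spark-demo").getOrCreate()

# Batch ETL: load a CSV, cache it in memory, then aggregate.
df = spark.read.csv("wasbs:///data/events.csv", header=True, inferSchema=True)
df.cache()  # keep the working set in memory to speed up repeated passes
totals = df.groupBy("category").agg(F.sum("amount").alias("total"))
totals.show()

# Streaming: Structured Streaming picks up new files as they arrive.
stream = (
    spark.readStream
    .schema(df.schema)  # reuse the schema inferred from the batch load
    .csv("wasbs:///data/incoming/")
)
query = (
    stream.groupBy("category").count()
    .writeStream
    .outputMode("complete")
    .format("console")
    .start()
)
query.awaitTermination(timeout=60)  # run briefly for the demo, then stop
```

The same pipeline could equally be written in Scala, SQL, or R, which is exactly the multi-language requirement that rules out options such as Azure Stream Analytics (SQL-like only) or Storm.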