
Microsoft DP-203 Data Engineering on Microsoft Azure Exam Practice Test


Data Engineering on Microsoft Azure Questions and Answers

Question 1

What should you recommend using to secure sensitive customer contact information?

Options:

A.

data labels

B.

column-level security

C.

row-level security

D.

Transparent Data Encryption (TDE)

Question 2

Which Azure Data Factory components should you recommend using together to import the daily inventory data from the SQL server to Azure Data Lake Storage? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit]

Options:

Question 3

What should you recommend to prevent users outside the Litware on-premises network from accessing the analytical data store?

Options:

A.

a server-level virtual network rule

B.

a database-level virtual network rule

C.

a database-level firewall IP rule

D.

a server-level firewall IP rule

Question 4

What should you do to improve high availability of the real-time data processing solution?

Options:

A.

Deploy identical Azure Stream Analytics jobs to paired regions in Azure.

B.

Deploy a High Concurrency Databricks cluster.

C.

Deploy an Azure Stream Analytics job and use an Azure Automation runbook to check the status of the job and to start the job if it stops.

D.

Set Data Lake Storage to use geo-redundant storage (GRS).

Question 5

You have an Azure Databricks workspace that contains a Delta Lake dimension table named Table1. Table1 is a Type 2 slowly changing dimension (SCD) table. You need to apply updates from a source table to Table1. Which Apache Spark SQL operation should you use?

Options:

A.

CREATE

B.

UPDATE

C.

MERGE

D.

ALTER
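
For reference, a Type 2 SCD upsert on a Delta Lake table is typically expressed with the MERGE statement. The sketch below is a minimal Spark SQL example with hypothetical table and column names (updates, CustomerKey, Address, IsCurrent, StartDate, EndDate); a complete SCD Type 2 load would also insert a new current-version row for changed keys, which is omitted here for brevity.

    -- Spark SQL sketch: expire the current row on change, insert brand-new keys.
    -- All table and column names are hypothetical.
    MERGE INTO Table1 AS tgt
    USING updates AS src
      ON tgt.CustomerKey = src.CustomerKey AND tgt.IsCurrent = true
    WHEN MATCHED AND tgt.Address <> src.Address THEN
      UPDATE SET tgt.IsCurrent = false, tgt.EndDate = current_date()
    WHEN NOT MATCHED THEN
      INSERT (CustomerKey, Address, StartDate, EndDate, IsCurrent)
      VALUES (src.CustomerKey, src.Address, current_date(), NULL, true);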

Question 6

You have an Azure Data Lake Storage Gen2 container.

Data is ingested into the container, and then transformed by a data integration application. The data is NOT modified after that. Users can read files in the container but cannot modify the files.

You need to design a data archiving solution that meets the following requirements:

    New data is accessed frequently and must be available as quickly as possible.

    Data that is older than five years is accessed infrequently but must be available within one second when requested.

    Data that is older than seven years is NOT accessed. After seven years, the data must be persisted at the lowest cost possible.

    Costs must be minimized while maintaining the required availability.

How should you manage the data? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit]

Options:

Question 7

You are designing an Azure Databricks cluster that runs user-defined local processes. You need to recommend a cluster configuration that meets the following requirements:

• Minimize query latency.

• Maximize the number of users that can run queries on the cluster at the same time.

• Reduce overall costs without compromising other requirements.

Which cluster type should you recommend?

Options:

A.

Standard with Auto Termination

B.

Standard with Autoscaling

C.

High Concurrency with Autoscaling

D.

High Concurrency with Auto Termination

Question 8

You have an Azure data factory that has the Git repository settings shown in the following exhibit.

[Exhibit]

Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the graphic.

NOTE: Each correct answer is worth one point.

[Exhibit]

Options:

Question 9

You are monitoring an Azure Stream Analytics job by using metrics in Azure.

You discover that during the last 12 hours, the average watermark delay is consistently greater than the configured late arrival tolerance.

What is a possible cause of this behavior?

Options:

A.

Events whose application timestamp is earlier than their arrival time by more than five minutes arrive as inputs.

B.

There are errors in the input data.

C.

The late arrival policy causes events to be dropped.

D.

The job lacks the resources to process the volume of incoming data.

Question 10

You have an Azure subscription.

You need to deploy an Azure Data Lake Storage Gen2 Premium account. The solution must meet the following requirements:

• Blobs that are older than 365 days must be deleted.

• Administrator efforts must be minimized.

• Costs must be minimized.

What should you use? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit]

Options:

Question 11

You plan to create an Azure Data Factory pipeline that will include a mapping data flow.

You have JSON data containing objects that have nested arrays.

You need to transform the JSON-formatted data into a tabular dataset. The dataset must have one row for each item in the arrays.

Which transformation method should you use in the mapping data flow?

Options:

A.

unpivot

B.

flatten

C.

new branch

D.

alter row

Question 12

You need to output files from Azure Data Factory.

Which file format should you use for each type of output? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit]

Options:

Question 13

You are designing a folder structure for the files in an Azure Data Lake Storage Gen2 account. The account has one container that contains three years of data.

You need to recommend a folder structure that meets the following requirements:

• Supports partition elimination for queries by Azure Synapse Analytics serverless SQL pools

• Supports fast data retrieval for data from the current month

• Simplifies data security management by department

Which folder structure should you recommend?

Options:

A.

\YYYY\MM\DD\Department\DataSource\DataFile_YYYYMMDD.parquet

B.

\Department\DataSource\YYYY\MM\DataFile_YYYYMMDD.parquet

C.

\DD\MM\YYYY\Department\DataSource\DataFile_DDMMYY.parquet

D.

\DataSource\Department\YYYYMM\DataFile_YYYYMMDD.parquet
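
For context, a serverless SQL pool can skip whole folders when a query filters on wildcard path segments exposed through the filepath() function. A minimal sketch, assuming a hypothetical storage account, container, and a Department\DataSource\YYYY\MM folder layout:

    -- Synapse serverless SQL sketch; the URL and folder layout are hypothetical.
    SELECT TOP 10 *
    FROM OPENROWSET(
            BULK 'https://myaccount.dfs.core.windows.net/data/Sales/POS/*/*/DataFile_*.parquet',
            FORMAT = 'PARQUET'
         ) AS r
    WHERE r.filepath(1) = '2024'   -- first wildcard: year folder
      AND r.filepath(2) = '06';    -- second wildcard: month folder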

Question 14

You have an Azure Synapse Analytics SQL pool named Pool1 on a logical Microsoft SQL server named Server1.

You need to implement Transparent Data Encryption (TDE) on Pool1 by using a custom key named key1.

Which five actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

[Exhibit]

Options:

Question 15

You have an Azure subscription that contains the following resources:

    An Azure Active Directory (Azure AD) tenant that contains a security group named Group1

    An Azure Synapse Analytics SQL pool named Pool1

You need to control the access of Group1 to specific columns and rows in a table in Pool1.

Which Transact-SQL commands should you use? To answer, select the appropriate options in the answer area.

[Exhibit]

Options:
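
For reference, column access is granted with a column list on GRANT SELECT, while row filtering uses an inline table-valued function bound to the table by CREATE SECURITY POLICY. A minimal T-SQL sketch; the table, columns, and predicate logic are hypothetical:

    -- Column-level security: expose only selected columns to Group1.
    GRANT SELECT ON dbo.Customers (CustomerId, City) TO Group1;

    -- Row-level security: filter rows through a predicate function.
    CREATE FUNCTION dbo.fn_SecurityPredicate (@SalesRegion AS varchar(20))
        RETURNS TABLE
        WITH SCHEMABINDING
    AS
        RETURN SELECT 1 AS fn_result
               WHERE @SalesRegion = 'West' OR USER_NAME() = 'Manager';
    GO

    CREATE SECURITY POLICY SalesFilter
        ADD FILTER PREDICATE dbo.fn_SecurityPredicate(SalesRegion)
        ON dbo.Customers
        WITH (STATE = ON);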

Question 16

You plan to create a dimension table in Azure Synapse Analytics that will be less than 1 GB.

You need to create the table to meet the following requirements:

• Provide the fastest query time.

• Minimize data movement during queries.

Which type of table should you use?

Options:

A.

hash distributed

B.

heap

C.

replicated

D.

round-robin
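
For context, the distribution type of a dedicated SQL pool table is declared in the WITH clause of CREATE TABLE. A sketch of a small dimension table created as replicated (table and column names are hypothetical):

    -- Dedicated SQL pool DDL sketch; names are hypothetical.
    CREATE TABLE dbo.DimProduct
    (
        ProductKey  INT           NOT NULL,
        ProductName NVARCHAR(100) NOT NULL
    )
    WITH
    (
        DISTRIBUTION = REPLICATE,        -- full copy on every compute node
        CLUSTERED INDEX (ProductKey)     -- typical for small dimension tables
    );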

Question 17

You have an Azure Synapse Analytics dedicated SQL pool that contains a table named dbo.Users.

You need to prevent a group of users from reading user email addresses from dbo.Users. What should you use?

Options:

A.

row-level security

B.

column-level security

C.

Dynamic data masking

D.

Transparent Data Encryption (TDE)

Question 18

You develop data engineering solutions for a company.

A project requires the deployment of data to Azure Data Lake Storage.

You need to implement role-based access control (RBAC) so that project members can manage the Azure Data Lake Storage resources.

Which three actions should you perform? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

Options:

A.

Assign Azure AD security groups to Azure Data Lake Storage.

B.

Configure end-user authentication for the Azure Data Lake Storage account.

C.

Configure service-to-service authentication for the Azure Data Lake Storage account.

D.

Create security groups in Azure Active Directory (Azure AD) and add project members.

E.

Configure access control lists (ACL) for the Azure Data Lake Storage account.

Question 19

You have an Azure Data Lake Storage Gen2 account that contains two folders named Folder1 and Folder2.

You use Azure Data Factory to copy multiple files from Folder1 to Folder2.

[Exhibit]

You receive the following error.

What should you do to resolve the error?

Options:

A.

Add an explicit mapping.

B.

Enable fault tolerance to skip incompatible rows.

C.

Lower the degree of copy parallelism

D.

Change the Copy activity setting to Binary Copy

Question 20

You have an Azure Stream Analytics job.

You need to ensure that the job has enough streaming units provisioned.

You configure monitoring of the SU % Utilization metric.

Which two additional metrics should you monitor? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

Options:

A.

Backlogged Input Events

B.

Watermark Delay

C.

Function Events

D.

Out of order Events

E.

Late Input Events

Question 21

You have an Azure Synapse Analytics dedicated SQL pool named SA1 that contains a table named Table1. You need to identify tables that have a high percentage of deleted rows. What should you run?

A) [Exhibit]

B) [Exhibit]

C) [Exhibit]

D) [Exhibit]

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

Question 22

You have an Azure Data Lake Storage Gen2 account named account1 that stores logs as shown in the following table.

[Exhibit]

You do not expect that the logs will be accessed during the retention periods.

You need to recommend a solution for account1 that meets the following requirements:

    Automatically deletes the logs at the end of each retention period

    Minimizes storage costs

What should you include in the recommendation? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit]

Options:

Question 23

You have an activity in an Azure Data Factory pipeline. The activity calls a stored procedure in a data warehouse in Azure Synapse Analytics and runs daily.

You need to verify the duration of the activity when it ran last.

What should you use?

Options:

A.

activity runs in Azure Monitor

B.

Activity log in Azure Synapse Analytics

C.

the sys.dm_pdw_wait_stats data management view in Azure Synapse Analytics

D.

an Azure Resource Manager template

Question 24

You are building an Azure Stream Analytics job to identify how much time a user spends interacting with a feature on a webpage.

The job receives events based on user actions on the webpage. Each row of data represents an event. Each event has a type of either 'start' or 'end'.

You need to calculate the duration between start and end events.

How should you complete the query? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit]

Options:
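
For context, the usual Stream Analytics pattern for this scenario pairs each 'end' event with the most recent matching 'start' event by using LAST over a limited duration. A sketch with hypothetical input, column names, and window size:

    -- Stream Analytics SQL sketch; names and the 1-hour window are hypothetical.
    SELECT
        UserId,
        Feature,
        DATEDIFF(
            second,
            LAST(Time) OVER (PARTITION BY UserId, Feature
                             LIMIT DURATION(hour, 1)
                             WHEN Event = 'start'),
            Time) AS DurationSeconds
    FROM Input TIMESTAMP BY Time
    WHERE Event = 'end'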

Question 25

You have the following Azure Stream Analytics query.

[Exhibit]

For each of the following statements, select Yes if the statement is true. Otherwise, select No.

NOTE: Each correct selection is worth one point.

[Exhibit]

Options:

Question 26

You have an Azure subscription that contains an Azure data factory named ADF1.

From Azure Data Factory Studio, you build a complex data pipeline in ADF1.

You discover that the Save button is unavailable and there are validation errors that prevent the pipeline from being published.

You need to ensure that you can save the logic of the pipeline.

Solution: You export ADF1 as an Azure Resource Manager (ARM) template.

Does this meet the goal?

Options:

A.

Yes

B.

No

Question 27

You have an Azure data factory connected to a Git repository that contains the following branches:

• main: Collaboration branch

• abc: Feature branch

• xyz: Feature branch

You save changes to a pipeline in the xyz branch.

You need to publish the changes to the live service.

What should you do first?

Options:

A.

Push the code to a remote origin.

B.

Publish the data factory.

C.

Create a pull request to merge the changes into the abc branch.

D.

Create a pull request to merge the changes into the main branch.

Question 28

You have an enterprise data warehouse in Azure Synapse Analytics named DW1 on a server named Server1.

You need to determine the size of the transaction log file for each distribution of DW1.

What should you do?

Options:

A.

On DW1, execute a query against the sys.database_files dynamic management view.

B.

From Azure Monitor in the Azure portal, execute a query against the logs of DW1.

C.

Execute a query against the logs of DW1 by using the

Get-AzOperationalInsightsSearchResult PowerShell cmdlet.

D.

On the master database, execute a query against the

sys.dm_pdw_nodes_os_performance_counters dynamic management view.
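
For context, per-distribution performance counters, including transaction log sizes, surface through the sys.dm_pdw_nodes_os_performance_counters view. A sketch of the kind of query involved (the counter name follows the documented pattern; filter values may vary):

    -- T-SQL sketch against the dedicated SQL pool DMV.
    SELECT pdw_node_id,
           instance_name,
           cntr_value / 1024.0 AS log_size_mb
    FROM sys.dm_pdw_nodes_os_performance_counters
    WHERE counter_name = 'Log File(s) Size (KB)'
      AND instance_name LIKE 'Distribution_%';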

Question 29

You have an Azure Synapse Analytics dedicated SQL pool.

You need to create a fact table named Table1 that will store sales data from the last three years. The solution must be optimized for the following query operations:

• Show order counts by week.

• Calculate sales totals by region.

• Calculate sales totals by product.

• Find all the orders from a given month.

Which data should you use to partition Table1?

Options:

A.

region

B.

product

C.

week

D.

month
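
For context, partitioning in a dedicated SQL pool is declared inline in CREATE TABLE, and the syntax is the same whichever column is chosen. A sketch that partitions on a hypothetical integer date key (table, columns, and boundary values are illustrative only):

    -- DDL sketch; names and boundary values are hypothetical.
    CREATE TABLE dbo.FactSales
    (
        OrderDateKey INT   NOT NULL,
        RegionKey    INT   NOT NULL,
        ProductKey   INT   NOT NULL,
        SalesAmount  MONEY NOT NULL
    )
    WITH
    (
        DISTRIBUTION = HASH(ProductKey),
        CLUSTERED COLUMNSTORE INDEX,
        PARTITION ( OrderDateKey RANGE RIGHT
                    FOR VALUES (20230101, 20230201, 20230301) )
    );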

Question 30

You are creating an Apache Spark job in Azure Databricks that will ingest JSON-formatted data.

You need to convert a nested JSON string into a DataFrame that will contain multiple rows.

Which Spark SQL function should you use?

Options:

A.

explode

B.

filter

C.

coalesce

D.

extract
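
For context, exploding an array column produces one output row per array element, which is how a nested JSON structure becomes multiple rows. A minimal Spark SQL sketch with hypothetical view and column names:

    -- Spark SQL sketch; raw_orders, order_id, and items are hypothetical.
    SELECT order_id,
           explode(items) AS item   -- one row per element of the items array
    FROM raw_orders;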

Question 31

You have an Azure subscription that contains the resources shown in the following table.

[Exhibit]

Diagnostic logs from ADF1 are sent to LA1. ADF1 contains a pipeline named Pipeline1 that copies data from DB1 to DW1. You need to perform the following actions:

• Create an action group named AG1.

• Configure an alert in ADF1 to use AG1.

In which resource group should you create AG1?

Options:

A.

RG1

B.

RG2

C.

RG3

D.

RG4

Question 32

You have an Azure Data Factory pipeline that performs an incremental load of source data to an Azure Data Lake Storage Gen2 account.

Data to be loaded is identified by a column named LastUpdatedDate in the source table.

You plan to execute the pipeline every four hours.

You need to ensure that the pipeline execution meets the following requirements:

    Automatically retries the execution when the pipeline run fails due to concurrency or throttling limits.

    Supports backfilling existing data in the table.

Which type of trigger should you use?

Options:

A.

Storage event

B.

on-demand

C.

schedule

D.

tumbling window

Question 33

You use PySpark in Azure Databricks to parse the following JSON input.

[Exhibit]

You need to output the data in the following tabular format.

[Exhibit]

How should you complete the PySpark code? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.

NOTE: Each correct selection is worth one point.

[Exhibit]

Options:

Question 34

You have an Azure Stream Analytics job that is a Stream Analytics project solution in Microsoft Visual Studio. The job accepts data generated by IoT devices in the JSON format.

You need to modify the job to accept data generated by the IoT devices in the Protobuf format.

Which three actions should you perform from Visual Studio in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

[Exhibit]

Options:

Question 35

You need to design an analytical storage solution for the transactional data. The solution must meet the sales transaction dataset requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit]

Options:

Question 36

You need to implement versioned changes to the integration pipelines. The solution must meet the data integration requirements.

In which order should you perform the actions? To answer, move all actions from the list of actions to the answer area and arrange them in the correct order.

[Exhibit]

Options:

Question 37

You need to ensure that the Twitter feed data can be analyzed in the dedicated SQL pool. The solution must meet the customer sentiment analytics requirements.

Which three Transact-SQL DDL commands should you run in sequence? To answer, move the appropriate commands from the list of commands to the answer area and arrange them in the correct order.

NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct orders you select.

[Exhibit]

Options:
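
For context, exposing files in Data Lake Storage to a dedicated SQL pool generally involves an external data source, an external file format, and an external table. A T-SQL sketch with hypothetical names, URL, and schema (TYPE = HADOOP applies to dedicated pools):

    -- T-SQL DDL sketch; names, URL, and columns are hypothetical.
    CREATE EXTERNAL DATA SOURCE TwitterFeedSource
    WITH ( TYPE = HADOOP,
           LOCATION = 'abfss://data@myaccount.dfs.core.windows.net' );

    CREATE EXTERNAL FILE FORMAT ParquetFormat
    WITH ( FORMAT_TYPE = PARQUET );

    CREATE EXTERNAL TABLE dbo.TwitterFeed
    (
        TweetId   BIGINT,
        Sentiment NVARCHAR(20)
    )
    WITH ( LOCATION = '/twitter/',
           DATA_SOURCE = TwitterFeedSource,
           FILE_FORMAT = ParquetFormat );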

Question 38

You need to design a data ingestion and storage solution for the Twitter feeds. The solution must meet the customer sentiment analytics requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit]

Options:

Question 39

You need to implement the surrogate key for the retail store table. The solution must meet the sales transaction dataset requirements.

What should you create?

Options:

A.

a table that has an IDENTITY property

B.

a system-versioned temporal table

C.

a user-defined SEQUENCE object

D.

a table that has a FOREIGN KEY constraint
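
For reference, a surrogate key in a dedicated SQL pool is commonly generated with the IDENTITY property. A sketch with hypothetical table and column names:

    -- DDL sketch; names are hypothetical.
    CREATE TABLE dbo.DimRetailStore
    (
        StoreKey  INT IDENTITY(1, 1) NOT NULL,  -- surrogate key
        StoreId   INT                NOT NULL,  -- business key from the source
        StoreName NVARCHAR(50)       NULL
    )
    WITH ( DISTRIBUTION = ROUND_ROBIN );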

Question 40

You need to design a data retention solution for the Twitter feed data records. The solution must meet the customer sentiment analytics requirements.

Which Azure Storage functionality should you include in the solution?

Options:

A.

change feed

B.

soft delete

C.

time-based retention

D.

lifecycle management

Question 41

You need to design the partitions for the product sales transactions. The solution must meet the sales transaction dataset requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit]

Options:

Question 42

You need to design a data storage structure for the product sales transactions. The solution must meet the sales transaction dataset requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit]

Options:

Question 43

You need to integrate the on-premises data sources and Azure Synapse Analytics. The solution must meet the data integration requirements.

Which type of integration runtime should you use?

Options:

A.

Azure-SSIS integration runtime

B.

self-hosted integration runtime

C.

Azure integration runtime

Question 44

You need to implement an Azure Synapse Analytics database object for storing the sales transactions data. The solution must meet the sales transaction dataset requirements.

What should you do? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit]

Options:

Question 45

You need to design a data retention solution for the Twitter feed data records. The solution must meet the customer sentiment analytics requirements.

Which Azure Storage functionality should you include in the solution?

Options:

A.

time-based retention

B.

change feed

C.

soft delete

D.

lifecycle management
