
Microsoft DP-203 Data Engineering on Microsoft Azure Exam Practice Test

Page: 1 / 35
Total 347 questions

Data Engineering on Microsoft Azure Questions and Answers

Question 1

You have an Azure subscription that contains the resources shown in the following table.

Question # 1

You need to ingest the Parquet files from storage1 to SQL1 by using pipeline1. The solution must meet the following requirements:

• Minimize complexity.

• Ensure that additional columns in the files are processed as strings.

• Ensure that files containing additional columns are processed successfully.

How should you configure pipeline1? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Question # 1

Options:

Question 2

You are designing an Azure Synapse solution that will provide a query interface for the data stored in an Azure Storage account. The storage account is only accessible from a virtual network.

You need to recommend an authentication mechanism to ensure that the solution can access the source data.

What should you recommend?

Options:

A.

a managed identity

B.

anonymous public read access

C.

a shared key

Question 3

You have an Azure Synapse Analytics workspace that contains three pipelines and three triggers named Trigger1, Trigger2, and Trigger3.

Trigger3 has the following definition.

Question # 3

Question # 3

Options:

Question 4

You plan to develop a dataset named Purchases by using Azure Databricks. Purchases will contain the following columns:

• ProductID

• ItemPrice

• LineTotal

• Quantity

• StoreID

• Minute

• Month

• Hour

• Year

• Day

You need to store the data to support hourly incremental load pipelines that will vary for each StoreID. The solution must minimize storage costs. How should you complete the code? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Question # 4

Options:
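
Although the code exhibit is not reproduced here, the partitioning choice is the crux of this question. Below is a minimal, hypothetical Spark SQL sketch of one layout that fits the requirements (the table definition and column types are assumptions; only the column names come from the question): partitioning by StoreID and by time down to the hour lets each hourly, per-store incremental load target a single partition, while leaving Minute out of the partition specification avoids a flood of tiny, costly partitions.

-- Hypothetical sketch; types are assumed.
CREATE TABLE Purchases (
    ProductID INT,
    ItemPrice DOUBLE,
    LineTotal DOUBLE,
    Quantity INT,
    StoreID INT,
    Minute INT,
    Month INT,
    Hour INT,
    Year INT,
    Day INT
)
USING PARQUET
PARTITIONED BY (StoreID, Year, Month, Day, Hour);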

Question 5

You are planning the deployment of Azure Data Lake Storage Gen2.

You have the following two reports that will access the data lake:

    Report1: Reads three columns from a file that contains 50 columns.

    Report2: Queries a single record based on a timestamp.

You need to recommend in which format to store the data in the data lake to support the reports. The solution must minimize read times.

What should you recommend for each report? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Question # 5

Options:

Question 6

You implement an enterprise data warehouse in Azure Synapse Analytics.

You have a large fact table that is 10 terabytes (TB) in size.

Incoming queries use the primary key SaleKey column to retrieve data as displayed in the following table:

Question # 6

You need to distribute the large fact table across multiple nodes to optimize performance of the table.

Which technology should you use?

Options:

A.

hash distributed table with clustered index

B.

hash distributed table with clustered columnstore index

C.

round robin distributed table with clustered index

D.

round robin distributed table with clustered columnstore index

E.

heap table with distribution replicate
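
For context, here is a hedged T-SQL sketch (every name except SaleKey is a placeholder) of a hash-distributed table with a clustered columnstore index, the combination typically used for multi-terabyte Synapse fact tables that are filtered on a single high-cardinality key:

CREATE TABLE dbo.FactSale
(
    SaleKey BIGINT NOT NULL,            -- distribution key from the question
    CustomerKey INT NOT NULL,           -- placeholder column
    SaleAmount DECIMAL(18, 2) NOT NULL  -- placeholder column
)
WITH
(
    DISTRIBUTION = HASH(SaleKey),
    CLUSTERED COLUMNSTORE INDEX
);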

Question 7

You have an Azure subscription that contains an Azure Cosmos DB database. Azure Synapse Link is implemented on the database. You configure a full fidelity schema for the analytical store. You perform the following actions:

Question # 7

How many columns will the analytical store contain?

Options:

A.

1

B.

2

C.

3

D.

4

Question 8

You are designing a real-time dashboard solution that will visualize streaming data from remote sensors that connect to the internet. The streaming data must be aggregated to show the average value of each 10-second interval. The data will be discarded after being displayed in the dashboard.

The solution will use Azure Stream Analytics and must meet the following requirements:

    Minimize latency from an Azure Event Hub to the dashboard.

    Minimize the required storage.

    Minimize development effort.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Question # 8

Options:

Question 9

You have an Azure data factory.

You need to examine the pipeline failures from the last 180 days.

What should you use?

Options:

A.

the Activity log blade for the Data Factory resource

B.

Azure Data Factory activity runs in Azure Monitor

C.

Pipeline runs in the Azure Data Factory user experience

D.

the Resource health blade for the Data Factory resource

Question 10

You have several Azure Data Factory pipelines that contain a mix of the following types of activities.

* Wrangling data flow

* Notebook

* Copy

* Jar

Which two Azure services should you use to debug the activities? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

Options:

A.

Azure HDInsight

B.

Azure Databricks

C.

Azure Machine Learning

D.

Azure Data Factory

E.

Azure Synapse Analytics

Question 11

You are designing an Azure Stream Analytics solution that receives instant messaging data from an Azure Event Hub.

You need to ensure that the output from the Stream Analytics job counts the number of messages per time zone every 15 seconds.

How should you complete the Stream Analytics query? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Question # 11

Options:
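
The answer-area exhibit is not reproduced here, but the shape of such a query is well defined. A minimal sketch, assuming an input named Input with TimeZone and CreatedAt columns (all three names are assumptions), that counts messages per time zone in 15-second tumbling windows:

SELECT
    TimeZone,
    COUNT(*) AS MessageCount
INTO Output
FROM Input TIMESTAMP BY CreatedAt
GROUP BY
    TimeZone,
    TumblingWindow(second, 15)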

Question 12

You have an Azure Stream Analytics job that reads data from an Azure event hub.

You need to evaluate whether the job processes data as quickly as the data arrives or cannot keep up.

Which metric should you review?

Options:

A.

InputEventLastPunctuationTime

B.

Input Sources Received

C.

Late Input Events

D.

Backlogged Input Events

Question 13

You have an Azure Data Factory pipeline that has the logic flow shown in the following exhibit.

Question # 13

For each of the following statements, select Yes if the statement is true. Otherwise, select No.

NOTE: Each correct selection is worth one point.

Question # 13

Options:

Question 14

You are developing an application that uses Azure Data Lake Storage Gen2.

You need to recommend a solution to grant permissions to a specific application for a limited time period.

What should you include in the recommendation?

Options:

A.

Azure Active Directory (Azure AD) identities

B.

shared access signatures (SAS)

C.

account keys

D.

role assignments

Question 15

You have an Azure Synapse workspace named MyWorkspace that contains an Apache Spark database named mytestdb.

You run the following command in an Azure Synapse Analytics Spark pool in MyWorkspace.

CREATE TABLE mytestdb.myParquetTable(
    EmployeeID int,
    EmployeeName string,
    EmployeeStartDate date)
USING Parquet

You then use Spark to insert a row into mytestdb.myParquetTable. The row contains the following data.

Question # 15

One minute later, you execute the following query from a serverless SQL pool in MyWorkspace.

SELECT EmployeeID
FROM mytestdb.dbo.myParquetTable
WHERE name = 'Alice';

What will be returned by the query?

Options:

A.

24

B.

an error

C.

a null value

Question 16

You have an Azure Storage account and a data warehouse in Azure Synapse Analytics in the UK South region.

You need to copy blob data from the storage account to the data warehouse by using Azure Data Factory. The solution must meet the following requirements:

    Ensure that the data remains in the UK South region at all times.

    Minimize administrative effort.

Which type of integration runtime should you use?

Options:

A.

Azure integration runtime

B.

Azure-SSIS integration runtime

C.

Self-hosted integration runtime

Question 17

You have an Azure Data Factory version 2 (V2) resource named Df1. Df1 contains a linked service.

You have an Azure Key Vault named vault1 that contains an encryption key named key1.

You need to encrypt Df1 by using key1.

What should you do first?

Options:

A.

Add a private endpoint connection to vault1.

B.

Enable Azure role-based access control on vault1.

C.

Remove the linked service from Df1.

D.

Create a self-hosted integration runtime.

Question 18

You have Azure Data Factory configured with Azure Repos Git integration. The collaboration branch and the publish branch are set to the default values.

You have a pipeline named pipeline1.

You build a new version of pipeline1 in a branch named feature1.

From Data Factory Studio, you select Publish.

The source code of which branch will be built, and which branch will contain the output of the Azure Resource Manager (ARM) template? To answer, select the appropriate options in the answer area.

Question # 18

Options:

Question 19

You have an Azure Blob Storage account named storage1 and an Azure Synapse Analytics serverless SQL pool named Pool1. From Pool1, you plan to run ad hoc queries that target storage1.

You need to ensure that you can use shared access signature (SAS) authorization without defining a data source. What should you create first?

Options:

A.

a stored access policy

B.

a server-level credential

C.

a managed identity

D.

a database scoped credential
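
For context, here is a hedged sketch of a server-level credential in a serverless SQL pool (the storage URL, container, and SAS token are placeholders). Because the credential is named after the storage path, ad hoc OPENROWSET queries against that path can use it without first defining a data source:

CREATE CREDENTIAL [https://storage1.blob.core.windows.net/container1]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=...';  -- truncated SAS token placeholder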

Question 20

You have an Azure Databricks workspace and an Azure Data Lake Storage Gen2 account named storage1.

New files are uploaded daily to storage1.

You need to recommend a solution that configures storage1 as a structured streaming source. The solution must meet the following requirements:

• Incrementally process new files as they are uploaded to storage1.

• Minimize implementation and maintenance effort.

• Minimize the cost of processing millions of files.

• Support schema inference and schema drift.

What should you include in the recommendation?

Options:

A.

Auto Loader

B.

Apache Spark FileStreamSource

C.

COPY INTO

D.

Azure Data Factory

Question 21

You have an Azure data factory named ADM that contains a pipeline named Pipeline1.

Pipeline1 must execute every 30 minutes with a 15-minute offset.

You need to create a trigger for Pipeline1. The trigger must meet the following requirements:

• Backfill data from the beginning of the day to the current time.

• If Pipeline1 fails, ensure that the pipeline can re-execute within the same 30-minute period.

• Ensure that only one concurrent pipeline execution can occur.

• Minimize development and configuration effort.

Which type of trigger should you create?

Options:

A.

schedule

B.

event-based

C.

manual

D.

tumbling window

Question 22

You are designing an Azure Databricks table. The table will ingest an average of 20 million streaming events per day.

You need to persist the events in the table for use in incremental load pipeline jobs in Azure Databricks. The solution must minimize storage costs and incremental load times.

What should you include in the solution?

Options:

A.

Partition by DateTime fields.

B.

Sink to Azure Queue storage.

C.

Include a watermark column.

D.

Use a JSON format for physical data storage.

Question 23

You have an Azure Databricks workspace named workspace1 in the Standard pricing tier. Workspace1 contains an all-purpose cluster named cluster1.

You need to reduce the time it takes for cluster1 to start and scale up. The solution must minimize costs.

What should you do first?

Options:

A.

Upgrade workspace1 to the Premium pricing tier.

B.

Create a cluster policy in workspace1.

C.

Create a pool in workspace1.

D.

Configure a global init script for workspace1.

Question 24

You have a data warehouse in Azure Synapse Analytics.

You need to ensure that the data in the data warehouse is encrypted at rest.

What should you enable?

Options:

A.

Advanced Data Security for this database

B.

Transparent Data Encryption (TDE)

C.

Secure transfer required

D.

Dynamic Data Masking
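
For context, a minimal sketch of enabling Transparent Data Encryption on a dedicated SQL pool with T-SQL, run against the master database (the pool name is a placeholder); the same setting can be toggled from the Azure portal:

ALTER DATABASE [Pool1] SET ENCRYPTION ON;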

Question 25

You have an Azure subscription that contains the resources shown in the following table.

Question # 25

You need to ensure that you can run Spark notebooks in ws1. The solution must ensure that secrets from kv1 can be retrieved by using UAMI1. What should you do? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Question # 25

Options:

Question 26

You are implementing a star schema in an Azure Synapse Analytics dedicated SQL pool.

You plan to create a table named DimProduct.

DimProduct must be a Type 3 slowly changing dimension (SCD) table that meets the following requirements:

• The values in two columns named ProductKey and ProductSourceID will remain the same.

• The values in three columns named ProductName, ProductDescription, and Color can change.

You need to add additional columns to complete the following table definition.

Question # 26

A)

Question # 26

B)

Question # 26

C)

Question # 26

D)

Question # 26

E)

Question # 26

F)

Question # 26

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E

F.

Option F
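
The option exhibits are not reproduced here, but a hedged T-SQL sketch of the Type 3 pattern the question describes may help (all types and the Original* column names are assumptions): the stable keys remain single columns, while each changeable attribute gets a companion column that preserves its prior value.

CREATE TABLE dbo.DimProduct
(
    ProductKey INT NOT NULL,
    ProductSourceID INT NOT NULL,
    ProductName NVARCHAR(100) NOT NULL,
    OriginalProductName NVARCHAR(100) NULL,         -- assumed companion column
    ProductDescription NVARCHAR(500) NOT NULL,
    OriginalProductDescription NVARCHAR(500) NULL,  -- assumed companion column
    Color NVARCHAR(20) NOT NULL,
    OriginalColor NVARCHAR(20) NULL                 -- assumed companion column
);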

Question 27

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this scenario, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You have an Azure Storage account that contains 100 GB of files. The files contain text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.

You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics.

You need to prepare the files to ensure that the data copies quickly.

Solution: You convert the files to compressed delimited text files.

Does this meet the goal?

Options:

A.

Yes

B.

No

Question 28

You have an Azure subscription that contains an Azure Synapse Analytics dedicated SQL pool named Pool1. You have the queries shown in the following table.

Question # 28

You are evaluating whether to enable result set caching for Pool1. Which query results will be cached if result set caching is enabled?

Options:

A.

Query1 only

B.

Query2 only

C.

Query1 and Query2 only

D.

Query1 and Query3 only

E.

Query1, Query2, and Query3 only
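
For context, a hedged sketch of how result set caching is switched on (the pool name is a placeholder). At the database level the command runs in master; it can also be toggled per session. Cached results are reused only for deterministic queries whose underlying data has not changed since the cache entry was created:

-- Database level, run in master:
ALTER DATABASE [Pool1] SET RESULT_SET_CACHING ON;
-- Session level:
SET RESULT_SET_CACHING ON;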

Question 29

You have a table in an Azure Synapse Analytics dedicated SQL pool. The table was created by using the following Transact-SQL statement.

Question # 29

You need to alter the table to meet the following requirements:

    Ensure that users can identify the current manager of employees.

    Support creating an employee reporting hierarchy for your entire company.

    Provide fast lookup of the managers’ attributes such as name and job title.

Which column should you add to the table?

Options:

A.

[ManagerEmployeeID] [int] NULL

B.

[ManagerEmployeeID] [smallint] NULL

C.

[ManagerEmployeeKey] [int] NULL

D.

[ManagerName] [varchar](200) NULL
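
A minimal sketch (table and column names are assumptions) of why a self-referencing ManagerEmployeeKey column supports both an organization-wide hierarchy and fast lookups of manager attributes: the dimension simply joins to itself on its surrogate key.

SELECT
    e.EmployeeName,
    m.EmployeeName AS ManagerName,
    m.JobTitle AS ManagerJobTitle
FROM dbo.DimEmployee AS e
LEFT JOIN dbo.DimEmployee AS m
    ON e.ManagerEmployeeKey = m.EmployeeKey;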

Question 30

You have an Azure subscription that is linked to a hybrid Azure Active Directory (Azure AD) tenant. The subscription contains an Azure Synapse Analytics SQL pool named Pool1.

You need to recommend an authentication solution for Pool1. The solution must support multi-factor authentication (MFA) and database-level authentication.

Which authentication solution or solutions should you include in the recommendation? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Question # 30

Options:

Question 31

You are designing a slowly changing dimension (SCD) for supplier data in an Azure Synapse Analytics dedicated SQL pool.

You plan to keep a record of changes to the available fields.

The supplier data contains the following columns.

Question # 31

Which three additional columns should you add to the data to create a Type 2 SCD? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

Options:

A.

surrogate primary key

B.

foreign key

C.

effective start date

D.

effective end date

E.

last modified date

F.

business key
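
For context, a hedged T-SQL sketch of the Type 2 additions (names and types are assumptions): a surrogate key identifies each row version, and the effective-date pair bounds the period during which that version was current.

CREATE TABLE dbo.DimSupplier
(
    SupplierKey INT IDENTITY(1, 1) NOT NULL,  -- surrogate primary key
    SupplierSourceID INT NOT NULL,            -- existing business key (assumed name)
    SupplierName NVARCHAR(100) NOT NULL,      -- existing tracked attribute (assumed name)
    EffectiveStartDate DATETIME2 NOT NULL,    -- effective start date
    EffectiveEndDate DATETIME2 NULL           -- effective end date; NULL marks the current row
);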

Question 32

You have an Azure Data Factory pipeline named pipeline1 that is invoked by a tumbling window trigger named Trigger1. Trigger1 has a recurrence of 60 minutes.

You need to ensure that pipeline1 will execute only if the previous execution completes successfully.

How should you configure the self-dependency for Trigger1?

Options:

A.

offset: "-00:01:00" size: "00:01:00"

B.

offset: "01:00:00" size: "-01:00:00"

C.

offset: "01:00:00" size: "01:00:00"

D.

offset: "-01:00:00" size: "01:00:00"

Question 33

You have an Azure Data Factory pipeline shown in the following exhibit.

Question # 33

The execution log for the first pipeline run is shown in the following exhibit.

Question # 33

The execution log for the second pipeline run is shown in the following exhibit.

Question # 33

For each of the following statements, select Yes if the statement is true. Otherwise, select No.

NOTE: Each correct selection is worth one point.

Question # 33

Options:

Question 34

You plan to ingest streaming social media data by using Azure Stream Analytics. The data will be stored in files in Azure Data Lake Storage, and then consumed by using Azure Databricks and PolyBase in Azure Synapse Analytics.

You need to recommend a Stream Analytics data output format to ensure that the queries from Databricks and PolyBase against the files encounter the fewest possible errors. The solution must ensure that the files can be queried quickly and that the data type information is retained.

What should you recommend?

Options:

A.

Parquet

B.

Avro

C.

CSV

D.

JSON

Question 35

A company has a real-time data analysis solution that is hosted on Microsoft Azure. The solution uses Azure Event Hub to ingest data and an Azure Stream Analytics cloud job to analyze the data. The cloud job is configured to use 120 Streaming Units (SU).

You need to optimize performance for the Azure Stream Analytics job.

Which two actions should you perform? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

Options:

A.

Implement event ordering.

B.

Implement Azure Stream Analytics user-defined functions (UDF).

C.

Implement query parallelization by partitioning the data output.

D.

Scale the SU count for the job up.

E.

Scale the SU count for the job down.

F.

Implement query parallelization by partitioning the data input.

Question 36

What should you recommend to prevent users outside the Litware on-premises network from accessing the analytical data store?

Options:

A.

a server-level virtual network rule

B.

a database-level virtual network rule

C.

a database-level firewall IP rule

D.

a server-level firewall IP rule

Question 37

What should you do to improve high availability of the real-time data processing solution?

Options:

A.

Deploy identical Azure Stream Analytics jobs to paired regions in Azure.

B.

Deploy a High Concurrency Databricks cluster.

C.

Deploy an Azure Stream Analytics job and use an Azure Automation runbook to check the status of the job and to start the job if it stops.

D.

Set Data Lake Storage to use geo-redundant storage (GRS).

Question 38

Which Azure Data Factory components should you recommend using together to import the daily inventory data from the SQL server to Azure Data Lake Storage? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Question # 38

Options:

Question 39

What should you recommend using to secure sensitive customer contact information?

Options:

A.

data labels

B.

column-level security

C.

row-level security

D.

Transparent Data Encryption (TDE)

Question 40

You need to ensure that the Twitter feed data can be analyzed in the dedicated SQL pool. The solution must meet the customer sentiment analytics requirements.

Which three Transact-SQL DDL commands should you run in sequence? To answer, move the appropriate commands from the list of commands to the answer area and arrange them in the correct order.

NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct orders you select.

Question # 40

Options:
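
The command-list exhibit is not reproduced here, but for context, a hedged T-SQL sketch of the usual three-step DDL sequence for exposing external files to a dedicated SQL pool through PolyBase (every name, the location, and the columns are placeholders): an external data source, then an external file format, then the external table that references both.

CREATE EXTERNAL DATA SOURCE TwitterStorage
WITH (TYPE = HADOOP, LOCATION = 'abfss://data@account1.dfs.core.windows.net');

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT);

CREATE EXTERNAL TABLE dbo.TwitterFeed
(
    TweetID BIGINT,
    Sentiment NVARCHAR(20)
)
WITH (LOCATION = '/twitter/', DATA_SOURCE = TwitterStorage, FILE_FORMAT = CsvFormat);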

Question 41

You need to design the partitions for the product sales transactions. The solution must meet the sales transaction dataset requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Question # 41

Options:

Question 42

You need to implement the surrogate key for the retail store table. The solution must meet the sales transaction dataset requirements.

What should you create?

Options:

A.

a table that has an IDENTITY property

B.

a system-versioned temporal table

C.

a user-defined SEQUENCE object

D.

a table that has a FOREIGN KEY constraint
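
A minimal sketch (all names and types are assumptions) of a surrogate key implemented with the IDENTITY property in a dedicated SQL pool; values are guaranteed unique across distributions, though by design they are not guaranteed to be sequential:

CREATE TABLE dbo.DimRetailStore
(
    StoreKey INT IDENTITY(1, 1) NOT NULL,  -- surrogate key generated by the pool
    StoreSourceID INT NOT NULL,            -- assumed business key from the source system
    StoreName NVARCHAR(100) NOT NULL       -- assumed attribute
);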

Question 43

You need to design a data storage structure for the product sales transactions. The solution must meet the sales transaction dataset requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Question # 43

Options:

Question 44

You need to integrate the on-premises data sources and Azure Synapse Analytics. The solution must meet the data integration requirements.

Which type of integration runtime should you use?

Options:

A.

Azure-SSIS integration runtime

B.

self-hosted integration runtime

C.

Azure integration runtime

Question 45

You need to design an analytical storage solution for the transactional data. The solution must meet the sales transaction dataset requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Question # 45

Options:

Question 46

You need to implement versioned changes to the integration pipelines. The solution must meet the data integration requirements.

In which order should you perform the actions? To answer, move all actions from the list of actions to the answer area and arrange them in the correct order.

Question # 46

Options:

Question 47

You need to implement an Azure Synapse Analytics database object for storing the sales transactions data. The solution must meet the sales transaction dataset requirements.

What should you do? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Question # 47

Options:

Question 48

You need to design a data retention solution for the Twitter feed data records. The solution must meet the customer sentiment analytics requirements.

Which Azure Storage functionality should you include in the solution?

Options:

A.

change feed

B.

soft delete

C.

time-based retention

D.

lifecycle management

Question 49

You need to design a data ingestion and storage solution for the Twitter feeds. The solution must meet the customer sentiment analytics requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Question # 49

Options:
