
Databricks Certified Machine Learning Professional Questions and Answers

Question 1

A machine learning engineer is converting a Hyperopt-based hyperparameter tuning process from manual MLflow logging to MLflow Autologging. They are trying to determine how to manage nested Hyperopt runs with MLflow Autologging.

Which of the following approaches will create a single parent run for the process and a child run for each unique combination of hyperparameter values when using Hyperopt and MLflow Autologging?

Options:

Starting a manual parent run before calling fmin

Ensuring that a built-in model flavor is used for the model logging

Starting a manual child run within the objective function

There is no way to accomplish nested runs with MLflow Autologging and Hyperopt

MLflow Autologging will automatically accomplish this task with Hyperopt

Question 2

A machine learning engineer has developed a model and registered it using the FeatureStoreClient fs. The model has model URI model_uri. The engineer now needs to perform batch inference on customer-level Spark DataFrame spark_df, but it is missing a few of the static features that were used when training the model. The customer_id column is the primary key of spark_df and the training set used when training and logging the model.

Which of the following code blocks can be used to compute predictions for spark_df when the missing feature values can be found in the Feature Store by searching for features by customer_id?

Options:

df = fs.get_missing_features(spark_df, model_uri)
fs.score_model(model_uri, df)

fs.score_model(model_uri, spark_df)

df = fs.get_missing_features(spark_df, model_uri)
fs.score_batch(model_uri, df)

df = fs.get_missing_features(spark_df)
fs.score_batch(model_uri, df)

fs.score_batch(model_uri, spark_df)

Answer: fs.score_batch(model_uri, spark_df)

Explanation:

Because the model was logged with the FeatureStoreClient, it is packaged with metadata that records which features it was trained on, which feature tables they come from, and which lookup key (here customer_id) joins them to the input. fs.score_batch reads that metadata: any required feature that is not already a column of spark_df is retrieved from the Feature Store by customer_id and joined to the DataFrame before the model is applied. No separate lookup step is needed:

Python

# Features missing from spark_df are looked up in the Feature Store by customer_id
# and joined automatically before the model is applied
predictions = fs.score_batch(model_uri, spark_df)

fs.score_batch returns a Spark DataFrame that contains all columns of spark_df, the feature values retrieved from the Feature Store, and a prediction column with the model output. If a feature is already present in spark_df, the supplied values are used instead of the stored ones.

The other options are incorrect because:

Options that call fs.get_missing_features: FeatureStoreClient has no get_missing_features method, and a separate lookup step is unnecessary because fs.score_batch performs the join itself.
fs.score_model(model_uri, spark_df): FeatureStoreClient has no score_model method; batch scoring is done with fs.score_batch. References: Score batch

Question 3

A data scientist is utilizing MLflow to track their machine learning experiments. After completing a series of runs for the experiment with experiment ID exp_id, the data scientist wants to programmatically work with the experiment run data in a Spark DataFrame. They have an active MLflow Client client and an active Spark session spark.

Which of the following lines of code can be used to obtain run-level results for exp_id in a Spark DataFrame?

Options:

client.list_run_infos(exp_id)

spark.read.format("delta").load(exp_id)

There is no way to programmatically return row-level results from an MLflow Experiment.

mlflow.search_runs(exp_id)

spark.read.format("mlflow-experiment").load(exp_id)

Question 4

A data scientist has computed updated feature values for all primary key values stored in the Feature Store table features. In addition, feature values for some new primary key values have also been computed. The updated feature values are stored in the DataFrame features_df. They want to replace all data in features with the newly computed data.

Which of the following code blocks can they use to perform this task using the Feature Store Client fs?


Options:

Option A

Option B

Option C

Option D

Option E

Question 5

A machine learning engineer is in the process of implementing a concept drift monitoring solution. They are planning to use the following steps:

1. Deploy a model to production and compute predicted values

2. Obtain the observed (actual) label values

3. _____

4. Run a statistical test to determine if there are changes over time

Which of the following should be completed as Step #3?

Options:

Obtain the observed (actual) feature values

Measure the latency of the prediction time

Retrain the model

None of these should be completed as Step #3

Compute the evaluation metric using the observed and predicted values
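The four steps in this question can be sketched in plain Python for a classification model. This is a minimal illustration under stated assumptions, not a Databricks API: accuracy stands in for the evaluation metric, a two-proportion z-test stands in for the statistical test, and the helper names, the data, and the 0.05 significance level are all hypothetical.

```python
import math


def window_accuracy(observed, predicted):
    """Step 3: evaluation metric for one time window, returned as (correct, total)."""
    correct = sum(o == p for o, p in zip(observed, predicted))
    return correct, len(observed)


def accuracy_change_test(reference, recent):
    """Step 4: two-sided two-proportion z-test comparing accuracy across windows.

    reference and recent are (correct, total) pairs from window_accuracy.
    Assumes the pooled accuracy is strictly between 0 and 1.
    """
    (k1, n1), (k2, n2) = reference, recent
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (k1 / n1 - k2 / n2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under a normal
    return z, p_value


# Steps 1-2 (hypothetical data): observed labels and predictions for two windows
reference = window_accuracy([1] * 100, [1] * 90 + [0] * 10)  # 90% accurate
recent = window_accuracy([1] * 100, [1] * 70 + [0] * 30)     # 70% accurate

z, p_value = accuracy_change_test(reference, recent)
drift_detected = p_value < 0.05  # hypothetical significance level
```

Any metric that can be computed from observed and predicted values (for example RMSE for a regression model) could replace accuracy, with a statistical test suited to that metric.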

Question 6

A machine learning engineer wants to programmatically create a new Databricks Job whose schedule depends on the result of some automated tests in a machine learning pipeline.

Which of the following Databricks tools can be used to programmatically create the Job?

Options:

MLflow APIs

AutoML APIs

MLflow Client

Jobs cannot be created programmatically

Databricks REST APIs

Question 7

Which of the following statements describes streaming with Spark as a model deployment strategy?

Options:

The inference of batch processed records as soon as a trigger is hit

The inference of all types of records in real-time

The inference of batch processed records as soon as a Spark job is run

The inference of incrementally processed records as soon as a trigger is hit

The inference of incrementally processed records as soon as a Spark job is run

Answer: The inference of incrementally processed records as soon as a trigger is hit

Explanation:

Streaming with Spark as a model deployment strategy means applying a machine learning model to data streams that Spark Structured Streaming processes incrementally and continuously. Spark Structured Streaming is a scalable, fault-tolerant stream processing engine that enables complex analytics on live data streams using the Dataset/DataFrame API [1]. It supports various sources and sinks for streaming data, such as Kafka, Kinesis, TCP sockets, and Delta tables [2], and various operations on streaming data, such as aggregations, windowing, joins, and stateful transformations [3]. To deploy a machine learning model on streaming data, you can use the MLflow Model Registry to manage the model lifecycle and versioning [4]. You can also use MLflow Model Serving to serve the model as a REST API endpoint that is invoked from a Structured Streaming query [5]. Alternatively, you can wrap the model as a UDF (user-defined function) and apply it to the streaming data within Spark Structured Streaming [6].
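The UDF-based approach described above can be sketched in PySpark. This is a minimal sketch under assumptions, not a reference implementation: it assumes a Databricks-style environment with Delta tables, and the function name and the table, column, and checkpoint arguments are hypothetical. The model is wrapped with mlflow.pyfunc.spark_udf, new rows are read incrementally with spark.readStream, and a processing-time trigger controls when each micro-batch is scored.

```python
def start_streaming_inference(spark, model_uri, source_table, target_table,
                              feature_cols, checkpoint_path):
    """Score new rows of source_table with an MLflow model on each trigger.

    Returns the StreamingQuery. All arguments are hypothetical placeholders.
    """
    # Imported inside the function so it can be defined without Spark or MLflow
    import mlflow.pyfunc
    from pyspark.sql import functions as F

    # Wrap the logged or registered model as a Spark UDF
    predict = mlflow.pyfunc.spark_udf(spark, model_uri)

    return (
        spark.readStream.table(source_table)  # new rows are read incrementally
        .withColumn("prediction", predict(F.struct(*feature_cols)))
        .writeStream.format("delta")
        .option("checkpointLocation", checkpoint_path)  # remembers processed data
        .trigger(processingTime="1 minute")  # check for new data every minute
        .toTable(target_table)
    )
```

The checkpoint location is what keeps the query incremental across restarts: the query resumes from the offsets recorded there, so records that were already processed are not scored again.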

The inference of incrementally processed records as soon as a trigger is hit describes streaming with Spark as a model deployment strategy. A trigger defines when a streaming query processes newly arrived data and writes the results to the output sink. A trigger can be based on a processing-time interval, a one-time run that processes all available data and then stops, or continuous mode, which processes records with low latency as they arrive. Because each trigger processes only the records that arrived since the previous one, the query runs incrementally and the model is applied to the latest available data. The other options are incorrect because:

Option A: The inference of batch processed records as soon as a trigger is hit does not describe streaming with Spark, but rather batch processing with Spark. Batch processing means applying a machine learning model to a finite set of data that is processed as a single job. Batch processing does not require a trigger, as the results are written to the output sink when the job is completed.
Option B: The inference of all types of records in real-time does not describe streaming with Spark, but rather a generic definition of real-time processing. Real-time processing means applying a machine learning model to data streams that are processed as soon as they arrive, with minimal latency. Real-time processing does not necessarily use Spark Structured Streaming, as other frameworks such as Apache Flink and Apache Storm also support it.
Option C: The inference of batch processed records as soon as a Spark job is run does not describe streaming with Spark, but rather batch processing with Spark. In batch inference, a job is launched (manually or on a schedule) to score a fixed set of records; records are not picked up incrementally as they arrive.
Option E: The inference of incrementally processed records as soon as a Spark job is run does not describe streaming with Spark accurately. Incrementally processed records do imply streaming, but in Structured Streaming it is the trigger, not the launch of a Spark job, that determines when newly arrived records are processed and scored. References: Structured Streaming Programming Guide, Input Sources and Output Sinks, Operations on streaming DataFrames/Datasets, MLflow Model Registry, MLflow Model Serving, Apply machine learning models, Triggers, Trigger Types, Batch Processing, Real-time Processing, Real-time Data Processing Frameworks, Deploy machine learning models, Batch vs Streaming Processing

Question 8

A data scientist has developed and logged a scikit-learn random forest model model, and then they ended their Spark session and terminated their cluster. After starting a new cluster, they want to review the feature_importances_ of the original model object.

Which of the following lines of code can be used to restore the model object so that feature_importances_ is available?

Options:

mlflow.load_model(model_uri)

client.list_artifacts(run_id)["feature-importances.csv"]

mlflow.sklearn.load_model(model_uri)

This can only be viewed in the MLflow Experiments UI

client.pyfunc.load_model(model_uri)

Question 9

A machine learning engineering team wants to build a continuous pipeline for data preparation of a machine learning application. The team would like the data to be fully processed and made ready for inference in a series of equal-sized batches.

Which of the following tools can be used to provide this type of continuous processing?

Options:

Spark UDFs

Structured Streaming

MLflow

Delta Lake

AutoML

Question 10

A machine learning engineer has registered a sklearn model in the MLflow Model Registry using the sklearn model flavor with URI model_uri.

Which of the following operations can be used to load the model as an sklearn object for batch deployment?

Options:

mlflow.spark.load_model(model_uri)

mlflow.pyfunc.read_model(model_uri)

mlflow.sklearn.read_model(model_uri)

mlflow.pyfunc.load_model(model_uri)

mlflow.sklearn.load_model(model_uri)

Question 11

Which of the following is a reason for using Jensen-Shannon (JS) distance over a Kolmogorov-Smirnov (KS) test for numeric feature drift detection?

Options:

All of these reasons

JS is not normalized or smoothed

None of these reasons

JS is more robust when working with large datasets

JS does not require any manual threshold or cutoff determinations
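To make the comparison concrete, the two measures can be sketched in plain Python. This is an illustration, not a library API: the helper names, the integer-valued samples, and the bin edges are hypothetical, and the sketch computes only the KS statistic, not the full test with its p-value.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def binned_proportions(values, edges):
    """Share of values in each bin; values outside [edges[0], edges[-1]] are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            # The last bin is closed on the right so edges[-1] is counted
            counts[min(bisect_right(edges, v) - 1, len(counts) - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two discrete distributions; in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0 here
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return math.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))


# Hypothetical baseline and shifted samples of one numeric feature
baseline = list(range(100))                # values 0 .. 99
shifted = [x + 10 for x in range(100)]     # same shape, shifted by 10
edges = list(range(0, 120, 10))            # hypothetical bin edges 0, 10, ..., 110

ks = ks_statistic(baseline, shifted)
js = js_distance(binned_proportions(baseline, edges),
                 binned_proportions(shifted, edges))
```

For a fixed KS statistic, the KS test's p-value shrinks as the sample sizes grow, so on very large datasets even negligible shifts can be flagged as significant. The JS distance, by contrast, depends on the chosen bins and still needs a threshold to decide what counts as drift.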

Question 12

Which of the following lists all of the model stages that are available in the MLflow Model Registry?

Options:

Development, Staging, Production

None, Staging, Production

Staging, Production, Archived

None, Staging, Production, Archived

Development, Staging, Production, Archived

Question 13

Which of the following describes the concept of MLflow Model flavors?

Options:

A convention that deployment tools can use to wrap preprocessing logic into a Model

A convention that MLflow Model Registry can use to version models

A convention that MLflow Experiments can use to organize their Runs by project

A convention that deployment tools can use to understand the model

A convention that MLflow Model Registry can use to organize its Models by project

Question 14

A machine learning engineer has created a webhook with the following code block:


Which of the following code blocks will trigger this webhook to run the associated job?


Options:

Option A

Option B

Option C

Option D

Option E

Question 15

Which of the following is a benefit of logging a model signature with an MLflow model?

Options:

The model will have a unique identifier in the MLflow experiment

The schema of input data can be validated when serving models

The model can be deployed using real-time serving tools

The model will be secured by the user that developed it

The schema of input data will be converted to match the signature

Question 16

Which of the following describes the purpose of the context parameter in the predict method of Python models for MLflow?

Options:

The context parameter allows the user to specify which version of the registered MLflow model should be used based on the given application's current scenario

The context parameter allows the user to document the performance of a model after it has been deployed

The context parameter allows the user to include relevant details of the business case to allow downstream users to understand the purpose of the model

The context parameter allows the user to provide the model with completely custom if-else logic for the given application's current scenario

The context parameter allows the user to provide the model access to objects like preprocessing models or custom configuration files

Question 17

Which of the following is a simple, low-cost method of monitoring numeric feature drift?

Options:

Jensen-Shannon test

Summary statistics trends

Chi-squared test

None of these can be used to monitor feature drift

Kolmogorov-Smirnov (KS) test

Question 18

A machine learning engineer wants to view all of the active MLflow Model Registry Webhooks for a specific model.

They are using the following code block:


Which of the following changes does the machine learning engineer need to make to this code block so it will successfully accomplish the task?

Options:

There are no necessary changes

Replace list with view in the endpoint URL

Replace POST with GET in the call to http request

Replace list with webhooks in the endpoint URL

Replace POST with PUT in the call to http request

Copyright © 2014-2025 Activedumpsnet. All Rights Reserved