Databricks Certified Professional Data Scientist Exam Questions and Answers

Question 1

What are the advantages of the mutual information over the Pearson correlation for text classification problems?

Options:

The mutual information has a meaningful test for statistical significance.

The mutual information can signal non-linear relationships between the dependent and independent variables.

The mutual information is easier to parallelize.

The mutual information doesn't assume that the variables are normally distributed.
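
The point about non-linear relationships can be checked on a toy example. The sketch below is illustrative only: the data (`xs`, `ys`) and both helper functions are assumptions, not part of the exam material, and the example uses small discrete values rather than real text features. It shows a case where y is fully determined by x, yet the Pearson correlation is zero while the mutual information is clearly positive.

```python
from collections import Counter
from math import log2, sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mutual_information(xs, ys):
    """Mutual information (in bits) of two discrete sequences,
    estimated from empirical frequencies."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Hypothetical data: y depends on x quadratically, not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

r = pearson(xs, ys)              # no linear relationship
mi = mutual_information(xs, ys)  # strong dependence
print(r, mi)
```

Because the relationship is symmetric around zero, the covariance cancels out, while the mutual information equals the entropy of y.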

Question 2

A problem statement is given below.

Hospital records show that of patients suffering from a certain disease, 75% die of it. What is the probability that of 6 randomly selected patients, 4 will recover?

Which of the following models will you use to solve it?

Options:

Binomial

Poisson

Normal

Any of the above
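
The scenario has a fixed number of independent trials, each with two outcomes (recover or die), which is the setting the binomial distribution describes. As a minimal sketch, assuming each patient recovers independently with probability 1 - 0.75 = 0.25, the probability of exactly 4 recoveries out of 6 can be computed as follows. The function name `binomial_pmf` is an illustrative choice.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_recover = 1 - 0.75          # 75% die, so 25% recover
prob = binomial_pmf(4, 6, p_recover)
print(round(prob, 4))         # 0.033
```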

Question 3

A fruit may be considered to be an apple if it is red, round, and about 3" in diameter. A naive Bayes classifier considers each of these features to contribute independently to the probability that this fruit is an apple, regardless of the

Options:

Presence of the other features.

Absence of the other features.

Presence or absence of the other features

None of the above

Question 4

Select the statement that correctly applies to Naive Bayes.

Options:

Works with a small amount of data

Sensitive to how the input data is prepared

Works with nominal values

Question 5

Marie is getting married tomorrow, at an outdoor ceremony in the desert. In recent years, it has rained only 5 days each year. Unfortunately, the weatherman has predicted rain for tomorrow. When it actually rains, the weatherman correctly forecasts rain 90% of the time. When it doesn't rain, he incorrectly forecasts rain 10% of the time. Which of the following will you use to calculate the probability that it will rain on the day of Marie’s wedding?

Options:

Naive Bayes

Logistic Regression

Random Decision Forests

All of the above
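
The quantity asked for is a posterior probability, P(rain | forecast of rain), which follows from Bayes' rule using the numbers in the question. The sketch below assumes a 365-day year for the prior (5 rainy days out of 365); the function name `posterior` is an illustrative choice.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' rule for a binary hypothesis H and one piece of evidence E:
    P(H | E) = P(E | H) P(H) / (P(E | H) P(H) + P(E | not H) P(not H))."""
    numerator = p_evidence_given_h * prior
    return numerator / (numerator + p_evidence_given_not_h * (1 - prior))


p_rain = 5 / 365                     # prior, assuming a 365-day year
p_rain_given_forecast = posterior(p_rain, 0.9, 0.1)
print(round(p_rain_given_forecast, 3))   # 0.111
```

Even with a forecast of rain, the low prior keeps the posterior at about 11%.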

Question 6

You are working on a Data Science project, and during the project you have been given the responsibility of interviewing all the stakeholders in the project. Which phase of the project are you in?

Options:

Discovery

Data Preparations

Creating Models

Executing Models

Creating visuals from the outcome

Operationalize the models

Question 7

Which analytical method is considered unsupervised?

Options:

Naive Bayesian classifier

Decision tree

Linear regression

K-means clustering

Question 8

What is one modeling or descriptive statistical function in MADlib that is typically not provided in a standard relational database?

Options:

Expected value

Variance

Linear regression

Quantiles

Question 9

Which of the following is a correct example of the target variable in regression (supervised learning)?

Options:

Nominal values like true, false

Reptile, fish, mammal, amphibian, plant, fungi

Infinite number of numeric values, such as 0.100, 42.001, 1000.743...

All of the above

Question 10

Select the correct option from the below

Options:

If you're trying to predict or forecast a target value, then you need to look into supervised learning.

If you've chosen supervised learning, with a discrete target value like Yes/No, 1/2/3, A/B/C, or Red/Yellow/Black, then look into classification.

If the target value can take on a number of values, say any value from 0.00 to 100.00, or -999 to 999, or +∞ to -∞, then you need to look into unsupervised learning.

If you're not trying to predict a target value, then you need to look into unsupervised learning

Are you trying to fit your data into some discrete groups? If so and that's all you need, you should look into clustering.

Question 11

A researcher is interested in how variables such as GRE (Graduate Record Exam scores), GPA (grade point average), and prestige of the undergraduate institution affect admission into graduate school. The response variable, admit/don't admit, is a binary variable.

Above is an example of

Options:

Linear Regression

Logistic Regression

Recommendation system

Maximum likelihood estimation

Hierarchical linear models

Question 12

Of all the smokers in a particular district, 40% prefer brand A and 60% prefer brand B. Of those smokers who prefer brand A, 30% are female, and of those who prefer brand B, 40% are female. What is the probability that a randomly selected smoker prefers brand A, given that the person selected is female?

Which of the following is the best way to solve this problem?

Options:

Bayes' Theorem

Poisson Distribution

Binomial Distribution

None of the above
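
The question asks for a conditional probability with the conditioning reversed, P(brand A | female), from P(female | brand A) and P(female | brand B). A minimal sketch of that calculation using only the numbers in the question is given below; the variable names are illustrative.

```python
# Priors: brand preference among all smokers.
p_a, p_b = 0.40, 0.60

# Likelihoods: share of females within each brand's smokers.
p_female_given_a = 0.30
p_female_given_b = 0.40

# Law of total probability: overall share of female smokers.
p_female = p_female_given_a * p_a + p_female_given_b * p_b

# Bayes' rule: P(A | female) = P(female | A) P(A) / P(female).
p_a_given_female = p_female_given_a * p_a / p_female
print(round(p_a_given_female, 4))   # 0.3333
```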

Question 13

In which lifecycle stage are test and training data sets created?

Options:

Model planning

Discovery

Model building

Data preparation

Answer:

Explanation:

Discovery: In Phase 1, the team learns the business domain, including relevant history such as whether the organization or business unit has attempted similar projects in the past from which they can learn. The team assesses the resources available to support the project in terms of people, technology, time, and data. Important activities in this phase include framing the business problem as an analytics challenge that can be addressed in subsequent phases and formulating initial hypotheses (IHs) to test and begin learning the data.

Data preparation: Phase 2 requires the presence of an analytic sandbox, in which the team can work with data and perform analytics for the duration of the project. The team needs to execute extract, load, and transform (ELT) or extract, transform, and load (ETL) to get data into the sandbox. ELT and ETL are sometimes abbreviated together as ETLT. Data should be transformed in the ETLT process so the team can work with it and analyze it. In this phase, the team also needs to familiarize itself with the data thoroughly and take steps to condition the data.

Model planning: Phase 3 is model planning, where the team determines the methods, techniques, and workflow it intends to follow for the subsequent model building phase. The team explores the data to learn about the relationships between variables and subsequently selects key variables and the most suitable models.

Model building: In Phase 4, the team develops datasets for testing, training, and production purposes. In addition, in this phase the team builds and executes models based on the work done in the model planning phase. The team also considers whether its existing tools will suffice for running the models, or if it will need a more robust environment for executing models and workflows (for example, fast hardware and parallel processing, if applicable).

Communicate results: In Phase 5, the team, in collaboration with major stakeholders, determines if the results of the project are a success or a failure based on the criteria developed in Phase 1. The team should identify key findings, quantify the business value, and develop a narrative to summarize and convey findings to stakeholders.

Operationalize: In Phase 6, the team delivers final reports, briefings, code, and technical documents. In addition, the team may run a pilot project to implement the models in a production environment.

Question 14

You are creating a regression model with the income, education, and current debt of a customer as inputs. What could be a possible output of this model?

Options:

Customer fit as a good

Customer fit as acceptable or average category

expressed as a percent, that the customer will default on a loan

1 and 3 are correct

2 and 3 are correct

Question 15

Which of the following are point estimation methods?

Options:

MAP

MLE

MMSE

Question 16

What is the probability that the total of two dice will be greater than 8, given that the first die is a 6?

Options:

1/3

2/3

1/6

2/6
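
A conditional probability like this can be verified by enumerating the sample space. The sketch below assumes two fair six-sided dice, restricts the 36 outcomes to those where the first die shows 6, and counts how many of them have a total greater than 8.

```python
from fractions import Fraction
from itertools import product

# All outcomes (first, second) where the first die is a 6.
given = [(a, b) for a, b in product(range(1, 7), repeat=2) if a == 6]

# Of those, the outcomes whose total exceeds 8 (second die is 3, 4, 5, or 6).
favorable = [pair for pair in given if sum(pair) > 8]

prob = Fraction(len(favorable), len(given))
print(prob)   # 2/3
```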

Question 17

Assume some output variable "y" is a linear combination of some independent input variables "A" plus some independent noise "e". The way the independent variables are combined is defined by a parameter vector B: y = AB + e, where A is an m x n matrix, B is a vector of n unknowns, and y is a vector of m values. Assuming that m is not equal to n and the columns of A are linearly independent, which expression correctly solves for B?

Options:

Option A

Option B

Option C

Option D
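
The option expressions are not reproduced in this text, so the sketch below does not identify which option is correct. It only illustrates the standard least-squares result for this setup: when the design matrix (called A here) has linearly independent columns, A^T A is invertible and the normal equations (A^T A) B = A^T y have the unique solution B = (A^T A)^-1 A^T y. The data and helper functions are illustrative assumptions, written with the standard library only.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(P, Q):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Q)]
            for row in P]


def solve(M, v):
    """Solve the square system M x = v by Gauss-Jordan elimination
    with partial pivoting (fine for small, well-conditioned systems)."""
    n = len(M)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def least_squares(A, y):
    """Solve the normal equations (A^T A) B = A^T y for B."""
    At = transpose(A)
    AtA = matmul(At, A)
    Aty = [sum(a * b for a, b in zip(row, y)) for row in At]
    return solve(AtA, Aty)


# Hypothetical data: m = 4 observations, n = 2 unknowns (intercept, slope),
# generated from y = 1 + 2x with no noise.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
B = least_squares(A, y)
print(B)
```

With noise-free data the recovered parameters match the generating intercept and slope up to floating-point error.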

Question 18

You are asked to create a model to predict the total number of monthly subscribers for a specific magazine. You are provided with 1 year's worth of subscription and payment data, user demographic data, and 10 years worth of content of the magazine (articles and pictures). Which algorithm is the most appropriate for building a predictive model for subscribers?

Options:

Linear regression

Logistic regression

Decision trees

TF-IDF

Question 19

In which of the following scenarios can you use regression to predict the values?

Options:

Samsung can use it for mobile sales forecast

Mobile companies can use it to forecast manufacturing defects

Probability of the celebrity divorce

Only 1 and 2

All of 1, 2, and 3

Question 20

Select the correct statement which applies to Supervised learning

Options:

We ask the machine to learn from our data when we specify a target variable.

It reduces the machine's task to only divining some pattern from the input data to get the target variable.

Instead of telling the machine Predict Y for our data X, we're asking What can you tell me about X?

Copyright © 2014-2025 Activedumpsnet. All Rights Reserved