Saturday 16 October 2021

Supervised Learning - Classification / Quiz - Decision Tree

Q No: 1

What is the final objective of a Decision Tree?

  1. Maximise the Gini Index of the leaf nodes
  2. Minimise the homogeneity of the leaf nodes
  3. Maximise the heterogeneity of the leaf nodes
  4. Minimise the impurity of the leaf nodes

Ans: Minimise the impurity of the leaf nodes

In a decision tree, after every split we expect lower 'impurity' in the resulting nodes, so that we eventually end up with leaf nodes that have the least impurity/entropy.
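As a small illustration (not part of the original quiz), the Python sketch below shows how Gini impurity and entropy can be computed from a node's class counts, and how a reasonable split lowers the impurity of the child nodes; the class counts are made up for the example.

    import numpy as np

    def gini(counts):
        """Gini impurity of a node, given its class counts."""
        p = np.array(counts, dtype=float)
        p = p / p.sum()
        return 1.0 - np.sum(p ** 2)

    def entropy(counts):
        """Entropy of a node, given its class counts."""
        p = np.array(counts, dtype=float)
        p = p / p.sum()
        p = p[p > 0]                          # drop empty classes to avoid log(0)
        return -np.sum(p * np.log2(p))

    print(gini([50, 50]))                     # 0.5 -> maximally impure (50/50 split)
    print(gini([45, 5]), gini([5, 45]))       # ~0.18 each -> lower impurity after a good split
    print(entropy([50, 0]))                   # 0.0 -> perfectly pure leaf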


Q No: 2

Decision Trees can be used to predict

  1. Continuous Target Variables
  2. Categorical Target Variables
  3. Random Variables
  4. Both Continuous and Categorical Target Variables

Ans: Both Continuous and Categorical Target Variables
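For context, scikit-learn provides a separate estimator for each case; the snippet below is a minimal sketch with made-up toy data, fitting one tree to a categorical target and another to a continuous target.

    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    X = [[0], [1], [2], [3]]

    # Categorical target -> classification tree
    clf = DecisionTreeClassifier().fit(X, ["no", "no", "yes", "yes"])
    print(clf.predict([[1.5]]))   # predicts a class label

    # Continuous target -> regression tree
    reg = DecisionTreeRegressor().fit(X, [1.0, 1.9, 3.1, 4.2])
    print(reg.predict([[1.5]]))   # predicts a numeric value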


Q No: 3

When we create a Decision Tree, how is the best split determined at each node?

  1. We split the data using the first independent variable and so on.
  2. The first split is determined randomly and from then on we start choosing the best split.
  3. We make at most 5 splits on the data using only one independent variable and choose the split that gives the highest Gini gain.
  4. We make all possible splits on the data using the independent variables and choose the split that gives the highest Gini gain.

Ans: We make all possible splits on the data using the independent variables and choose the split that gives the highest Gini gain.
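As a small worked illustration (the class counts are made up), the Gini gain of a candidate split is the parent's impurity minus the weighted impurity of its children, and the split with the highest gain is the one chosen:

    import numpy as np

    def gini(counts):
        p = np.array(counts, dtype=float) / sum(counts)
        return 1.0 - np.sum(p ** 2)

    parent = gini([50, 50])                       # 0.5

    # Candidate split A: children with class counts (40, 10) and (10, 40)
    child_a = (50 * gini([40, 10]) + 50 * gini([10, 40])) / 100
    # Candidate split B: children with class counts (30, 20) and (20, 30)
    child_b = (50 * gini([30, 20]) + 50 * gini([20, 30])) / 100

    print(parent - child_a, parent - child_b)     # 0.18 vs 0.02 -> split A has the higher Gini gain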


Q No: 4

Which of the following is not true about Decision Trees?

  1. Decision Trees tend to overfit the test data
  2. Decision Trees can be pruned to reduce overfitting
  3. Decision Trees would grow to maximum possible depth to achieve 100% purity in the leaf nodes, this generally leads to overfitting.
  4. Decision Trees can capture complex patterns in the data.

Ans: Decision Trees tend to overfit the test data

This is the statement that is not true: Decision Trees tend to overfit the training data, not the test data. The other three statements are true.
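As a hedged illustration of pruning (point 2), scikit-learn supports cost-complexity post-pruning through the ccp_alpha parameter; the data below is synthetic and the alpha value is arbitrary, so the exact node counts will vary.

    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, random_state=0)

    full = DecisionTreeClassifier(random_state=0).fit(X, y)
    pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)

    # Pruning trades a little training accuracy for a much simpler tree.
    print(full.tree_.node_count, pruned.tree_.node_count)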


Q No: 5

If we increase the value of the hyperparameter min_samples_leaf from the default value, we would end up getting a ______________ tree than the tree with the default value.

  1. smaller
  2. bigger

Ans: smaller

min_samples_leaf = the minimum number of samples required at a leaf node

As the minimum number of observations required in a leaf node increases, the size of the tree decreases.
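A quick sketch of this effect on synthetic data (the exact node counts will vary from run to run):

    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=42)

    default_tree = DecisionTreeClassifier(random_state=42).fit(X, y)    # min_samples_leaf=1 (default)
    bigger_leaf = DecisionTreeClassifier(min_samples_leaf=20,
                                         random_state=42).fit(X, y)

    # Requiring at least 20 samples per leaf yields a noticeably smaller tree.
    print(default_tree.tree_.node_count, bigger_leaf.tree_.node_count)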


Q No: 6

Which of the following is a perfectly impure node?

[Figure not shown in this text version: a decision tree with Node 0, Node 1 and Node 2, each shown with its class counts / Gini value]

  1. Node - 0
  2. Node - 1
  3. Node - 2
  4. None of these

Ans: Node - 1

Gini = 0.5 at Node 1

Gini = 0 -> perfectly pure

Gini = 0.5 -> perfectly impure (the maximum for a two-class node). For example, a node with a 50/50 class split has Gini = 1 - (0.5^2 + 0.5^2) = 0.5, while a node containing only one class has Gini = 1 - 1^2 = 0.


Q No: 7

In a classification setting, if we do not limit the size of the decision tree, it will stop growing only when the leaves are:

  1. All leaves are at the same depth
  2. of the same size
  3. homogeneous
  4. heterogeneous

Ans: homogeneous

The tree stops splitting only once the impurity of every leaf is zero.
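A minimal check of this behaviour on synthetic data: an unrestricted tree keeps splitting until every leaf is pure, so it scores 100% on the data it was trained on.

    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, random_state=1)
    tree = DecisionTreeClassifier(random_state=1).fit(X, y)   # no size limits

    print(tree.score(X, y))                       # 1.0 -> every training sample is fit

    leaves = tree.tree_.children_left == -1       # leaf nodes have no children
    print(tree.tree_.impurity[leaves].max())      # 0.0 -> every leaf is homogeneous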


Q No: 8

Which of the following explains pre-pruning?

  1. Before pruning a decision tree, we need to create the tree. This process of creating the tree before pruning is known as pre-pruning.
  2. Starting with a full-grown tree and creating trees that are sequentially smaller is known as pre-pruning
  3. We stop the decision tree from growing to its full length by bounding the hyperparameters; this is known as pre-pruning.
  4. Building a decision tree on default hyperparameter values is known as pre-pruning.

Ans: We stop the decision tree from growing to its full length by bounding the hyperparameters; this is known as pre-pruning.
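In scikit-learn terms, pre-pruning simply means bounding hyperparameters such as max_depth, min_samples_split, or min_samples_leaf when the tree is created; a minimal sketch (the values are arbitrary):

    from sklearn.tree import DecisionTreeClassifier

    # Pre-pruning: bound the tree's growth up front via hyperparameters.
    pre_pruned = DecisionTreeClassifier(max_depth=4,
                                        min_samples_split=20,
                                        min_samples_leaf=10)

    # (By contrast, post-pruning starts from a full-grown tree and cuts it
    # back afterwards, e.g. cost-complexity pruning via ccp_alpha.)
    print(pre_pruned.get_params()["max_depth"])   # 4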


Q No: 9

Which of the following is the same across Classification and Regression Decision Trees?

  1. Type of predicted variable
  2. Impurity Measure/ Splitting Criteria
  3. max_depth parameter

Ans: max_depth parameter
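Classification trees predict categorical targets and split on Gini impurity or entropy, while regression trees predict continuous targets and split on a variance/squared-error criterion; max_depth behaves the same for both. The short check below inspects scikit-learn's defaults (criterion names are as in recent scikit-learn versions):

    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    clf = DecisionTreeClassifier(max_depth=3)   # max_depth is accepted by both
    reg = DecisionTreeRegressor(max_depth=3)

    print(clf.get_params()["criterion"])   # 'gini'
    print(reg.get_params()["criterion"])   # 'squared_error' (called 'mse' in older versions)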


Q No: 10

Select the correct order in which a decision tree is built:

  1. Calculate the Gini impurity after each split
  2. Decide the best split based on the lowest Gini impurity
  3. Repeat the complete process until the stopping criterion is reached or the tree has achieved homogeneity in leaves.
  4. Select an attribute of data and make all possible splits in data
  5. Repeat the steps for every attribute present in the data

  • 4,1,3,2,5
  • 4,1,5,2,3
  • 4,1,3,2,5
  • 4,1,5,3,2

Ans: 4,1,5,2,3
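Putting the ordered steps together, here is a rough, self-contained sketch of the recursive build loop (illustrative only, not scikit-learn's actual implementation; the toy data is made up):

    import numpy as np

    def gini(y):
        """Gini impurity of a set of class labels."""
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def build_tree(X, y, depth=0, max_depth=5):
        # Step 3 (stopping criterion): leaf is homogeneous or the depth limit is hit.
        if gini(y) == 0.0 or depth == max_depth:
            return {"leaf": True, "prediction": int(np.bincount(y).argmax())}

        best = None
        for j in range(X.shape[1]):                      # Step 5: repeat for every attribute
            for t in np.unique(X[:, j]):                 # Step 4: all possible splits on that attribute
                mask = X[:, j] <= t
                if mask.all() or (~mask).all():
                    continue
                # Step 1: Gini impurity after the split (weighted over both children)
                child = (mask.sum() * gini(y[mask]) +
                         (~mask).sum() * gini(y[~mask])) / len(y)
                if best is None or child < best[0]:      # Step 2: keep the lowest-impurity split
                    best = (child, j, t, mask)

        if best is None:                                 # no valid split exists -> make a leaf
            return {"leaf": True, "prediction": int(np.bincount(y).argmax())}

        _, j, t, mask = best
        return {"leaf": False, "feature": j, "threshold": float(t),
                # Step 3 again: repeat the whole process on each child node
                "left": build_tree(X[mask], y[mask], depth + 1, max_depth),
                "right": build_tree(X[~mask], y[~mask], depth + 1, max_depth)}

    X = np.array([[1.0], [2.0], [8.0], [9.0]])
    y = np.array([0, 0, 1, 1])
    print(build_tree(X, y))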
