Select the correct option for the following:
Train decision tree models with default parameters, varying the depth from 1 to 8 (both values included), and compare the model performance at each value of depth.
At depth = 1, the decision tree gives the highest recall among all the models on the training set.
At depth = 2, the decision tree gives the highest recall among all the models on the training set.
At depth = 5, the decision tree gives the highest recall among all the models on the training set.
At depth = 8, the decision tree gives the highest recall among all the models on the training set.
Ans: At depth = 8, the decision tree gives the highest recall (1.0) among all the models on the training set.
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import recall_score

# Train a decision tree at each depth from 1 to 8 and record the recall on the training set
score_DT = []
for i in range(1, 9):
    dTree = DecisionTreeClassifier(max_depth=i, criterion='gini', random_state=1)
    dTree.fit(X_train, y_train)
    pred = dTree.predict(X_train)
    case = {'Depth': i, 'Recall': recall_score(y_train, pred)}
    score_DT.append(case)
print(score_DT)
[{'Depth': 1, 'Recall': 0.6875}, {'Depth': 2, 'Recall': 0.8888888888888888}, {'Depth': 3, 'Recall': 0.8888888888888888}, {'Depth': 4, 'Recall': 0.9583333333333334}, {'Depth': 5, 'Recall': 0.9652777777777778}, {'Depth': 6, 'Recall': 0.9930555555555556}, {'Depth': 7, 'Recall': 0.9861111111111112}, {'Depth': 8, 'Recall': 1.0}]
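To pick out the depth with the highest training recall programmatically rather than by eye, the score_DT list above can be scanned for its maximum (a small sketch reusing the list built above):
best = max(score_DT, key=lambda entry: entry['Recall'])
print(best)  # {'Depth': 8, 'Recall': 1.0}: the depth-8 tree separates the training set perfectly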
Question8:
Plot the feature importance of the variables given by the model which gives the maximum value of recall on the training set in Q7. Which are the 2 most important variables respectively?
- lumbar_lordosis_angle, sacrum_angle
- degree_spondylolisthesis, pelvic tilt
- scoliosis_slope, cervial_tilt
Ans: degree_spondylolisthesis, pelvic tilt
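One way to produce this plot, as a minimal sketch assuming the depth-8 tree from Q7 is still bound to dTree (the last model fitted in the loop above) and X_train is a pandas DataFrame:
import pandas as pd
import matplotlib.pyplot as plt

# Importances of each feature in the depth-8 tree (highest training recall in Q7)
importances = pd.Series(dTree.feature_importances_, index=X_train.columns)

# Horizontal bar chart, most important feature at the top
importances.sort_values().plot.barh(figsize=(8, 6))
plt.xlabel('Feature importance')
plt.tight_layout()
plt.show()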
Question9:
Perform hyperparameter tuning for the Decision Tree using GridSearchCV.
Use the following list of hyperparameters and their values:
Maximum depth: [5, 10, 15, None], criterion: ['gini', 'entropy'], splitter: ['best', 'random']
Set cv = 3 in grid search
Set scoring = 'recall' in grid search
Which of the following statements is/are True?
A) GridSearchCV selects the max_depth as 10
B) GridSearchCV selects the criterion as 'gini'
C) GridSearchCV selects the splitter as 'random'
D) GridSearchCV selects the splitter as 'best'
E) GridSearchCV selects the max_depth as 5
F) GridSearchCV selects the criterion as 'entropy'
- A, B, and C
- B, C, and E
- A, C, and F
- D, E, and F
Ans: A, C, and F
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

# Choose the type of classifier.
estimator = DecisionTreeClassifier(random_state=1)

# Grid of parameters to choose from
parameters = {'max_depth': [5, 10, 15, None],
              'criterion': ['gini', 'entropy'],
              'splitter': ['best', 'random']}

# Run the grid search, scoring each combination by recall with 3-fold cross-validation
grid_obj = GridSearchCV(estimator, parameters, scoring='recall', cv=3)
grid_obj = grid_obj.fit(X_train, y_train)

# Set the classifier to the best combination of parameters
estimator = grid_obj.best_estimator_

# Fit the best algorithm to the data.
estimator.fit(X_train, y_train)

DecisionTreeClassifier(criterion='entropy', max_depth=10, random_state=1, splitter='random')
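As a quick check, the winning combination can also be read straight from the fitted search object; it should agree with the estimator printed above:
print(grid_obj.best_params_)
# Per the estimator above: {'criterion': 'entropy', 'max_depth': 10, 'splitter': 'random'}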
Question10:
Compare the performance of a Decision Tree with default parameters and the tuned Decision Tree built in Q9 on the test set.
Which of the following statements is/are True?
- A) Recall Score of tuned model > Recall Score of decision tree with default parameters
- B) Recall Score of tuned model < Recall Score of decision tree with default parameters
- C) F1 Score of tuned model > F1 Score of decision tree with default parameters
- D) F1 Score of tuned model < F1 Score of decision tree with default parameters
- A and B
- B and C
- C and D
- A and D
Ans: A and D
from sklearn import metrics

# Training decision tree with default parameters
model = DecisionTreeClassifier(random_state=1)
model.fit(X_train, y_train)

# Tuned model (best estimator found by GridSearchCV in Q9)
estimator.fit(X_train, y_train)

# Predictions on the test set from both models
y_pred_test1 = model.predict(X_test)
y_pred_test2 = estimator.predict(X_test)

# Checking model performance of Decision Tree with default parameters
print(recall_score(y_test, y_pred_test1))
print(metrics.f1_score(y_test, y_pred_test1))

# Checking model performance of tuned Decision Tree
print(recall_score(y_test, y_pred_test2))
print(metrics.f1_score(y_test, y_pred_test2))