We migrated all our employees to Exchange Online and decommissioned the on-premises Exchange servers that were hosting the user databases. We retained an on-premises Exchange Server 2013 CU22 purely for SMTP application relay. We also created a new on-premises database with two user mailboxes; however, we were not able to log in to the on-premises ECP with those users.
There are no issues logging in to the online Exchange Admin Centre.
What errors do you see?
:-( Something went wrong We can't get that information right now. Please try again later
What's the environment and are there recent changes?
Exchange Server 2013 CU22 on Windows Server 2012 R2.
Our emails (...domain) have been migrated to Exchange Online.
We are using the on-premises Exchange server as a relay for application servers hosted on Azure. We noticed that the on-premises ECP wasn't accessible. There is a single database with two mailbox accounts on the on-premises Exchange server.
What have you tried to troubleshoot this?
We verified that the on-premises database and the two user accounts are present on the on-premises Exchange server, but we cannot log in via https://localhost/ecp.
Resolution:
You cannot access ECP; requests to https://localhost/ecp are being redirected to Office 365.
We checked the HTTP Redirect setting on the Default Frontend; no redirect was configured.
We checked the HTTP Redirect setting on the ECP virtual directory; no redirect was configured there either.
We found an HTTP redirect configured on the OWA virtual directory, pointing to the Office 365 portal.
We removed that redirect setting and were then able to access ECP successfully.
As per the Microsoft security baseline recommendation, it is good practice to tighten the account lockout threshold from 15 invalid logon attempts to 10 invalid logon attempts.
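For reference, this setting is typically found in the Group Policy editor under a path similar to the following (shown for orientation; in a domain GPO it sits under the Policies node):
Computer Configuration > Windows Settings > Security Settings > Account Policies > Account Lockout Policy > Account lockout threshold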
There is a built-in tool called “Resultant Set of Policy” (RSoP) that simulates the policy settings applied to computers and users through Group Policy. It acts as a query engine that polls existing policies based on site, domain, domain controller, and organizational unit, and then reports the results of those queries.
To launch Resultant Set of Policy, press Win + R to open the Run dialog box, type rsop.msc, and press Enter. The tool starts, scans the active policies, and displays them. You will still need to go through the folders to find each active policy applied to the account and computer.
GPResult
Alternatively, there is also a command-line tool called GPResult that you can use to collect the active Group Policy settings. Simply open a Command Prompt and run the following command.
gpresult /scope user /v
This searches for and shows all the active policies applied to the current user. To find all policies applied to the computer, run the following instead in an elevated Command Prompt window.
gpresult /scope computer /v
You can even use GPResult to gather the Group Policy information applied to a certain user account on a remote computer, such as below:
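For example (a sketch with placeholder names; RemotePC and DOMAIN\SomeUser are illustrative, not from the original):
gpresult /s RemotePC /user DOMAIN\SomeUser /scope user /v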
In a decision tree, after every split we hope to have less 'impurity' in the subsequent nodes, so that eventually we end up with leaf nodes that have the least 'impurity'/entropy.
Q No: 2
Decision Trees can be used to predict
Continuous Target Variables
Categorical Target Variables
Random Variables
Both Continuous and Categorical Target Variables
Ans: Both Continuous and Categorical Target Variables
Q No: 3
When we create a Decision Tree, how is the best split determined at each node?
We split the data using the first independent variable and so on.
The first split is determined randomly and from then on we start choosing the best split.
We make at most 5 splits on the data using only one independent variable and choose the split that gives the highest Gini gain.
We make all possible splits on the data using the independent variables and choose the split that gives the highest Gini gain.
Ans: We make all possible splits on the data using the independent variables and choose the split that gives the highest Gini gain.
Q No: 4
Which of the following is not true about Decision Trees?
Decision Trees tend to overfit the test data
Decision Trees can be pruned to reduce overfitting
Decision Trees would grow to maximum possible depth to achieve 100% purity in the leaf nodes, this generally leads to overfitting.
Decision Trees can capture complex patterns in the data.
Ans: Decision Trees tend to overfit the test data
Q No: 5
If we increase the value of the hyperparameter min_samples_leaf from the default value, we would end up getting a ______________ tree than the tree with the default value.
smaller
bigger
Ans: smaller
min_samples_leaf = the minimum number of samples required at a leaf node
As the number of observations required in the leaf node increases, the size of the tree would decrease
Q No: 6
Which of the following is a perfectly impure node?
Node - 0
Node - 1
Node - 2
None of these
Ans: Node - 1
Gini = 0.5 at Node 1
Gini = 0 -> Perfectly Pure
Gini = 0.5 -> Perfectly Impure
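As a quick illustration (a minimal sketch, not part of the original quiz), the Gini impurity of a node with class proportions p_i is 1 - sum(p_i^2):

import numpy as np

def gini(class_counts):
    # Gini impurity of a node given the count of samples in each class.
    p = np.asarray(class_counts, dtype=float) / np.sum(class_counts)
    return 1.0 - np.sum(p ** 2)

print(gini([50, 50]))   # 0.5 -> perfectly impure (even split)
print(gini([100, 0]))   # 0.0 -> perfectly pure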
Q No: 7
In a classification setting, if we do not limit the size of the decision tree it will only stop when all the leaves are:
All leaves are at the same depth
of the same size
homogenous
heterogenous
Ans: homogenous
The tree will stop splitting after the impurity in every leaf is zero
Q No: 8
Which of the following explains pre-pruning?
Before pruning a decision tree, we need to create the tree. This process of creating the tree before pruning is known as pre-pruning.
Starting with a full-grown tree and creating trees that are sequentially smaller is known as pre-pruning
We stop the decision tree from growing to its full length by bounding the hyper parameters, this is known as pre-pruning.
Building a decision tree on default hyperparameter values is known as pre-pruning.
Ans: We stop the decision tree from growing to its full length by bounding the hyper parameters, this is known as pre-pruning.
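For illustration, a minimal sketch of pre-pruning (assuming a scikit-learn workflow with training data X_train, y_train): the hyperparameters are bounded before the tree is grown, so it never reaches its full length.

from sklearn.tree import DecisionTreeClassifier

# Pre-pruning: the tree is never allowed to grow to its full depth.
pre_pruned_tree = DecisionTreeClassifier(
    max_depth=4,           # limit the depth of the tree
    min_samples_leaf=10,   # require at least 10 samples in every leaf
    random_state=1)
pre_pruned_tree.fit(X_train, y_train)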
Q No: 9
Which of the following is the same across Classification and Regression Decision Trees?
Type of predicted variable
Impurity Measure/ Splitting Criteria
max_depth parameter
Ans: max_depth parameter
Q No: 10
Select the correct order in which a decision tree is built:
Calculate the Gini impurity after each split
Decide the best split based on the lowest Gini impurity
Repeat the complete process until the stopping criterion is reached or the tree has achieved homogeneity in leaves.
Select an attribute of data and make all possible splits in data
Repeat the steps for every attribute present in the data
Lower back pain, also called lumbago, is not a disorder. It’s a symptom of several different types of medical problems. It usually results from a problem with one or more parts of the lower back, such as:
ligaments
muscles
nerves
the bony structures that make up the spine, called vertebral bodies or vertebrae
It can also be due to a problem with nearby organs, such as the kidneys.
According to the American Association of Neurological Surgeons, 75 to 85 percent of Americans will experience back pain in their lifetime. Of those, 50 percent will have more than one episode within a year. In 90 percent of all cases, the pain gets better without surgery. Talk to your doctor if you’re experiencing back pain.
In this Exploratory Data Analysis (EDA) I am going to use the Lower Back Pain Symptoms Dataset and try to find interesting insights in this dataset.
import numpy as np

# We use the Tukey method to remove outliers.
# Whiskers are set at 1.5 times the Interquartile Range (IQR).
def remove_outlier(feature):
    first_q = np.percentile(X[feature], 25)
    third_q = np.percentile(X[feature], 75)
    IQR = third_q - first_q
    IQR *= 1.5
    minimum = first_q - IQR  # the acceptable minimum value
    maximum = third_q + IQR  # the acceptable maximum value
    mean = X[feature].mean()
    # Any value beyond the acceptable range is considered an outlier.
    # We replace the outliers with the mean value of that feature.
    X.loc[X[feature] < minimum, feature] = mean
    X.loc[X[feature] > maximum, feature] = mean

# Take all the columns except the last one; the last column is the label.
X = dataset.iloc[:, :-1]
for i in range(len(X.columns)):
    remove_outlier(X.columns[i])
After removing Outliers:
features distribution after removing outliers
Feature Scaling:
Feature scaling through standardization (or Z-score normalization) can be an important preprocessing step for many machine learning algorithms. Our dataset contains features that vary highly in magnitude, units, and range. Since many machine learning algorithms use the Euclidean distance between two data points in their computations, this creates a problem. To avoid this effect, we need to bring all features to the same level of magnitude. This can be achieved by standardizing the features so that each has zero mean and unit variance.
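One way to do this (a minimal sketch, assuming the feature DataFrame X built in the outlier step and producing the scaled_df used in the model-training code below) is scikit-learn's StandardScaler:

from sklearn.preprocessing import StandardScaler
import pandas as pd

# Standardize every feature to zero mean and unit variance.
scaler = StandardScaler()
scaled_df = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)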
Certain algorithms like XGBoost can only have numerical values as their predictor variables. Hence we need to encode our categorical values. LabelEncoder from sklearn.preprocessing package encodes labels with values between 0 and n_classes-1.
label = dataset["class"]
encoder = LabelEncoder()
label = encoder.fit_transform(label)
Model Training and Evaluation:
X = scaled_df
y = label
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=0)
clf_gnb = GaussianNB()
pred_gnb = clf_gnb.fit(X_train, y_train).predict(X_test)
accuracy_score(pred_gnb, y_test)
# Out []: 0.8085106382978723
clf_svc = SVC(kernel="linear")
pred_svc = clf_svc.fit(X_train, y_train).predict(X_test)
accuracy_score(pred_svc, y_test)
# Out []: 0.7872340425531915
clf_xgb = XGBClassifier()
pred_xgb = clf_xgb.fit(X_train, y_train).predict(X_test)
accuracy_score(pred_xgb, y_test)
# Out []: 0.8297872340425532
Which is the most appropriate metric to evaluate the model according to the problem statement?
Accuracy, Recall, Precision, F1 score
Ans: Recall
Predicting that a person does not have an abnormal spine when the person actually has an abnormal spine means that a person who needs treatment will be missed. Hence, reducing such false negatives is important.
Question4:
Check for multicollinearity in the data and identify the variables that show high multicollinearity (VIF value greater than 5).
from statsmodels.stats.outliers_influence import variance_inflation_factor

vif_series = pd.Series([variance_inflation_factor(num_feature_set.values, i) for i in range(num_feature_set.shape[1])],
                       index=num_feature_set.columns, dtype=float)
print('Series before feature selection: \n\n{}\n'.format(vif_series))
Question5:
What is the minimum number of attributes we will need to drop to remove multicollinearity from the data (i.e., get all VIF values below 5)?
vif_series1 = pd.Series([variance_inflation_factor(num_feature_set1.values, i) for i in range(num_feature_set1.shape[1])],
                        index=num_feature_set1.columns, dtype=float)
print('VIF series for num_feature_set1: \n\n{}\n'.format(vif_series1))
vif_series2 = pd.Series([variance_inflation_factor(num_feature_set2.values, i) for i in range(num_feature_set2.shape[1])],
                        index=num_feature_set2.columns, dtype=float)
print('VIF series for num_feature_set2: \n\n{}\n'.format(vif_series2))
Question6:
Drop sacral_slope attribute and proceed to build a logistic regression model. Drop all the insignificant variables and keep only significant variables (p-value < 0.05).
How many significant variables are left in the final model excluding the constant?
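A possible workflow for this (a minimal sketch, assuming statsmodels, a feature DataFrame X containing the spine attributes, and the encoded target y; not the graded solution):

import statsmodels.api as sm

# Drop sacral_slope, add an intercept, and fit a logistic regression.
X_lr = sm.add_constant(X.drop(columns=['sacral_slope']))
logit_model = sm.Logit(y, X_lr).fit()
print(logit_model.summary())

# Iteratively drop the variable with the highest p-value above 0.05 and refit,
# until only significant variables (and the constant) remain.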
Question7:
Train a decision tree model with default parameters, vary the depth from 1 to 8 (both values included), and compare the model performance at each value of depth.
At depth = 1, the decision tree gives the highest recall among all the models on the training set.
At depth = 2, the decision tree gives the highest recall among all the models on the training set.
At depth = 5, the decision tree gives the highest recall among all the models on the training set.
At depth = 8, the decision tree gives the highest recall among all the models on the training set.
Ans: At depth = 1, the decision tree gives the highest recall among all the models on the training set.
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import recall_score

score_DT = []
for i in range(1, 9):
    dTree = DecisionTreeClassifier(max_depth=i, criterion='gini', random_state=1)
    dTree.fit(X_train, y_train)
    pred = dTree.predict(X_train)
    case = {'Depth': i, 'Recall': recall_score(y_train, pred)}
    score_DT.append(case)
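To compare the runs at a glance, the collected scores can be turned into a table, for example (assuming pandas is available as pd):
pd.DataFrame(score_DT)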
Question8:
Plot the feature importance of the variables given by the model that gives the maximum value of recall on the training set in Q7. Which are the 2 most important variables, respectively?
lumbar_lordosis_angle, sacrum_angle
degree_spondylolisthesis, pelvic tilt
scoliosis_slope, cervical_tilt
Ans: degree_spondylolisthesis, pelvic tilt
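One way to produce such a plot (a minimal sketch, assuming matplotlib and that dTree has been refit at the best depth found in Q7 on X_train):

import pandas as pd
import matplotlib.pyplot as plt

# Rank the features by the importance assigned by the fitted tree.
importances = pd.Series(dTree.feature_importances_, index=X_train.columns)
importances.sort_values().plot(kind='barh')
plt.xlabel('Feature importance')
plt.show()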
Question9:
Perform hyperparameter tuning for the Decision Tree using GridSearchCV.
Use the following list of hyperparameters and their values:
Maximum depth: [5, 10, 15, None], criterion: ['gini', 'entropy'], splitter: ['best', 'random']
Set cv = 3 in grid search
Set scoring = 'recall' in grid search
Which of the following statements is/are True?
A) GridSearchCV selects the max_depth as 10
B) GridSearchCV selects the criterion as 'gini'
C) GridSearchCV selects the splitter as 'random'
D) GridSearchCV selects the splitter as 'best'
E) GridSearchCV selects the max_depth as 5
F) GridSearchCV selects the criterion as 'entropy'
A, B, and C
B, C, and E
A, C, and F
D, E, and F
Ans: A, C, and F
from sklearn.model_selection import GridSearchCV

# Choose the type of classifier.
estimator = DecisionTreeClassifier(random_state=1)
# Grid of parameters to choose from
parameters = {'max_depth': [5,10,15,None],
'criterion' : ['gini','entropy'],
'splitter' : ['best','random']
}
# Run the grid search
grid_obj = GridSearchCV(estimator, parameters, scoring='recall',cv=3)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
estimator = grid_obj.best_estimator_
# Fit the best algorithm to the data.
estimator.fit(X_train, y_train)
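The combination that the grid search selected can then be inspected directly, for example:
print(grid_obj.best_params_)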
Compare the model performance of a Decision Tree with default parameters and the tuned Decision tree built in Q9 on the test set.
Which of the following statements is/are True?
A) Recall Score of tuned model > Recall Score of decision tree with default parameters
B) Recall Score of tuned model < Recall Score of decision tree with default parameters
C) F1 Score of tuned model > F1 Score of decision tree with default parameters
D) F1 Score of tuned model < F1 Score of decision tree with default parameters
A and B
B and C
C and D
A and D
Ans: A and D
from sklearn import metrics
from sklearn.metrics import recall_score

# Training decision tree with default parameters
model = DecisionTreeClassifier(random_state=1)
model.fit(X_train, y_train)
y_pred_test1 = model.predict(X_test)

# Tuned model (best estimator from the grid search in Q9)
estimator.fit(X_train, y_train)
y_pred_test2 = estimator.predict(X_test)

# Checking model performance of Decision Tree with default parameters
print(recall_score(y_test, y_pred_test1))
print(metrics.f1_score(y_test, y_pred_test1))

# Checking model performance of tuned Decision Tree
print(recall_score(y_test, y_pred_test2))
print(metrics.f1_score(y_test, y_pred_test2))