Lower back pain, also called lumbago, is not a disorder. It’s a symptom of several different types of medical problems. It usually results from a problem with one or more parts of the lower back, such as:
ligaments
muscles
nerves
the bony structures that make up the spine, called vertebral bodies or vertebrae
It can also be due to a problem with nearby organs, such as the kidneys.
According to the American Association of Neurological Surgeons, 75 to 85 percent of Americans will experience back pain in their lifetime. Of those, 50 percent will have more than one episode within a year. In 90 percent of all cases, the pain gets better without surgery. Talk to your doctor if you’re experiencing back pain.
In this Exploratory Data Analysis (EDA), I am going to use the Lower Back Pain Symptoms Dataset and try to find interesting insights in it.
import numpy as np

# We use the Tukey method to remove outliers.
# Whiskers are set at 1.5 times the interquartile range (IQR).
def remove_outlier(feature):
    first_q = np.percentile(X[feature], 25)
    third_q = np.percentile(X[feature], 75)
    IQR = third_q - first_q
    IQR *= 1.5
    minimum = first_q - IQR  # the acceptable minimum value
    maximum = third_q + IQR  # the acceptable maximum value
    mean = X[feature].mean()
    # Any value beyond the acceptable range is considered an outlier.
    # We replace the outliers with the mean value of that feature.
    X.loc[X[feature] < minimum, feature] = mean
    X.loc[X[feature] > maximum, feature] = mean

# Take all the columns except the last one; the last column is the label.
X = dataset.iloc[:, :-1]
for i in range(len(X.columns)):
    remove_outlier(X.columns[i])
After removing Outliers:
features distribution after removing outliers
Feature Scaling:
Feature scaling through standardization (or Z-score normalization) can be an important preprocessing step for many machine learning algorithms. Our dataset contains features that vary widely in magnitude, units, and range. Since many machine learning algorithms use the Euclidean distance between two data points in their computations, features with larger magnitudes would dominate the result. To avoid this effect, we need to bring all features to the same level of magnitude. This can be achieved through standardization.
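To make the idea concrete, here is a minimal sketch of Z-score standardization written with the standard library only. The function name `standardize` and the sample values are made up for illustration; they are not taken from the dataset. The sketch uses the population standard deviation, which is also what scikit-learn's StandardScaler uses.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation (ddof = 0)
    if sigma == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

# Hypothetical values for two features on very different scales.
angles = [30.0, 45.0, 60.0, 75.0]
slopes = [0.2, 0.4, 0.6, 0.8]

scaled_angles = standardize(angles)
scaled_slopes = standardize(slopes)
print(scaled_angles)
print(scaled_slopes)
```

After scaling, both features have mean 0 and standard deviation 1, so neither dominates a distance computation just because of its units.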
Certain algorithms, like XGBoost, can only work with numerical values. Hence we need to encode our categorical class labels. LabelEncoder from the sklearn.preprocessing package encodes labels with values between 0 and n_classes-1.
from sklearn.preprocessing import LabelEncoder

label = dataset["class"]
encoder = LabelEncoder()
label = encoder.fit_transform(label)
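As a rough illustration of what this encoding does, the sketch below reproduces the idea in plain Python: the distinct class names are sorted and each one is mapped to its position. The helper `encode_labels` and the class names "Abnormal" and "Normal" are assumptions for the example, not a copy of the library's implementation.

```python
def encode_labels(labels):
    """Map each distinct label to an integer in 0..n_classes-1 (sorted order)."""
    classes = sorted(set(labels))
    mapping = {name: idx for idx, name in enumerate(classes)}
    return [mapping[value] for value in labels], classes

# Hypothetical class column with two categories.
raw = ["Abnormal", "Normal", "Abnormal", "Normal", "Normal"]
codes, classes = encode_labels(raw)
print(classes)  # the order of the classes defines the integer codes
print(codes)
```

With alphabetical ordering, "Abnormal" would be encoded as 0 and "Normal" as 1. That matters later, because scikit-learn's recall_score treats the label 1 as the positive class by default, so it is worth checking which class a metric is actually measuring.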
Model Training and Evaluation:
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

X = scaled_df
y = label
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=0)
clf_gnb = GaussianNB()
pred_gnb = clf_gnb.fit(X_train, y_train).predict(X_test)
accuracy_score(y_test, pred_gnb)
# Out []: 0.8085106382978723
clf_svc = SVC(kernel="linear")
pred_svc = clf_svc.fit(X_train, y_train).predict(X_test)
accuracy_score(y_test, pred_svc)
# Out []: 0.7872340425531915
clf_xgb = XGBClassifier()
pred_xgb = clf_xgb.fit(X_train, y_train).predict(X_test)
accuracy_score(y_test, pred_xgb)
# Out []: 0.8297872340425532
Which metric is the most appropriate for evaluating the model according to the problem statement?
Accuracy, Recall, Precision, F1 score
Ans: Recall
If the model predicts that a person doesn't have an abnormal spine when the person actually does, a person who needs treatment will be missed. Hence, reducing such false negatives is important.
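The link between recall and false negatives can be sketched in a few lines of plain Python. The labels below are invented for illustration (1 stands for an abnormal spine, which is an assumption about the encoding), and `classification_metrics` is a hypothetical helper, not a library function.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute recall, precision, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1, "fn": fn}

# Made-up labels: 1 = abnormal spine (assumed), 0 = normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

result = classification_metrics(y_true, y_pred)
print(result)
```

Here two of the four abnormal cases are missed, so recall is 0.5 even though precision is higher. Each false negative is a patient who needed treatment and was not flagged, which is why recall is the metric to watch.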
Question4:
Check for multicollinearity in the data. Which variables show high multicollinearity (VIF value greater than 5)?
vif_series = pd.Series([variance_inflation_factor(num_feature_set.values,i) for i in range(num_feature_set.shape[1])],index=num_feature_set.columns, dtype = float)
print('Series before feature selection: \n\n{}\n'.format(vif_series))
Question5:
What is the minimum number of attributes we need to drop to remove multicollinearity (or get a VIF value less than 5) from the data?
vif_series1 = pd.Series([variance_inflation_factor(num_feature_set1.values,i) for i in range(num_feature_set1.shape[1])],index=num_feature_set1.columns, dtype = float)
print('Series before feature selection: \n\n{}\n'.format(vif_series1))
vif_series2 = pd.Series([variance_inflation_factor(num_feature_set2.values,i) for i in range(num_feature_set2.shape[1])],index=num_feature_set2.columns, dtype = float)
print('Series before feature selection: \n\n{}\n'.format(vif_series2))
Question6:
Drop sacral_slope attribute and proceed to build a logistic regression model. Drop all the insignificant variables and keep only significant variables (p-value < 0.05).
How many significant variables are left in the final model excluding the constant?
Question7:
Train a decision tree model with default parameters and vary the depth from 1 to 8 (both values included) and compare the model performance at each value of depth.
At depth = 1, the decision tree gives the highest recall among all the models on the training set.
At depth = 2, the decision tree gives the highest recall among all the models on the training set.
At depth = 5, the decision tree gives the highest recall among all the models on the training set.
At depth = 8, the decision tree gives the highest recall among all the models on the training set.
Ans: 1
score_DT = []
for i in range(1,9):
    dTree = DecisionTreeClassifier(max_depth=i,criterion = 'gini', random_state=1)
    dTree.fit(X_train, y_train)
    pred = dTree.predict(X_train)
    case = {'Depth':i,'Recall':recall_score(y_train,pred)}
    score_DT.append(case)
Question8:
Plot the feature importance of the variables given by the model which gives the maximum value of recall on the training set in Q7. Which are the 2 most important variables respectively?
lumbar_lordosis_angle, sacrum_angle
degree_spondylolisthesis, pelvic tilt
scoliosis_slope, cervial_tilt
scoliosis_slope, cervial_tilt
Ans: degree_spondylolisthesis, pelvic tilt
Question9:
Perform hyperparameter tuning for the Decision Tree using GridSearchCV.
Use the following list of hyperparameters and their values:
Maximum depth: [5,10,15, None], criterion: ['gini','entropy'], splitter: ['best','random']
Set cv = 3 in grid search
Set scoring = 'recall' in grid search
Which of the following statements is/are True?
A) GridSearchCV selects the max_depth as 10
B) GridSearchCV selects the criterion as 'gini'
C) GridSearchCV selects the splitter as 'random'
D) GridSearchCV selects the splitter as 'best'
E) GridSearchCV selects the max_depth as 5
F) GridSearchCV selects the criterion as 'entropy'
A, B, and C
B, C, and E
A, C, and F
D, E, and F
Ans: A, C, and F
# Choose the type of classifier.
estimator = DecisionTreeClassifier(random_state=1)
# Grid of parameters to choose from
parameters = {'max_depth': [5,10,15,None],
'criterion' : ['gini','entropy'],
'splitter' : ['best','random']
}
# Run the grid search
grid_obj = GridSearchCV(estimator, parameters, scoring='recall',cv=3)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
estimator = grid_obj.best_estimator_
# Fit the best algorithm to the data.
estimator.fit(X_train, y_train)
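To get a feel for how much work this grid search does, the sketch below enumerates the parameter combinations with the standard library. It only counts candidates and fits; it does not train any model.

```python
from itertools import product

# Same grid as in the question.
parameters = {
    'max_depth': [5, 10, 15, None],
    'criterion': ['gini', 'entropy'],
    'splitter': ['best', 'random'],
}
cv = 3

# Every combination of one value per hyperparameter is one candidate.
combinations = [dict(zip(parameters, values))
                for values in product(*parameters.values())]
n_candidates = len(combinations)
n_fits = n_candidates * cv  # each candidate is fitted once per fold

print(n_candidates, n_fits)
print(combinations[0])
```

The grid has 4 x 2 x 2 = 16 candidates, so 3-fold cross-validation runs 48 fits. GridSearchCV also refits the best candidate on the full training set by default (refit=True), so best_estimator_ is already fitted when it is returned.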
Question10:
Compare the model performance of a Decision Tree with default parameters and the tuned Decision Tree built in Q9 on the test set.
Which of the following statements is/are True?
A) Recall Score of tuned model > Recall Score of decision tree with default parameters
B) Recall Score of tuned model < Recall Score of decision tree with default parameters
C) F1 Score of tuned model > F1 Score of decision tree with default parameters
D) F1 Score of tuned model < F1 Score of decision tree with default parameters
A and B
B and C
C and D
A and D
Ans: A and D
# Training decision tree with default parameters
model = DecisionTreeClassifier(random_state=1)
model.fit(X_train,y_train)
y_pred_test1 = model.predict(X_test)
# Tuned model
estimator.fit(X_train, y_train)
y_pred_test2 = estimator.predict(X_test)
# Checking model performance of Decision Tree with default parameters
print(recall_score(y_test,y_pred_test1))
print(metrics.f1_score(y_test,y_pred_test1))
# Checking model performance of tuned Decision Tree
print(recall_score(y_test,y_pred_test2))
print(metrics.f1_score(y_test,y_pred_test2))
Before I discuss solutions to help you get more organized, let’s look at some
examples of horrible cable management. Be warned: some of these examples may
just make you cry.
Can you find the hidden equipment in this mess?
One of the leading data centres I visited had cable management this bad, and we
had to wait another two weeks to decommission a Riverbed WAN accelerator
appliance. Guess what: to pull out the customer appliance, they had to plan for
production downtime.
If you dread walking
into your server room to troubleshoot a network issue because of bad cable
management or worse, dread having to give higher-ups a tour of your facilities,
then it’s about time to straighten up your cable management system.
Some glimpses from the internet of the worst cable hell and wiring ever seen.
Here are
some things you can do now to avoid joining the terrible cable management hall
of fame photos I just highlighted above.
Proper cable management will not only support existing infrastructure, but will
also accommodate future growth.
Consider
these tips for your next project:
Before purchasing or installing cable
products, determine the amount of cabling and connections required. Be
sure to allow room for access and growth.
Be sure to follow industry standards, such as
ANSI/TIA and ISO/IEC, as well as any federal, state or local regulations.
This will help ensure a safe, failure-free installation that will minimize
system downtime.
Plan for change by organizing cable properly
and labeling cable that may need to be quickly and easily identified.
Also, try to avoid blocking access to equipment inside and outside the
racks.
Be sure to use sweeping 90-degree bends when
transitioning from the pathway support to the racks.
Density is very important in data center
cabinets and racks, so keep in mind how many rack spaces are being
utilized with horizontal wire managers.
Select a vertical cable manager that can
accommodate all of the cable feeding from the horizontal managers. Use
waterfalls and spools to help manage multiple cables and to help with
maintaining proper bend radius on copper and fiber cables.
Use a 50% cable fill when selecting vertical
and horizontal cable management. This allows sufficient space for
maintaining cable bend radius for patch cords.
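The 50% fill tip can be turned into a quick back-of-the-envelope estimate. The sketch below is an area-based approximation only: the function name, the manager dimensions, and the cable diameter are all hypothetical, and a manufacturer's fill tables should take precedence where they exist.

```python
from math import floor, pi

def max_cables(manager_width_mm, manager_depth_mm, cable_diameter_mm,
               fill_ratio=0.5):
    """Estimate how many cables fit in a manager at the given fill ratio."""
    manager_area = manager_width_mm * manager_depth_mm
    cable_area = pi * (cable_diameter_mm / 2) ** 2
    return floor(manager_area * fill_ratio / cable_area)

# Hypothetical vertical manager (100 mm x 150 mm) and 6 mm patch cords.
capacity = max_cables(100, 150, 6)
print(capacity)
```

Comparing this estimate with the number of cables feeding in from the horizontal managers gives an early warning when a vertical manager is undersized.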
Efficiency
Making our installations more efficient is one of the most beneficial things we
can do. Not only does it save time, it can also reduce issues down the line.
This is the plus side of proper cable management. Cable management is the
organization of electrical or optical cables in a cabinet or an installation.
The term reflects the goal of planning the cable layout. Cable installations
vary from job to job, but in most of them it is difficult to situate each cable
so that it is easy to work with. With too many cables bundled around each other,
problems can arise later: cables get unplugged by accident, and it becomes hard
to identify which cable is causing a fault. This is why cable management is
crucial to a smooth workplace and installation.
Safety
Proper cable management can improve safety in the workplace. Fire is a concern
after cable installation: loose cables can become tangled with each other,
possibly creating a spark. This spark can then turn into a fire, damaging your
network, data center, and building, and of course causing financial loss. There
is also the chance of someone walking by where the cables are installed and
tripping over or catching on them, resulting in an injury. You never know what
might happen, so it's best to keep a clean and organized setup.
Air Flow
An important factor in cable longevity is ample air flow around the
installation. The more air flow the better when cable is connected and running.
This increases energy efficiency as well. Keeping temperatures low and
consistent is beneficial to a cable's structure and performance. Increased
temperatures can damage the cable's jacket and harm its inner workings. Keeping
your cables tied together and out of the way opens up airways around them and
helps prevent the surrounding temperature from rising.
Diagnosis
Correct cable management can make life easier when you go back to troubleshoot a
problem with your cabling. Organizing your network with various colors can help
you troubleshoot problems down the line and manage future additions. Plus,
you'll get major props from others for a well-managed setup.
If you have accidentally deleted your OneDrive files, there is no need to worry. You can recover them from your OneDrive's recycle bin. You might also receive a warning email from the SharePoint Online (no-reply@sharepointonline.com) Microsoft support team, similar to the one listed below:
Files are permanently removed from the
online recycle bin 93 days after they're deleted
Hi Rinith KT,
We noticed that you recently deleted a large number of files from your
OneDrive.
When files are deleted, they're stored in your recycle bin and can be
restored within 93 days. After 93 days, deleted files are gone forever.
If you want to restore these files, go to the recycle
bin. Select what you want to restore, and click the
Restore button.
Ignore this mail if you meant to get rid of these files.
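If you want to know how long you still have, the 93-day window is simple date arithmetic. The sketch below assumes the window is counted in whole calendar days from the deletion date; the function names and the example dates are hypothetical.

```python
from datetime import date, timedelta

RETENTION_DAYS = 93  # retention period stated in the email above

def last_restore_date(deleted_on):
    """Date on which the 93-day window ends for a file deleted on deleted_on."""
    return deleted_on + timedelta(days=RETENTION_DAYS)

def days_left(deleted_on, today):
    """Whole days remaining before the window closes (never negative)."""
    return max(0, (last_restore_date(deleted_on) - today).days)

# Hypothetical deletion date.
deleted = date(2024, 1, 1)
print(last_restore_date(deleted))
print(days_left(deleted, date(2024, 3, 1)))
```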