Feature Selection for Dimensionality Reduction (Wrapper Method)

Abhigyan
Published in Analytics Vidhya · 4 min read · Jun 28, 2020


In machine learning, selecting the important features in the data is an important part of the full model-development cycle.

Passing data with irrelevant features can hurt the performance of the model, because the model ends up learning from those irrelevant features as well.


Methods for Feature Selection

There are three general methods of feature selection:

  1. Filter Method
  2. Wrapper Method
  3. Embedded Method

Wrapper Method

  1. Wrapper methods are based on greedy search algorithms: they evaluate many candidate combinations of features and select the combination that produces the best result for a specific machine learning algorithm.
  2. A downside to this approach is that testing many combinations of features can be computationally very expensive, particularly if the feature set is large.
  3. Another downside is that the selected set of features may not be optimal for every other machine learning algorithm.

Wrapper methods for feature selection can be divided into three categories:

  • Step forward feature selection:
    Step forward feature selection starts by evaluating each individual feature and selects the one that produces the best-performing model.
    →What counts as "best" depends entirely on the defined evaluation criteria (AUC, prediction accuracy, RMSE, etc.).
    →Forward selection is an iterative method in which we start with no features in the model.
    →In each iteration we add the feature that most improves the model, until adding a new variable no longer improves its performance.
    * In the first step, the performance of the classifier is evaluated with respect to each feature individually, and the feature that performs best is selected.
    * In the second step, the selected feature is tried in combination with each of the remaining features, and the two-feature combination that yields the best performance is selected.
    * This process of evaluating combinations and keeping the best-performing one is repeated until the desired number of features has been selected.
from mlxtend.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression, LogisticRegression

# FOR REGRESSION MODELS
feature_select = SequentialFeatureSelector(LinearRegression(),
                                           k_features=5,       # example: select 5 features
                                           forward=True,       # step forward selection
                                           floating=False,
                                           scoring='r2',
                                           cv=5)               # example: 5-fold CV
feature_select.fit(x, y)
feature_select.k_feature_names_

# FOR CLASSIFICATION MODELS
feature_select = SequentialFeatureSelector(LogisticRegression(),
                                           k_features=5,       # example: select 5 features
                                           forward=True,
                                           floating=False,
                                           scoring='roc_auc',
                                           cv=5)               # example: 5-fold CV
feature_select.fit(x, y)
feature_select.k_feature_names_

Any model can be used in place of linear and logistic regression.
k_features: the number of features to select.
scoring: the metric to be used for model evaluation.
cv: the number of folds for k-fold cross-validation.
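
To put the pieces together, here is a minimal end-to-end sketch of step forward selection. The breast cancer dataset and the parameter values are my own choices for illustration; any tabular dataset works the same way.

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from mlxtend.feature_selection import SequentialFeatureSelector

# Small classification dataset, chosen here only for illustration
data = load_breast_cancer()
x = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Step forward selection: keep the 5 features with the best ROC AUC
feature_select = SequentialFeatureSelector(LogisticRegression(max_iter=5000),
                                           k_features=5,
                                           forward=True,
                                           floating=False,
                                           scoring='roc_auc',
                                           cv=5)
feature_select.fit(x, y)
print(feature_select.k_feature_names_)  # names of the 5 selected columns

Because x is a DataFrame, k_feature_names_ returns the actual column names rather than indices.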

  • Step backward feature selection:
    →Step backward feature selection is the reverse of step forward feature selection: as you may have guessed, it starts with the entire set of features and works backward from there, removing features to find the optimal subset of a predefined size.
    * In each step one feature is removed. Each feature is dropped in turn (in a round-robin manner), the performance of the model without it is calculated, and the feature whose removal hurts performance the least is permanently removed.
    * This process continues until the specified number of features remains.
from mlxtend.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression, LogisticRegression

# FOR REGRESSION MODELS
feature_select = SequentialFeatureSelector(LinearRegression(),
                                           k_features=5,       # example: keep 5 features
                                           forward=False,      # step backward selection
                                           floating=False,
                                           scoring='r2',
                                           cv=5)               # example: 5-fold CV
feature_select.fit(x, y)
feature_select.k_feature_names_

# FOR CLASSIFICATION MODELS
feature_select = SequentialFeatureSelector(LogisticRegression(),
                                           k_features=5,       # example: keep 5 features
                                           forward=False,
                                           floating=False,
                                           scoring='roc_auc',
                                           cv=5)               # example: 5-fold CV
feature_select.fit(x, y)
feature_select.k_feature_names_

Simply setting forward=False in the function gives backward selection.
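
To make the round-robin idea concrete, here is a rough from-scratch sketch of backward elimination. This is a simplification, not mlxtend's actual implementation; the function name and defaults are my own.

from sklearn.model_selection import cross_val_score

def backward_select(model, x, y, n_keep, scoring='r2', cv=5):
    # Greedy backward elimination: drop the least useful column
    # until only n_keep features remain. x is assumed to be a DataFrame.
    features = list(x.columns)
    while len(features) > n_keep:
        scores = {}
        for f in features:                           # round robin: try dropping each feature
            trial = [c for c in features if c != f]
            scores[f] = cross_val_score(model, x[trial], y,
                                        scoring=scoring, cv=cv).mean()
        # Remove the feature whose absence gives the best (least hurt) score
        features.remove(max(scores, key=scores.get))
    return features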

  • Exhaustive feature selection:
    It is a brute-force approach that aims to find the best performing feature subset.
    →The performance of the algorithm is evaluated against every possible combination of features in the dataset (within a specified minimum and maximum subset size), and the best-scoring subset is kept.
    →Since it checks every possible combination, it is computationally very expensive and is mostly avoided in practice.
from mlxtend.feature_selection import ExhaustiveFeatureSelector
from sklearn.linear_model import LinearRegression, LogisticRegression

# FOR REGRESSION MODELS
feature_select = ExhaustiveFeatureSelector(LinearRegression(),
                                           min_features=1,     # example: consider subsets
                                           max_features=4,     # of 1 to 4 features
                                           scoring='r2',
                                           print_progress=True,
                                           cv=5)
feature_select = feature_select.fit(X, y)
print('Best r2 score: %.2f' % feature_select.best_score_)
print('Best subset (indices):', feature_select.best_idx_)
print('Best subset (names):', feature_select.best_feature_names_)

# FOR CLASSIFICATION MODELS
feature_select = ExhaustiveFeatureSelector(LogisticRegression(),
                                           min_features=1,     # example values
                                           max_features=4,
                                           scoring='roc_auc',
                                           print_progress=True,
                                           cv=5)
feature_select = feature_select.fit(X, y)
print('Best roc_auc score: %.2f' % feature_select.best_score_)
print('Best subset (indices):', feature_select.best_idx_)
print('Best subset (names):', feature_select.best_feature_names_)

Wrapper methods are computationally very expensive because a model has to be trained and evaluated for every candidate feature subset.
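
For exhaustive search in particular the cost explodes: with n features there are 2^n - 1 non-empty subsets to evaluate. A quick back-of-the-envelope check:

# Number of non-empty feature subsets an exhaustive search must evaluate
for n in (5, 10, 20, 30):
    print(f'{n} features -> {2**n - 1} subsets')
# 5 features -> 31 subsets
# 10 features -> 1023 subsets
# 20 features -> 1048575 subsets
# 30 features -> 1073741823 subsets

Greedy forward and backward selection are far cheaper (on the order of n squared model fits), which is why they are usually preferred in practice.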

Before applying a wrapper method, make sure that you have encoded all the categorical features.

Coming up next week is the EMBEDDED METHOD for feature selection.
HAPPY LEARNING!!!!

Like my article? Do give me a clap and share it, as that will boost my confidence. Also, I post new articles every Sunday, so stay connected for future articles in this basics of data science and machine learning series.

Also, do connect with me on LinkedIn.

Photo by Markus Spiske on Unsplash
