**→ Introduction → What is Cross-Validation? → Different Types of Cross-Validation**

2. K-Folds Method

3. Repeated K-Folds Method

4. Stratified K-Folds Method

5. Group K-Folds Method

6. Shuffle Split Method

7. Stratified Shuffle Split Method

8. Group Shuffle Split Method

9. Leave-One-Out Method

10. Leave-P-Out Method

11. Leave-One-Group-Out Method

12. Leave-P-Group-Out Method

13. Time Series Cross-Validation Method

14. Blocked Cross-Validation Method

15. Nested Cross-Validation Method

→ Conclusion

→ Reference

Imagine building a model on a dataset and watching it fail on unseen data.

We cannot just fit the model on our training data and sit back hoping it will…
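Before walking through each method, here is a minimal sketch of the core idea using scikit-learn's `KFold`; the iris dataset and logistic regression model are placeholder choices for illustration, not from the article:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold CV: the data is split into 5 parts; each part is used
# for validation exactly once while the rest is used for training.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

print(scores.mean())  # average accuracy across the 5 folds
```

The other splitters listed above (`StratifiedKFold`, `GroupKFold`, `ShuffleSplit`, `TimeSeriesSplit`, …) can be swapped in for `KFold` with the same `cross_val_score` call.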

**→ Importance of Hyper-Parameter Tuning! → Hyperparameter Tuning/Optimization → Defining Functions → Checking Performance on Base Model → Different Hyperparameter Tuning Methods**

2. RandomSearch

3. Successive Halving

4. Bayesian Optimizers

5. Manual Search

→ Conclusion

Hyperparameters are the soul of any model in today’s ML world. Their values need to be passed manually, as they cannot be learned from the data, and they control the whole learning process.

Hyperparameters need to be set before fitting the data in order to get a more robust and optimized model.
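As a hedged sketch of the first method on the list, GridSearch tries every combination of the hyperparameter values you supply before fitting; the SVM model, iris dataset, and the particular grid here are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# The grid is fixed before fitting; GridSearchCV evaluates every
# (C, kernel) combination with cross-validation and keeps the best.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
```

`RandomizedSearchCV` follows the same pattern but samples the grid instead of exhausting it, which is cheaper when the search space is large.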

- The goal of any…

This article is all about an end-to-end News Classifier project.

The app has been built using **Streamlit**, containerized with **Docker**, and deployed on **AWS using Fargate**.

This article aims to give a complete walkthrough of the process.

For this, we will be using the data from Kaggle. **LINK**

For this, you can use Jupyter Notebook or Colab.

I personally recommend Colab, as most of the packages are already installed.

```python
!pip install autocorrect

import sys
!{sys.executable} -m pip install contractions
!pip install zeugma
```

```python
import re
# --------------------------------------------------------------
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# --------------------------------------------------------------
import nltk
from nltk.stem…
```

If you’ve been learning data science or have been in the field for some time now, you might have built tons of classification models and checked their performance with different metrics:

* Recall

* Accuracy

* F1-Score

We know that when we have **imbalanced data**, accuracy is not the metric we should be looking at, as it may be misleading; instead, we should account for the F1-Score when our dataset is imbalanced.

But is F1 really going to help?

ROC curves are one of the best methods for comparing the goodness of models.
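As a small sketch of how a ROC curve and its AUC are computed, here is scikit-learn's `roc_curve`/`roc_auc_score` on a synthetic imbalanced dataset; the dataset and logistic regression model are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# 90/10 class imbalance, where plain accuracy can mislead
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]  # ROC needs scores, not hard labels

# fpr/tpr pairs trace the curve as the decision threshold sweeps
fpr, tpr, thresholds = roc_curve(y_te, proba)
auc = roc_auc_score(y_te, proba)
print(round(auc, 3))
```

Because the curve sweeps over all thresholds, it compares models independently of any single cutoff, which is exactly what a fixed-threshold metric like F1 cannot do.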

Also, check out my article on Calculating the Accuracy of an ML Model.

- The…

- What are Ensemble Methods?
- Intuition Behind Ensemble Methods!
- Different Ensemble Methods

* Bagging

→ Intuition behind Bagging

* Boosting

→ Intuition behind Boosting

* Stacking

→ Intuition behind Stacking

* Bucket of models

- Ensemble methods are techniques that create multiple models and then combine them to produce improved results. *This approach allows the production of better predictive performance compared to a single model.*
- Ensemble methods usually produce more accurate solutions than a single model would. This has been the case in many machine learning competitions, where the winning solutions used ensemble methods.
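The point above can be sketched with bagging, the first method on the list: many trees trained on bootstrap samples are averaged into one predictor. The breast-cancer dataset and the specific estimator settings are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

single = DecisionTreeClassifier(random_state=0)
# 50 trees, each fit on a bootstrap resample; predictions are combined by vote
bagged = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                           n_estimators=50, random_state=0)

single_score = cross_val_score(single, X, y, cv=5).mean()
bagged_score = cross_val_score(bagged, X, y, cv=5).mean()
print(single_score, bagged_score)  # the ensemble typically scores higher
```

Boosting and stacking follow the same "combine many models" idea but differ in how the members are trained and merged.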

In machine learning, most algorithms work on the assumption that the data is normally distributed.

However, not all machine learning algorithms make such an assumption about the type of data distribution beforehand; some learn it directly from the data used for training.

- What is the need for and Importance of Gaussian Distribution?

→ What is Gaussian Distribution?

→ Need for Normal Distribution?

→ Importance of Normality in Machine Learning!

- Need for Data Transformation!
- Importance of Data Distribution Transformation.
- Different methods to Transform the Distribution.

→ The ladder of powers.

→ Box-Cox Transformation Method…
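As a minimal sketch of the Box-Cox method from the list above, applied to synthetic right-skewed data (the exponential sample here is an illustrative assumption):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
skewed = rng.exponential(scale=2.0, size=1000)  # strongly right-skewed

# Box-Cox requires strictly positive data; the power parameter lambda
# is estimated by maximum likelihood when not supplied.
transformed, lam = stats.boxcox(skewed)

# skewness drops toward 0 after the transform
print(stats.skew(skewed), stats.skew(transformed))
```

The ladder of powers expresses the same idea by hand: choosing a fixed power (log, square root, reciprocal, …) instead of estimating it.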

One of the many problems with real-world machine learning classification tasks is imbalanced data.

Imbalanced data means that the classes present in our data are disproportionate: the ratio of the classes differs, with one class being in the majority in the dataset while the other is in the minority.
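One simple way to see and address the disproportion is random upsampling of the minority class; the synthetic 90/10 dataset below is an assumption for illustration, and upsampling is only one of several remedies (class weights and SMOTE are common alternatives):

```python
from collections import Counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.utils import resample

# 90/10 imbalance: class 1 is the minority
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print(Counter(y))

# naive fix: resample the minority class with replacement
# until it matches the majority class in size
X_min, X_maj = X[y == 1], X[y == 0]
X_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)

X_bal = np.vstack([X_maj, X_up])
y_bal = np.array([0] * len(X_maj) + [1] * len(X_up))
print(Counter(y_bal))  # classes are now balanced
```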

- A **decision tree** is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
- **Decision tree learning** is one of the predictive modelling approaches used in statistics and machine learning. It uses a decision tree to go from observations about an item (represented in the branches) to conclusions about the item’s target value (represented in the leaves).
- Decision Trees are a non-parametric supervised learning method used for both classification and regression tasks.
- The goal is to create a model that predicts the value of a target variable…
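A small sketch of the branches-to-leaves idea with scikit-learn; the iris dataset and the depth limit are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# non-parametric: the tree's shape is learned from the data, not fixed up front
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# print the learned if/else rules, from branch tests down to leaf predictions
print(export_text(tree, feature_names=load_iris().feature_names))
```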

In my previous articles, we took an overview of **Linear Regression** and **Logistic Regression**.

Let’s see another algorithm in the Regression Family.

- What is Polynomial Regression?
- Assumptions of Polynomial Regression.
- Why do we need Polynomial Regression?
- How to find the right degree of the Polynomial Equation?
- Math Behind Polynomial Equation.
- Cost Function of Polynomial Regression.
- Polynomial Regression with Gradient Descent.
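Before the topics above, here is a minimal sketch of the core trick: expand the features into polynomial terms, then fit an ordinary linear regression on them. The synthetic quadratic data and degree choice are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
# quadratic signal plus noise: a straight line cannot fit this well
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + rng.normal(scale=0.2, size=200)

linear = LinearRegression().fit(X, y)
# degree-2 expansion adds an x^2 column, so the "linear" fit becomes a parabola
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print(linear.score(X, y), poly.score(X, y))  # R^2 of each fit
```

Picking the degree is the bias-variance question the article's fourth bullet addresses: too low underfits, too high overfits.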

In my previous blog, I tried to explain Linear Regression and how it works. Let’s see why Logistic Regression is one of the important topics to understand.

Here’s the **link to my previous article on Linear Regression** in case you missed it.

- What is Logistic Regression?
- Types of Logistic Regression.
- Assumptions of Logistic Regression.
- Why not Linear Regression for Classification?
- The Logistic Model.
- Interpretation of the co-efficients.
- Odds Ratio and Logit
- Decision Boundary.
- Cost Function of Logistic Regression.
- Gradient Descent in Logistic Regression.
- Evaluating the Logistic Regression Model.

Let’s get Started
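As a first taste of the logistic model from the outline above, the sketch below fits scikit-learn's `LogisticRegression` and reproduces its probabilities by hand: a linear score squashed through the sigmoid. The synthetic dataset is an illustrative assumption:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# the model computes a linear score z = w.x + b ...
z = X @ model.coef_[0] + model.intercept_[0]
# ... then squashes it through the sigmoid to get P(y=1 | x)
manual_proba = 1 / (1 + np.exp(-z))

# matches the library's predict_proba for class 1
print(np.allclose(manual_proba, model.predict_proba(X)[:, 1]))
```

The coefficients `w` are what the "interpretation of the coefficients" and "odds ratio and logit" sections unpack: each unit increase in a feature multiplies the odds by `exp(w_i)`.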