This article covers an end-to-end News Classifier project.

The app is built with **Streamlit**, containerized with **Docker**, and deployed on **AWS using Fargate**.

The aim is a complete walkthrough of the process, from data to deployment.

For this, we will be using data from Kaggle. **LINK**

You can work in a Jupyter notebook or in Colab; I personally recommend Colab, since most of the required packages come pre-installed.

```python
!pip install autocorrect

import sys
!{sys.executable} -m pip install contractions

!pip install zeugma

import re
# --------------------------------------------------------------
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# --------------------------------------------------------------
import nltk
from nltk.stem…
```
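As a hypothetical illustration of what these imports are building toward, here is a minimal text-cleaning sketch using only `re` (the `clean_text` name and the exact cleaning steps are my own assumptions, not the article's pipeline):

```python
import re

def clean_text(text):
    """A minimal cleaning pass: lowercase, strip URLs and punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"http\S+", " ", text)      # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)     # keep letters only
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text

print(clean_text("Breaking NEWS: visit http://example.com now!!!"))
# → breaking news visit now
```

A real pipeline would add steps from the packages installed above (contraction expansion, spell correction, stemming), but the shape is the same: a function applied to every headline before vectorization.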

If you've been learning data science, or have been in the field for some time, you have probably built plenty of classification models and checked their performance with different metrics:

* Recall

* Accuracy

* F1-Score

We know that when we have **imbalanced data**, accuracy is not the metric we should be looking at, since it can be misleading; instead, we should account for the F1-score when our dataset is imbalanced.

But is the F1-score really going to help?
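A quick illustration of why accuracy misleads on imbalanced data (the labels here are a made-up toy example, not the article's dataset): a model that always predicts the majority class still scores 90% accuracy, while the F1-score exposes it.

```python
from sklearn.metrics import accuracy_score, f1_score

# 90% negatives: a "model" that always predicts the majority class
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))  # 0.9 — looks great
print(f1_score(y_true, y_pred))        # 0.0 — reveals the problem
```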

ROC curves are one of the best methods for comparing how good competing models really are.
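A minimal sketch of comparing two models by ROC AUC on an imbalanced synthetic set (the models and data here are illustrative assumptions, using scikit-learn):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# 90/10 imbalanced synthetic data
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

scores = {}
for model in (LogisticRegression(max_iter=1000), RandomForestClassifier(random_state=42)):
    probs = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]  # scores, not hard labels
    scores[type(model).__name__] = roc_auc_score(y_te, probs)
    print(type(model).__name__, round(scores[type(model).__name__], 3))
```

Because AUC is computed from predicted scores rather than a single threshold, it stays informative even when the class ratio is skewed.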

Also, check out my article on calculating the accuracy of an ML model.

- The…

- What are Ensemble Methods?
- Intuition Behind Ensemble Methods!
- Different Ensemble Methods

* Bagging

→ Intuition behind Bagging

* Boosting

→ Intuition behind Boosting

* Stacking

→ Intuition behind Stacking

* Bucket of models

- Ensemble methods are techniques that create multiple models and then combine them to produce improved results. *This approach allows the production of better predictive performance compared to a single model.*
- Ensemble methods usually produce more accurate solutions than a single model would. This has been the case in many machine learning competitions, where the winning solutions used ensemble methods.
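The "combine multiple models" idea can be sketched with scikit-learn's majority-voting ensemble (the three base models and the Iris data are my own illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Three different models vote; the majority class wins
ensemble = VotingClassifier([
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(random_state=0)),
])
score = cross_val_score(ensemble, X, y, cv=5).mean()
print(round(score, 3))
```

Bagging, boosting, and stacking covered below are more sophisticated variants of this same combine-and-aggregate idea.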

In machine learning, many algorithms work on the assumption that the data is normally distributed.

However, not all machine learning algorithms assume beforehand what distribution the data will follow; some learn it directly from the training data.

- What is the need for and Importance of Gaussian Distribution?

→ What is Gaussian Distribution?

→ Need for Normal Distribution?

→ Importance of Normality in Machine Learning!

- Need for Data Transformation!!
- Importance of Data Distribution Transformation.
- Different methods to Transform the Distribution.

→ The ladder of powers.

→ Box-Cox Transformation Method…
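As a preview of the Box-Cox method, here is a minimal sketch with `scipy.stats.boxcox` on a made-up right-skewed sample (the exponential data is an assumption for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
skewed = rng.exponential(scale=2.0, size=1000)  # strictly positive, right-skewed

# boxcox finds the power lambda that makes the data most normal-like
transformed, lam = stats.boxcox(skewed)

print("skew before:", round(stats.skew(skewed), 2))
print("skew after: ", round(stats.skew(transformed), 2))
print("lambda:     ", round(lam, 2))
```

Note that Box-Cox requires strictly positive inputs; the skewness dropping toward zero is the sign the transformation worked.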

One of the many problems with real-world machine learning classification tasks is imbalanced data.

Imbalanced data means the classes present in our data are disproportionate: the ratio between classes differs, with one class heavily represented in the dataset and the other only sparsely present.
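A quick way to check for this disproportion is to look at the class ratio directly (the fraud/not-fraud labels below are a hypothetical example):

```python
import pandas as pd

# hypothetical label column: 98% of one class, 2% of the other
labels = pd.Series(["not_fraud"] * 980 + ["fraud"] * 20)

print(labels.value_counts(normalize=True))
# not_fraud    0.98
# fraud        0.02
```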

- A **decision tree** is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
- **Decision tree learning** is one of the predictive modelling approaches used in statistics and machine learning. It uses a decision tree to go from observations about an item (represented in the branches) to conclusions about the item’s target value (represented in the leaves).
- Decision trees are a non-parametric supervised learning method used for both classification and regression tasks.
- The goal is to create a model that predicts the value of a target variable…
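The branches-to-leaves structure can be made visible with a small scikit-learn sketch (the Iris data and depth limit are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# a shallow tree so the learned rules stay readable
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# prints the if/else branches and the class decided at each leaf
print(export_text(tree))
```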

In my previous articles, we took an overview of **Linear Regression** and **Logistic Regression**.

Let’s look at another algorithm in the regression family.

- What is Polynomial Regression?
- Assumptions of Polynomial Regression.
- Why do we need Polynomial Regression?
- How to find the right degree of the Polynomial Equation?
- Math Behind Polynomial Equation.
- Cost Function of Polynomial Regression.
- Polynomial Regression with Gradient Descent.
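The core trick the outline builds toward — expanding features into polynomial terms and then fitting an ordinary linear model — can be sketched like this (the synthetic quadratic data is my own assumption):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 1.5 * X[:, 0] ** 2 - 2 * X[:, 0] + rng.normal(scale=0.5, size=200)

# degree-2 feature expansion [x, x^2], then a plain linear fit
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
print("R^2:", round(model.score(X, y), 3))
```

Picking the degree is the hard part the outline covers: too low underfits the curve, too high fits the noise.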

In my previous blog, I tried to explain linear regression and how it works. Let’s see why logistic regression is one of the important topics to understand.

Here’s the **link to my previous article on Linear Regression** in case you missed it.

- What is Logistic Regression?
- Types of Logistic Regression.
- Assumptions of Logistic Regression.
- Why not Linear Regression for Classification?
- The Logistic Model.
- Interpretation of the co-efficients.
- Odds Ratio and Logit
- Decision Boundary.
- Cost Function of Logistic Regression.
- Gradient Descent in Logistic Regression.
- Evaluating the Logistic Regression Model.
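The heart of the logistic model in the outline above — passing a linear combination through the sigmoid to get a probability — fits in a few lines (the coefficients `b0`, `b1` below are hypothetical):

```python
import numpy as np

def sigmoid(z):
    """Squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# P(y = 1 | x) = sigmoid(b0 + b1 * x), with made-up coefficients
b0, b1 = -1.0, 0.5
x = 4.0
p = sigmoid(b0 + b1 * x)
print(round(p, 3))  # → 0.731
```

With a decision boundary at 0.5, this example point would be classified as the positive class.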

Let’s get Started

Everyone new to the field of data science or machine learning often starts their journey by learning the linear models from the vast set of algorithms available.

So, let’s start!

- What is Linear Regression?
- Assumptions of Linear Regression.
- Types of Linear Regression?
- Understanding Slopes and Intercepts.
- How does a linear Regression Work?
- What is a Cost Function?
- Linear Regression with Gradient Descent.
- Interpreting the Regression Results.

Linear regression is a statistical supervised learning technique for predicting a quantitative variable by forming a linear relationship with one or more independent features. **It helps determine:**

→ If an independent variable does a good job in…
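A minimal sketch of that linear relationship being recovered from data (the noiseless `y = 3x + 7` example is a made-up illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# hypothetical data generated by y = 3x + 7
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3 * X[:, 0] + 7

model = LinearRegression().fit(X, y)
print("slope:    ", model.coef_[0])    # ≈ 3.0
print("intercept:", model.intercept_)  # ≈ 7.0
```

The fitted slope and intercept are exactly the coefficients the article's later sections interpret.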

As we know, data preprocessing is a very important part of any machine learning lifecycle. Most algorithms expect the data passed to them to be on a certain scale. That is where feature scaling comes into play. *Feature scaling is a method used to scale the range of independent variables or features of data, so that the features come down to the same range in order to avoid any kind of bias in the modelling.*

- The range of values in raw data varies widely, and in some machine learning algorithms, functions will not work properly without normalization.

FOR EXAMPLE:

Many…
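A minimal sketch of the two most common scaling approaches in scikit-learn (the age/income columns are a hypothetical example of features on very different scales):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# two features on very different scales: age (years) vs income (dollars)
X = np.array([[25, 40_000],
              [35, 90_000],
              [45, 60_000]], dtype=float)

X_std = StandardScaler().fit_transform(X)  # each column: mean 0, std 1
X_mm = MinMaxScaler().fit_transform(X)     # each column squeezed into [0, 1]

print(X_std)
print(X_mm)
```

After either transform, the income column can no longer dominate distance-based models simply because its raw numbers are larger.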

An electronics and communication engineer with a passion for data science, I write articles to help people like me understand things in layman’s terms.