Label Encoding and One Hot Encoding

Photo by Bailey Granneman on Unsplash

Exploratory data analysis (EDA) is an important step in any data science project. A common question during EDA is how to handle categorical variables. Most ML algorithms accept only numerical input, so object or categorical columns have to be converted into a numeric form that the algorithm can understand.

We will discuss the two most widely used techniques for converting categorical variables:

1. Label Encoding

2. One-hot Encoding

Let’s explore both techniques with the Passenger dataset below.

This dataset has the following columns: name, gender, age, package, TicketCost and Destination.
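As a minimal sketch of the two techniques, the snippet below applies label encoding to `package` and one-hot encoding to `Destination`. The rows are hypothetical; only the column names come from the dataset described above. It uses plain pandas (`astype("category").cat.codes` and `pd.get_dummies`), which is one of several ways to do this.

```python
import pandas as pd

# Hypothetical Passenger rows: only the column names come from the article.
passengers = pd.DataFrame(
    {
        "name": ["Asha", "Ben", "Chen", "Dita"],
        "gender": ["F", "M", "M", "F"],
        "age": [34, 27, 45, 31],
        "package": ["Gold", "Silver", "Gold", "Platinum"],
        "TicketCost": [1200, 800, 1250, 2000],
        "Destination": ["Paris", "Rome", "Paris", "Tokyo"],
    }
)

# Label encoding: replace each distinct category with an integer code.
# pandas assigns codes in sorted category order: Gold=0, Platinum=1, Silver=2.
label_encoded = passengers.copy()
label_encoded["package"] = label_encoded["package"].astype("category").cat.codes

# One-hot encoding: create one 0/1 indicator column per category.
one_hot = pd.get_dummies(passengers, columns=["Destination"], dtype=int)

print(label_encoded[["name", "package"]])
print(one_hot.filter(like="Destination_"))
```

Label encoding keeps a single column but introduces an artificial ordering between the codes, while one-hot encoding avoids that ordering at the cost of one extra column per category.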


The pandas DataFrame is widely used for EDA. While working on a few datasets, I explored some quick and handy pandas functions. Let’s play, explore and learn them quickly!

Today, I am going to use a very simple Employees dataset, shown below. Its attributes are ID, Name, Role, Salary, Dept and Sex. There are 10 rows and 6 columns.

Employees

First, read the above dataset into a DataFrame.

import pandas as pd
Emp_Ds = pd.read_csv("Emp.csv")

  1. Group by: It is used to aggregate the rows of your DataFrame. …
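To illustrate a group-by aggregation, here is a small sketch that computes the average salary per department. The rows are a hypothetical stand-in for `Emp.csv` (only the column names come from the text above), and the choice of `Dept` and `Salary` is an assumption made for the example.

```python
import pandas as pd

# Hypothetical stand-in for Emp.csv: column names from the article, values invented.
Emp_Ds = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "Name": ["Anu", "Raj", "Mia", "Tom"],
        "Role": ["Analyst", "Engineer", "Manager", "Engineer"],
        "Salary": [50000, 60000, 55000, 70000],
        "Dept": ["HR", "IT", "HR", "IT"],
        "Sex": ["F", "M", "F", "M"],
    }
)

# Split the rows by department, then aggregate each group's Salary with the mean.
avg_salary = Emp_Ds.groupby("Dept")["Salary"].mean()

print(avg_salary)
```

The result is a Series indexed by department, so `avg_salary["IT"]` returns the mean salary for that group.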


Photo by Luke Chesser on Unsplash

I was working on an interesting dataset of Indian food, and what could be better than visualization to explore it? We will analyse the Indian food data and plot various graphs using seaborn, matplotlib and word cloud.

Data

The dataset can be downloaded from Indian Food 101 | Kaggle. It is small, with 255 rows and 9 columns. Here is a quick glance at the first few records of the dataset.
