Encoding Categorical Variables
Label Encoding and One Hot Encoding
Exploratory data analysis (EDA) is an important step in any data science project. A common question during EDA is how to handle categorical variables. Most ML algorithms work only with numerical variables, so object or categorical variables have to be converted into a numeric form that the algorithm can understand.
We will discuss two of the most widely used techniques for converting categorical variables:
1. Label Encoding
2. One-hot Encoding
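Before turning to the dataset, here is a minimal sketch of what the two techniques do, written in plain Python so that nothing is hidden behind a library call. The `destinations` values are made up for illustration and are not taken from the article's data; assigning integer codes in alphabetical order is also just one possible convention.

```python
# Hypothetical values for a single categorical column (illustration only).
destinations = ["Paris", "Delhi", "Tokyo", "Delhi"]

# Distinct categories in a fixed (alphabetical) order.
categories = sorted(set(destinations))

# Label encoding: replace each category with one integer code.
code_of = {category: code for code, category in enumerate(categories)}
label_encoded = [code_of[value] for value in destinations]

# One-hot encoding: one 0/1 indicator per category for every row.
one_hot_encoded = [
    [int(value == category) for category in categories]
    for value in destinations
]

print(categories)
print(label_encoded)
print(one_hot_encoded)
```

Label encoding keeps a single column but introduces an ordering between the codes, whereas one-hot encoding avoids that ordering at the cost of one new column per category.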
Let’s explore both with the Passenger dataset below.
This dataset has the following columns: name, gender, age, package, TicketCost and Destination.

The attributes Name, Gender, Package and Destination are of the object data type, i.e. they are categorical.
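One way to confirm which columns need encoding is to inspect the data types with pandas. The sketch below is only an illustration: the rows are invented, the column casing is assumed, and `passengers` is a hypothetical variable name; only the column names come from the text above.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the article.
passengers = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen"],
    "Gender": ["F", "M", "M"],
    "Age": [34, 27, 45],
    "Package": ["Gold", "Silver", "Gold"],
    "TicketCost": [1200, 800, 1500],
    "Destination": ["Paris", "Delhi", "Tokyo"],
})

# Show the data type pandas inferred for each column.
print(passengers.dtypes)

# Non-numeric columns are the candidates for encoding.
categorical_columns = passengers.select_dtypes(exclude="number").columns.tolist()
print(categorical_columns)
```

Excluding numeric types, rather than selecting the object type by name, keeps the check independent of how a given pandas version labels string columns.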
