Exploratory data analysis (EDA) is one of the most important steps in any data science journey. A common question while performing EDA is how to handle categorical variables. Most ML algorithms expect numerical input, so object or categorical variables have to be converted into a numeric form that the algorithm can understand.
We will discuss the two most widely used techniques for converting categorical variables:
1. Label Encoding
2. One-hot Encoding
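To make the two techniques concrete, here is a minimal sketch on a made-up column. The column name `travel_class` and its values are hypothetical and are not taken from the dataset used in this post. It uses plain pandas (`cat.codes` and `pd.get_dummies`); a library such as scikit-learn could be used instead.

```python
import pandas as pd

# Hypothetical toy data: the column name and values are made up for illustration.
df = pd.DataFrame({"travel_class": ["Economy", "Business", "Economy", "First"]})

# Label encoding: map each distinct category to an integer code.
# pandas sorts the categories, so Business -> 0, Economy -> 1, First -> 2.
df["travel_class_label"] = df["travel_class"].astype("category").cat.codes

# One-hot encoding: one 0/1 indicator column per category.
one_hot = pd.get_dummies(df["travel_class"], prefix="travel_class", dtype=int)

print(df)
print(one_hot)
```

Label encoding keeps a single column but introduces an artificial order between the categories, while one-hot encoding avoids that order at the cost of one extra column per category.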
Let’s explore both techniques with the passenger dataset below.
This dataset has various columns i.e.
The pandas DataFrame is widely used for EDA. While working on a few datasets, I explored some quick pandas functions. Let’s play, explore and learn these quickly!
Today, I am going to use a very simple Employees dataset, shown below. Its attributes are ID, Name, Role, Salary, Dept and Sex, so there are 10 rows and 6 columns.
First, read the above dataset into a data frame.
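A minimal sketch of this step is shown here. The three rows are made up for illustration and only reuse the column names listed above, and the file name in the comment is hypothetical. The data is read from an in-memory string so that the snippet runs on its own.

```python
import io

import pandas as pd

# Three made-up rows that only mimic the column layout described above.
csv_text = """ID,Name,Role,Salary,Dept,Sex
1,Asha,Analyst,50000,Finance,F
2,Ravi,Engineer,65000,IT,M
3,Meena,Manager,80000,HR,F
"""

# With a real file this would be something like
# pd.read_csv("employees.csv")  (hypothetical file name).
employees = pd.read_csv(io.StringIO(csv_text))

print(employees.shape)   # (rows, columns)
print(employees.dtypes)  # column types inferred by pandas
```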
I was working on an interesting dataset about Indian food, and what could be better than visualization to explore it? We will analyze the Indian food data and plot various graphs using seaborn, matplotlib and a word cloud.
The dataset can be downloaded from Indian Food 101 | Kaggle. It is small, with 255 rows and 9 columns. Here is a quick glance at the first few records of the dataset.
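One way to take that quick glance is sketched below. The helper name `preview` and the file path in the usage comment are assumptions made for illustration; they do not come from the dataset page.

```python
import pandas as pd


def preview(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Print the shape of the frame and return its first n rows."""
    rows, cols = df.shape
    print(f"{rows} rows x {cols} columns")
    return df.head(n)


# Usage, assuming the Kaggle CSV was saved locally as "indian_food.csv"
# (hypothetical path):
# food = pd.read_csv("indian_food.csv")
# preview(food)
```

Checking the shape first is a cheap way to confirm that the file loaded with the expected number of rows and columns before plotting anything.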