# import seaborn, matplotlib
import seaborn as sns
import matplotlib.pyplot as plt
# set up inline figures
%matplotlib inline
Bar Charts and Histograms
Bar Charts
Bar charts are used to display how a categorical variable relates to a continuous variable. In bar charts, the categorical variable is displayed on the x-axis, and the height of each bar summarizes the continuous variable (displayed on the y-axis) for one category.
- Categorical variables are variables with different categories or groups.
- Examples: gender, city
- Continuous variables are numeric variables.
- Examples: time, height, length
We will be using the titanic dataset in this example. Let’s load and preview it.
# read in titanic data
titanic = sns.load_dataset("titanic")
# preview data
titanic.head()
| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
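The categorical/continuous distinction maps loosely onto pandas dtypes. The sketch below uses a hypothetical stand-in called sample, built from a few columns of the five preview rows above (not the full dataset), and splits numeric from non-numeric columns with select_dtypes. pclass shows that dtype is only a rough guide: it is stored as integers but is really categorical, so it has to be marked as such explicitly.

```python
import pandas as pd

# Hypothetical stand-in: a few columns from the five preview rows above.
sample = pd.DataFrame({
    "pclass": [3, 1, 3, 1, 3],
    "sex": ["male", "female", "female", "female", "male"],
    "age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
    "class": ["Third", "First", "Third", "First", "Third"],
})

# Numeric dtypes include pclass, which is numerically coded but categorical.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()

# Marking pclass as categorical leaves only the continuous columns numeric.
sample["pclass"] = sample["pclass"].astype("category")
continuous_cols = sample.select_dtypes(include="number").columns.tolist()

print(numeric_cols, continuous_cols)
```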
Let’s say we want to compare the mean fare price across the three classes of tickets for all passengers.
# barplot of class vs fare
sns.barplot(x="class", y="fare", data=titanic)
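Notice how seaborn computes the mean fare for each class and draws the plot without us specifying any aggregation: by default, barplot uses the mean as its estimator, and the lines on top of the bars are error bars showing a confidence interval around each mean.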
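The bar heights are ordinary group means, so pandas can reproduce them. This sketch uses only the five preview rows as a hypothetical stand-in for the full data, so its numbers will not match the full-data plot. If the mean is not the summary you want, barplot also takes an estimator argument, for example estimator=np.median.

```python
import pandas as pd

# Hypothetical stand-in: class and fare from the five preview rows.
sample = pd.DataFrame({
    "class": ["Third", "First", "Third", "First", "Third"],
    "fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# Mean fare per class: the bar heights barplot draws with its default estimator.
mean_fare = sample.groupby("class")["fare"].mean()
print(mean_fare)
```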
What if we wanted to look at the data more granularly and further stratify each class bar by the sex variable? Based on what you know about seaborn so far, how do you think we can do that?
# barplot of class vs fare stratified by sex
sns.barplot(x="class", y="fare", hue="sex", data=titanic)
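With hue, each bar is the mean of one (class, sex) combination. Grouping by both columns and unstacking sex into columns gives the grid of numbers behind the grouped bars; as before, this is a hypothetical five-row stand-in. A NaN cell marks a combination with no rows in the data.

```python
import pandas as pd

# Hypothetical stand-in: class, sex, and fare from the five preview rows.
sample = pd.DataFrame({
    "class": ["Third", "First", "Third", "First", "Third"],
    "sex": ["male", "female", "female", "female", "male"],
    "fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# Mean fare for every (class, sex) pair; unstacking puts sex in the columns,
# mirroring how hue splits each class bar into one bar per sex.
grid = sample.groupby(["class", "sex"])["fare"].mean().unstack("sex")
print(grid)
```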
Histograms
Histograms are used to visualize the distribution of a continuous variable.
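Let’s say we wanted to see how age was distributed across all passengers in our dataset. We can use the distplot function to generate our histogram. Note that distplot has been deprecated since seaborn 0.11 in favor of histplot and displot, so newer versions of seaborn emit a warning when it is called.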
# histogram of age (missing ages dropped; kde=False turns off the density curve)
sns.distplot(titanic['age'].dropna(), kde=False)
We can change the number of bins to control the granularity of the histogram: fewer bins give a coarser picture of the distribution, more bins a finer one.
# histogram of age with 10 bins
sns.distplot(titanic['age'].dropna(), kde=False, bins=10)
# histogram of age with 80 bins
sns.distplot(titanic['age'].dropna(), kde=False, bins=80)
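With an integer bins value, the histogram splits the range from the smallest to the largest value into that many equal-width intervals and counts the values falling in each. numpy.histogram performs the same computation, which makes the effect of bins easy to see on a hypothetical handful of ages: here 2 bins are 8 years wide and 4 bins are 4 years wide.

```python
import numpy as np

# Hypothetical stand-in: the five ages from the preview rows (range 22 to 38).
ages = np.array([22.0, 38.0, 26.0, 35.0, 35.0])

# Fewer bins: wider intervals, coarser view of the distribution.
counts_2, edges_2 = np.histogram(ages, bins=2)

# More bins: narrower intervals, finer (and noisier) view.
counts_4, edges_4 = np.histogram(ages, bins=4)

print(counts_2, edges_2)
print(counts_4, edges_4)
```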
distplot can’t color a histogram by another variable, but we can still compare the distribution of a variable across subsets of our DataFrame by layering one histogram per subset on the same axes.
# histogram of age for females
sns.distplot(titanic.query('sex == "female"')['age'].dropna(), kde=False, label="F")
# histogram of age for males
sns.distplot(titanic.query('sex == "male"')['age'].dropna(), kde=False, label="M")
# show which color belongs to which subset
plt.legend()
Count Plots
Count plots can be thought of as histograms for categorical variables.
Let’s say we wanted to visualize how many passengers there were in each class.
# count plot of class
sns.countplot(x="class", data=titanic)
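Each countplot bar is simply the number of rows in one category, which pandas’ value_counts computes directly. Shown here on a hypothetical stand-in built from the class column of the five preview rows.

```python
import pandas as pd

# Hypothetical stand-in: the class column of the five preview rows.
sample = pd.Series(["Third", "First", "Third", "First", "Third"], name="class")

# Number of rows per category: the bar heights countplot draws.
counts = sample.value_counts()
print(counts)
```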
Now, let’s stratify each class by the sex variable using color. By now you’re an expert in this!
# stratify class by sex variable
sns.countplot(x="class", hue="sex", data=titanic)
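Adding hue splits each count by a second variable, which amounts to a two-way frequency table. pd.crosstab builds that table; on the hypothetical five-row stand-in, the zero cell for first-class males is a combination with no passengers.

```python
import pandas as pd

# Hypothetical stand-in: class and sex from the five preview rows.
sample = pd.DataFrame({
    "class": ["Third", "First", "Third", "First", "Third"],
    "sex": ["male", "female", "female", "female", "male"],
})

# Rows are classes, columns are sexes, cells are passenger counts.
table = pd.crosstab(sample["class"], sample["sex"])
print(table)
```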
As always, we can change the color palette:
# change color palette
sns.countplot(x="class", hue="sex", palette="Set3", data=titanic)
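The palette argument accepts more than a palette name. A sketch, assuming the hypothetical five-row stand-in and arbitrary color choices: sns.color_palette samples a named palette to a given number of colors, and a dict passed as palette pins each hue level to a specific color, which keeps colors consistent across plots.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in: class and sex from the five preview rows.
sample = pd.DataFrame({
    "class": ["Third", "First", "Third", "First", "Third"],
    "sex": ["male", "female", "female", "female", "male"],
})

# A named palette sampled to one color per hue level.
set3_colors = sns.color_palette("Set3", n_colors=2)

# A dict maps each hue level to a specific color (arbitrary choices here).
ax = sns.countplot(data=sample, x="class", hue="sex",
                   palette={"male": "tab:blue", "female": "tab:orange"})
plt.close(ax.figure)
```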
In this lesson you learned:
- How to create bar plots in seaborn
- How to stratify bar plots by another variable using color (hue)
- How to create histograms in seaborn
- How to change the granularity of histograms (bins)
- How to create count plots in seaborn
- How to stratify count plots by another variable using color (hue)