Basic Statistics I: Percents
A percentage is a number or ratio expressed as a fraction of 100. We’ll do some examples together to learn how to calculate percentages.
Example 1: For a basket of 18 fruits, there are 5 apples, 3 bananas, 6 peaches, and 4 oranges.
What percentage of fruits are apples?
# Calculate percentage for apples
5/18*100
What percentage of fruits are oranges and peaches?
# Calculate percentage for oranges and peaches
(4+6)/18*100
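Both fruit calculations follow the same pattern: divide the part by the whole, then multiply by 100. Here is a minimal sketch of that pattern as a reusable function. The function name `percentage` and the `basket` dictionary are illustrative choices and not part of the original example.

```python
def percentage(part, whole):
    """Return part expressed as a percentage of whole."""
    if whole == 0:
        raise ValueError("whole must be nonzero")
    return part / whole * 100

# Fruit counts from Example 1
basket = {"apples": 5, "bananas": 3, "peaches": 6, "oranges": 4}
total_fruits = sum(basket.values())  # 18

apples_pct = percentage(basket["apples"], total_fruits)
oranges_peaches_pct = percentage(basket["oranges"] + basket["peaches"], total_fruits)

print(round(apples_pct, 2))           # 27.78
print(round(oranges_peaches_pct, 2))  # 55.56
```

Adding the orange and peach counts before dividing gives the same result as adding their separate percentages, because both share the same denominator.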
Example 2: Let’s learn to calculate percentages using real-world data. We will work with a dataset of housing prices in Ames, Iowa.
# Import the fetch_openml method
from sklearn.datasets import fetch_openml
housing = fetch_openml(name="house_prices", as_frame=True, parser="auto")
# Import pandas, so that we can work with the data frame version of the Ames housing data
import pandas as pd
# Load the dataset of house prices in Ames, and convert to
# a data frame format so it's easier to view and process
ames_df = pd.DataFrame(housing['data'], columns=housing['feature_names'])
ames_df['SalePrice'] = housing.target
ames_df
The SaleCondition column lists the condition of the house sale:
Normal: Normal Sale
Abnorml: Abnormal Sale - trade, foreclosure, short sale
AdjLand: Adjoining Land Purchase
Alloca: Allocation - two linked properties with separate deeds, typically condo with a garage unit
Family: Sale between family members
Partial: Home was not completed when last assessed (associated with New Homes)
What percentage of the houses were sold normally? We’ll see how to do this using the query method AND using boolean indexing.
# Determine the number of houses sold under the Normal condition two ways:
# (1) with the query method
num_normal = len(ames_df.query("SaleCondition == 'Normal'"))
num_normal
# (2) using boolean indexing
num_normal = sum(ames_df["SaleCondition"] == "Normal")
num_normal
How do these two methods give the same answer?
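Both approaches evaluate the same condition on every row and differ only in how they count the matches. The query method keeps the matching rows and len counts them. The comparison produces one True or False per row, and sum counts the True values because True is treated as 1. The sketch below shows this on a small made-up data frame; `toy_df` and its values are hypothetical and are not taken from the Ames data.

```python
import pandas as pd

# Hypothetical toy data, not the Ames dataset
toy_df = pd.DataFrame(
    {"SaleCondition": ["Normal", "Abnorml", "Normal", "Partial", "Normal"]}
)

# (1) query keeps only the rows where the condition holds; len counts those rows
count_query = len(toy_df.query("SaleCondition == 'Normal'"))

# (2) the comparison yields one boolean per row; True counts as 1 when summed
mask = toy_df["SaleCondition"] == "Normal"
count_mask = sum(mask)

print(mask.tolist())            # [True, False, True, False, True]
print(count_query, count_mask)  # 3 3
```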
# Determine the total number of houses in the dataset
total_num = len(ames_df)
total_num
# Now calculate the percentage of houses sold normally.
num_normal/total_num*100
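As an alternative sketch, pandas can compute the same kind of percentage without a separate count and total. The mean of a boolean Series is the fraction of True values, and `value_counts(normalize=True)` returns the fraction for every category at once. The data frame `toy_df` below is hypothetical and is used only so the numbers are easy to check by hand.

```python
import pandas as pd

# Hypothetical toy data: 3 of the 4 sales are Normal
toy_df = pd.DataFrame(
    {"SaleCondition": ["Normal", "Abnorml", "Normal", "Normal"]}
)

# Mean of a boolean Series = fraction of True values
pct_normal = (toy_df["SaleCondition"] == "Normal").mean() * 100
print(pct_normal)  # 75.0

# Percentage for every category at once
pct_by_condition = toy_df["SaleCondition"].value_counts(normalize=True) * 100
print(pct_by_condition["Normal"])   # 75.0
print(pct_by_condition["Abnorml"])  # 25.0
```

The count-and-divide approach makes each step visible, which is useful when learning. The one-line forms are convenient once the idea is familiar.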
What percentage of houses have a price less than $200,000?
# Determine number of houses that cost less than $200,000
num_cost_less_200k = sum(ames_df["SalePrice"] < 200000)
num_cost_less_200k
# Calculate the percentage of houses that cost less than $200k.
num_cost_less_200k/total_num*100
What percentage of houses have a sale price between $200,000 and $500,000?
# Make an array of booleans with cost greater than $200,000 AND less than $500,000
between_200k_and_500k = (ames_df["SalePrice"] > 200000) & (ames_df["SalePrice"] < 500000)
between_200k_and_500k
# Determine number of houses that cost between $200,000 and $500,000
num_between_200k_and_500k = sum(between_200k_and_500k)
num_between_200k_and_500k
# Calculate the percentage of houses between $200,000 and $500,000
num_between_200k_and_500k/total_num*100
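Two details of the range condition are worth seeing on small data. Each comparison needs its own parentheses because `&` binds more tightly than `>` and `<` in Python. Both bounds are strict, so a house priced at exactly $200,000 or $500,000 is not counted. The sketch below uses a hypothetical `prices` Series and compares the manual mask with `Series.between`. The `inclusive="neither"` string form assumes pandas 1.3 or newer.

```python
import pandas as pd

# Hypothetical sale prices, chosen to include both boundary values
prices = pd.Series(
    [150000, 200000, 250000, 499999, 500000, 600000, 180000, 320000]
)

# Manual mask: parentheses are required around each comparison
mask_manual = (prices > 200000) & (prices < 500000)

# Equivalent mask with between; "neither" excludes both endpoints
mask_between = prices.between(200000, 500000, inclusive="neither")

num_in_range = int(mask_manual.sum())
pct_in_range = num_in_range / len(prices) * 100

print(mask_manual.equals(mask_between))  # True
print(num_in_range)                      # 3
print(pct_in_range)                      # 37.5
```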
Good work! You just learned how to calculate percentages in Python!