Basic Statistics I: Percents
A percentage is a number or ratio expressed as a fraction of 100. We’ll do some examples together to learn how to calculate percentages.
Example 1: For a basket of 18 fruits, there are 5 apples, 3 bananas, 6 peaches, and 4 oranges.
What percentage of fruits are apples?
# Calculate percentage for apples
5/18*100
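Every percentage in this example is the same calculation: the part divided by the whole, times 100. As a minimal sketch, the hypothetical helper `percentage` below (it is not part of the original lesson) applies that formula to each fruit in the basket. The names `basket`, `total_fruits`, and `fruit_percentages` are also just illustrative.

```python
# Fruit counts from Example 1 (18 fruits in total).
basket = {"apples": 5, "bananas": 3, "peaches": 6, "oranges": 4}


def percentage(part, whole):
    """Return `part` as a percentage of `whole` (hypothetical helper)."""
    return part / whole * 100


total_fruits = sum(basket.values())  # 5 + 3 + 6 + 4 = 18

# Percentage of the basket made up by each kind of fruit.
fruit_percentages = {
    fruit: percentage(count, total_fruits) for fruit, count in basket.items()
}

for fruit, pct in fruit_percentages.items():
    print(f"{fruit}: {pct:.1f}%")
```

Because every fruit belongs to exactly one category, the four percentages should add up to 100, which is a quick sanity check on the counts.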
What percentage of fruits are either oranges or peaches?
# Calculate percentage for oranges and peaches combined
(4+6)/18*100
Example 2: Let’s calculate percentages using real-world data. We will work with a dataset of housing prices in Ames, Iowa.
# Import the fetch_openml method
from sklearn.datasets import fetch_openml
housing = fetch_openml(name="house_prices", as_frame=True, parser="auto")
# Import pandas, so that we can work with the data frame version of the Ames housing data
import pandas as pd
# Load the dataset of house prices in Ames, and convert to
# a data frame format so it's easier to view and process
ames_df = pd.DataFrame(housing['data'], columns = housing['feature_names'])
ames_df['SalePrice'] = housing.target
ames_df
The SaleCondition column lists the condition of the house sale:
Normal: Normal Sale
Abnorml: Abnormal Sale - trade, foreclosure, short sale
AdjLand: Adjoining Land Purchase
Alloca: Allocation - two linked properties with separate deeds, typically condo with a garage unit
Family: Sale between family members
Partial: Home was not completed when last assessed (associated with New Homes)
What percentage of the houses were sold normally? We’ll see how to do this using the query method AND using boolean indexing.
# Determine the number of houses sold under normal conditions in two ways:
# (1) with the query method
num_normal = len(ames_df.query("SaleCondition == 'Normal'"))
num_normal
# (2) using boolean indexing
num_normal = sum(ames_df["SaleCondition"] == "Normal")
num_normal
Why do these two methods give the same answer?
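One way to see the answer is to run both methods on a tiny table where the rows can be counted by eye. The sketch below uses a made-up data frame, `toy_df`, with six invented sale conditions (it is not the Ames data). `query` returns only the matching rows, so `len` counts them; the comparison returns one True/False value per row, and summing treats True as 1 and False as 0.

```python
import pandas as pd

# Made-up sale conditions for six houses (not the Ames data).
toy_df = pd.DataFrame(
    {"SaleCondition": ["Normal", "Abnorml", "Normal", "Family", "Normal", "Partial"]}
)

# (1) query keeps only the rows that satisfy the condition; len counts them.
normal_rows = toy_df.query("SaleCondition == 'Normal'")
count_from_query = len(normal_rows)

# (2) The comparison yields one True/False value per row.
is_normal = toy_df["SaleCondition"] == "Normal"
# Summing treats True as 1 and False as 0, so the total is the number of matches.
count_from_mask = int(is_normal.sum())

print(is_normal.tolist())
print(count_from_query, count_from_mask)
```

Both counts describe the same set of rows, which is why they agree. The sketch uses the Series method `.sum()`; the built-in `sum(...)` used in this lesson gives the same count.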
# Determine the total number of houses in the dataset
total_num = len(ames_df)
# Now calculate the percentage of houses sold normally.
num_normal/total_num*100
What percentage of houses have a price less than $200,000?
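A comparison such as "price below $200,000" produces a column of True/False values, and the percentage is the share of True values in it. As a minimal sketch on made-up prices (`toy_prices` is invented for illustration, not taken from the Ames data), the count-and-divide approach used in this lesson can be compared with a shortcut: the mean of a boolean Series is the fraction of True values.

```python
import pandas as pd

# Made-up sale prices in dollars (not the Ames data).
toy_prices = pd.Series([150000, 250000, 180000, 320000, 199999])

# True where the price is under $200,000.
below_200k = toy_prices < 200000

# Count-and-divide, the approach used throughout this lesson.
pct_from_count = below_200k.sum() / len(toy_prices) * 100

# Shortcut: the mean of a boolean Series is the fraction of True values.
pct_from_mean = below_200k.mean() * 100

print(pct_from_count, pct_from_mean)
```

The two results should match up to floating-point rounding, so either form can be used; the count-and-divide version makes the "part over whole" structure more visible.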
# Determine number of houses that cost less than $200,000
num_cost_less_200k = sum(ames_df["SalePrice"] < 200000)
# Calculate the percentage of houses that cost less than $200k.
num_cost_less_200k/total_num*100
What percentage of houses have a sale price between $200,000 and $500,000?
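A range question combines two comparisons with `&`, which is True only where both conditions hold. The sketch below shows this on made-up prices (`toy_prices` is invented for illustration, not taken from the Ames data), chosen so that the boundary values $200,000 and $500,000 appear in the list.

```python
import pandas as pd

# Made-up sale prices in dollars (not the Ames data).
toy_prices = pd.Series([150000, 200000, 250000, 499999, 500000, 750000])

# Each comparison gives one True/False value per price.
above_200k = toy_prices > 200000
below_500k = toy_prices < 500000

# & compares the two masks element by element: True only where both are True.
# When written on one line, each comparison needs its own parentheses,
# because & binds more tightly than > and <.
in_range = (toy_prices > 200000) & (toy_prices < 500000)

pct_in_range = in_range.sum() / len(toy_prices) * 100
print(in_range.tolist(), pct_in_range)
```

Both comparisons are strict, so a house priced at exactly $200,000 or $500,000 is not counted as being in the range.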
# Make an array of booleans with cost greater than $200,000 AND less than $500,000
between_200k_and_500k = (ames_df["SalePrice"] > 200000) & (ames_df["SalePrice"] < 500000)
# Determine number of houses that cost between $200,000 and $500,000
num_between_200k_and_500k = sum(between_200k_and_500k)
# Calculate the percentage of houses between $200,000 and $500,000
num_between_200k_and_500k/total_num*100
Good work! You just learned how to calculate percentages in Python!