Data is all around us, from the money people earn to the things companies sell to the grades you get in school. This data needs to be gathered and arranged in a way that allows us to understand it better and gain useful insights.
That’s where statistics comes in. Statistics is all about collecting, organizing, interpreting, and presenting data in a clear way. To begin our journey into statistics, we’ll start by learning about three important concepts: the mean, median, and mode. These are called measures of central tendency, and they help us describe the center or typical value of a set of data.
What is Statistics?
Statistics is a branch of mathematics that involves collecting, organizing, interpreting, and presenting data. Data can be any recorded information, such as age, income, or favorite color.
Two Main Branches
- Descriptive statistics: This involves creating graphs, tables, charts and summarizing data to provide insights. It helps describe and understand the features of a specific data set.
- Inferential statistics: This is about using data from a sample to make inferences about a larger population. It involves estimating parameters and testing hypotheses. This is the more complex branch.
Focus on Descriptive Statistics
In this overview, we will focus on descriptive statistics, specifically the measures of central tendency. These measures help provide a summary of where the “middle” or “center” of a dataset is located.
The three main measures of central tendency are:
- Mean (average)
- Median (middle value)
- Mode (most frequent value)
Understanding these measures helps get a sense of a typical value in a dataset and is a starting point for further analysis.
The goal of descriptive statistics is to summarize and organize data in a meaningful way before drawing further insights. With a good grasp of the basics of statistics and these measures, you can be more confident in understanding data and making decisions.
Measures of Central Tendency
Measures of central tendency help summarize a large dataset using a single value that represents the center or middle of the data distribution. This is useful when dealing with a large amount of data, such as the algebra test scores of 500 students.
The Three Measures of Central Tendency
- Mean (Average): The mean is calculated by adding up all the values in a dataset and dividing by the total number of values. It provides an overall average of the data.
- Median: The median is the middle value in a dataset when the values are arranged in ascending or descending order. If there is an even number of values, the median is the average of the two middle values. The median is less affected by extreme values compared to the mean.
- Mode: The mode is the value that appears most frequently in a dataset. A dataset can have one mode (unimodal), two modes (bimodal), or more than two modes (multimodal). The mode is useful for identifying the most common value in a dataset.
Example
Let’s say you have the algebra test scores of 10 students: 85, 92, 78, 85, 90, 85, 83, 95, 88, 91
To find the mean, add up all the scores and divide by 10: (85 + 92 + 78 + 85 + 90 + 85 + 83 + 95 + 88 + 91) ÷ 10 = 87.2
To find the median, arrange the scores in ascending order: 78, 83, 85, 85, 85, 88, 90, 91, 92, 95
The median is the average of the two middle values (85 and 88): (85 + 88) ÷ 2 = 86.5
To find the mode, identify the value that appears most frequently: 85 appears three times, so it is the mode.
By using these measures of central tendency, you can effectively summarize and interpret large datasets, such as the algebra test scores of 500 students.
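As a quick check of the example above, here is a minimal sketch using Python's standard `statistics` module. The variable names are illustrative and not part of the original lesson.

```python
import statistics

# The ten algebra test scores from the example
scores = [85, 92, 78, 85, 90, 85, 83, 95, 88, 91]

mean_score = statistics.mean(scores)      # sum of the values divided by the count
median_score = statistics.median(scores)  # averages the two middle values for an even count
mode_score = statistics.mode(scores)      # the most frequent value

print(mean_score, median_score, mode_score)
```

The three results should match the hand calculations: 87.2, 86.5, and 85.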
Let’s examine each of these measures of central tendency in detail.
1. Mean
The mean, also known as the average, is a way to find the central value or typical value of a set of numbers.
To calculate the mean:
- Add up all the numbers in the set
- Divide the sum by how many numbers there are
The mean gives you an idea of what a typical or central value in a set of numbers looks like. It’s commonly used to describe data in a simple way.
Calculating the Mean
To find the mean of a set of numbers, follow these steps:
- Add up all the numbers
- Divide the sum by the total count of numbers
For example, if you have these quiz scores: 92, 85, 80, 91
- Add them up: 92 + 85 + 80 + 91 = 348
- Divide by the number of scores (4): 348 ÷ 4 = 87
So the mean score is 87.
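The two steps can also be written as a short function. This is only a sketch; the name `mean_of` is a hypothetical helper chosen for illustration.

```python
def mean_of(values):
    """Return the arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("cannot take the mean of an empty list")
    return sum(values) / len(values)


quiz_scores = [92, 85, 80, 91]
quiz_mean = mean_of(quiz_scores)  # 348 / 4
print(quiz_mean)
```

The empty-list check is a design choice: dividing by a count of zero has no meaningful answer, so the function reports the problem instead.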
Sample Problem 1: Mr. Santos’ Hardware Shop
Mr. Santos wants to know if he should continue his hardware shop. He will close it if his mean sales this week are under ₱5,000.
His daily sales:
- Monday: ₱3,200
- Tuesday: ₱5,000
- Wednesday: ₱6,300
- Thursday: ₱3,500
- Friday: ₱6,000
- Add sales: 3,200 + 5,000 + 6,300 + 3,500 + 6,000 = 24,000
- Divide by days (5): 24,000 ÷ 5 = 4,800
The mean is ₱4,800, less than ₱5,000. So Mr. Santos should close the shop.
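The same decision can be sketched in code. The dictionary layout and the name `closing_threshold` are assumptions made for this illustration; amounts are in pesos.

```python
# Daily sales for the week, in pesos
daily_sales = {
    "Monday": 3200,
    "Tuesday": 5000,
    "Wednesday": 6300,
    "Thursday": 3500,
    "Friday": 6000,
}

closing_threshold = 5000  # close the shop if mean sales fall below this amount

mean_sales = sum(daily_sales.values()) / len(daily_sales)
should_close = mean_sales < closing_threshold

print(mean_sales, should_close)
```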
Sample Problem 2: Luke’s Quiz Scores
Luke’s mean score on 5 statistics quizzes is 92. What’s the sum of his scores?
We know:
- Mean = Sum of scores ÷ Number of scores
- Luke’s mean is 92
- He has 5 scores
So: 92 = Sum of scores ÷ 5
To find the sum, multiply both sides by 5:
- 5 × 92 = 5 × (Sum ÷ 5)
- 460 = Sum
Therefore, the sum of Luke’s scores is 460.
Sample Problem 3: Fred’s Basketball Scores
Fred’s average score from his last 5 games is 21. In the 6th game he scored 17 points. What’s his new average?
To calculate:
1. Find total points from first 5 games
- Average × Number of games
- 21 × 5 = 105
2. Add 6th game points
- 105 + 17 = 122
3. Divide new total by 6 games
- 122 ÷ 6 ≈ 20.33
Fred’s new average score is about 20.33 points (rounded to two decimal places).
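Both Luke's and Fred's problems rely on the same relationship: sum = mean × count. A minimal sketch of updating a mean when one new value arrives follows; `updated_mean` is a hypothetical helper name.

```python
def updated_mean(old_mean, old_count, new_value):
    """Recover the old total from the mean, add the new value, and divide by the new count."""
    old_total = old_mean * old_count      # 21 * 5 = 105
    new_total = old_total + new_value     # 105 + 17 = 122
    return new_total / (old_count + 1)    # 122 / 6


new_average = updated_mean(21, 5, 17)
print(round(new_average, 2))
```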
The mean is a useful average, but can be affected by very high or low “outlier” values. Understanding how to calculate it is an important math skill!
Outliers
Outliers are values that are much higher or lower compared to the other values in a set of data. For example, if your math exam scores are 45, 48, 47, 44, and 10, the score of 10 is an outlier because it is significantly lower than the other scores, which are all above 40.
How Outliers Affect the Mean
When calculating the mean (average) of a set of data, outliers can have a big impact. In the example above, the mean score is:
[45 + 48 + 47 + 44 + 10] ÷ 5 = 38.8
The single outlier value of 10 drastically lowers the mean, making it less representative of the majority of the scores.
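To see the size of the effect, the sketch below compares the mean with and without the outlier. Dropping the score of 10 is done here only for comparison, not as a recommendation to discard data.

```python
exam_scores = [45, 48, 47, 44, 10]

# Same scores with the outlier (10) left out, for comparison only
scores_without_outlier = [score for score in exam_scores if score != 10]

mean_with_outlier = sum(exam_scores) / len(exam_scores)
mean_without_outlier = sum(scores_without_outlier) / len(scores_without_outlier)

print(mean_with_outlier, mean_without_outlier)
```

With the outlier the mean is 38.8; without it, the mean of the remaining four scores is 46.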
Properties of the Mean
- The mean doesn’t have to be a value in the dataset: For example, the mean of {1, 2, 3, 8} is 3.5, which is not in the original set.
- The sum of deviations from the mean is always 0: Deviations are the differences between each value and the mean. In the set {1, 2, 3, 8}, the mean is 3.5. The deviations are:
- 1 – 3.5 = -2.5
- 2 – 3.5 = -1.5
- 3 – 3.5 = -0.5
- 8 – 3.5 = 4.5
- Adding the deviations: -2.5 + (-1.5) + (-0.5) + 4.5 = 0
- Adding or subtracting a constant from each value changes the mean by that amount: If we add 3 to each value in {1, 2, 3, 8}, we get {4, 5, 6, 11}. The new mean is 6.5, which is the original mean (3.5) plus 3.
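These properties can be checked numerically for the set {1, 2, 3, 8}. This is a sketch with illustrative variable names; with other data, floating-point rounding may leave a tiny nonzero remainder in the sum of deviations.

```python
values = [1, 2, 3, 8]
mean_value = sum(values) / len(values)

# Property 1: the mean need not be one of the values
mean_is_in_set = mean_value in values

# Property 2: deviations from the mean add up to zero
deviations = [v - mean_value for v in values]
sum_of_deviations = sum(deviations)

# Property 3: adding a constant to every value shifts the mean by that constant
shifted = [v + 3 for v in values]
shifted_mean = sum(shifted) / len(shifted)

print(mean_value, mean_is_in_set, sum_of_deviations, shifted_mean)
```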
2. Median
Finding the Median
- Arrange the numbers in ascending or descending order.
- If there is an odd number of values, the median is the middle number.
- If there is an even number of values, the median is the average of the two middle numbers.
Examples:
Odd number of values: 12, 17, 25, 32, 80
- Arranged in ascending order: 12, 17, 25, 32, 80
- The median is the middle number: 25
Even number of values: 10, 12, 17, 25, 32, 80
- Arranged in ascending order: 10, 12, 17, 25, 32, 80
- The two middle numbers are 17 and 25
- The median is the average of these two numbers: (17 + 25) ÷ 2 = 21
So, to find the median, remember to:
- Order the numbers
- Find the middle number (odd) or average the two middle numbers (even)
How to Find the Median Value in a Set of Numbers?
Odd Number of Observations
- Put the numbers in order from lowest to highest (or highest to lowest).
- Find the middle number. This is the median.
Example: In the set {3, 7, 2, 9, 1}, the ordered set is {1, 2, 3, 7, 9}. The middle number (3) is the median.
Even Number of Observations
- Put the numbers in order from lowest to highest (or highest to lowest).
- Find the two middle numbers.
- Calculate the average (mean) of those two middle numbers. This average is the median.
Example: In the set {4, 2, 9, 7, 1, 6}, the ordered set is {1, 2, 4, 6, 7, 9}. The two middle numbers are 4 and 6. The average of 4 and 6 is (4+6)/2 = 5. So the median is 5.
The key steps are:
- Order the numbers
- Find the middle number(s)
- If there are two middle numbers, average them
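The key steps translate directly into a short function. This is a sketch, and `median_of` is a hypothetical name used for illustration.

```python
def median_of(values):
    """Return the median: the middle value, or the average of the two middle values."""
    if not values:
        raise ValueError("cannot take the median of an empty list")
    ordered = sorted(values)          # step 1: order the numbers
    count = len(ordered)
    middle = count // 2
    if count % 2 == 1:                # odd count: a single middle number
        return ordered[middle]
    # even count: average the two middle numbers
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_median = median_of([3, 7, 2, 9, 1])
even_median = median_of([4, 2, 9, 7, 1, 6])
print(odd_median, even_median)
```

Sorting a copy with `sorted` leaves the caller's list unchanged, which avoids surprising side effects.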
Sample Problem 1
The heights of eleven Grade 10 high school students are:
172 cm, 168 cm, 169 cm, 169 cm, 171 cm, 168 cm, 169 cm, 173 cm, 170 cm, 168 cm, 170 cm
Find the median height.
Solution
- Arrange the heights in ascending order: 168 cm, 168 cm, 168 cm, 169 cm, 169 cm, 169 cm, 170 cm, 170 cm, 171 cm, 172 cm, 173 cm
- Since there are 11 observations (an odd number), the median is the middle value (6th observation).
Therefore, the median height is 169 cm.
Sample Problem 2
The ages of eighteen physicists in a city are:
32, 45, 42, 31, 33, 33, 48, 38, 39, 42, 42, 41, 46, 38, 38, 39, 37, 39
Find the median age of the physicists.
Solution
- Arrange the ages in ascending order: 31, 32, 33, 33, 37, 38, 38, 38, 39, 39, 39, 41, 42, 42, 42, 45, 46, 48
- Since there are 18 observations (an even number), the median is the average of the two middle values (9th and 10th observations).
- The 9th and 10th observations are both 39.
- Calculate the average of the two middle values: (39 + 39) / 2 = 39
Therefore, the median age of the physicists is 39.
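Both sample problems can be verified with `statistics.median` from Python's standard library. The list names below are illustrative.

```python
import statistics

heights_cm = [172, 168, 169, 169, 171, 168, 169, 173, 170, 168, 170]
physicist_ages = [32, 45, 42, 31, 33, 33, 48, 38, 39, 42,
                  42, 41, 46, 38, 38, 39, 37, 39]

median_height = statistics.median(heights_cm)   # 11 values: the 6th after sorting
median_age = statistics.median(physicist_ages)  # 18 values: average of the 9th and 10th

print(median_height, median_age)
```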
Some characteristics of the median
1. The median splits the data into two equal parts
The median is the middle value in a set of numbers. It divides the data into two equal parts, with half of the values below the median and half above it.
Example:
In the set {3, 4, 5, 6, 7}, the median is 5. There are two numbers below 5 (3 and 4) and two numbers above 5 (6 and 7).
2. The median is not strongly influenced by extreme values (outliers)
Unlike the mean (average), the median is not greatly affected by extremely high or low values in the data set.
Example:
Consider the set {3, 4, 5, 6, 7}. The median is 5.
If we replace 3 with a very low value, like 0.003, the set becomes {0.003, 4, 5, 6, 7}. The median is still 5.
If we replace 6 with a very high value, like 120, the set becomes {3, 4, 5, 7, 120}. The median remains 5.
In both cases, the extreme values (0.003 and 120) did not change the median at all, whereas the mean would shift noticeably.
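A short sketch makes the contrast with the mean concrete, using the sets from the example and the standard `statistics` module. Variable names are illustrative.

```python
import statistics

original = [3, 4, 5, 6, 7]
with_low_outlier = [0.003, 4, 5, 6, 7]   # 3 replaced by 0.003
with_high_outlier = [3, 4, 5, 7, 120]    # 6 replaced by 120

for data in (original, with_low_outlier, with_high_outlier):
    # The median stays at 5 in every case, while the mean moves with the extreme value
    print(data, "median:", statistics.median(data), "mean:", statistics.mean(data))

high_outlier_mean = statistics.mean(with_high_outlier)
```

For the set containing 120, the mean jumps from 5 to 27.8 while the median remains 5.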
3. Mode
The mode is the value that appears most frequently in a set of numbers. It’s the number you see the most often.
For example, let’s say we have these numbers: 1, 2, 2, 3, 4, 5, 6, 7
The number 2 shows up twice, while all the other numbers only appear once. So, the mode of this set is 2.
Sometimes, a set of numbers can have more than one mode. For instance, consider: 1, 2, 2, 3, 3, 4, 5, 6, 7
In this case, both 2 and 3 appear twice, more than any other number. So, this set has two modes: 2 and 3.
- When a set has two modes, we call it bimodal.
- When a set has more than two modes, we call it multimodal.
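Python's standard library can report every mode at once through `statistics.multimode` (available in Python 3.8 and later). The sketch below applies it to the two sets above; the variable names are illustrative.

```python
import statistics

one_mode_data = [1, 2, 2, 3, 4, 5, 6, 7]
two_mode_data = [1, 2, 2, 3, 3, 4, 5, 6, 7]

# multimode returns a list of all values tied for the highest frequency
modes_first = statistics.multimode(one_mode_data)
modes_second = statistics.multimode(two_mode_data)

is_bimodal = len(modes_second) == 2

print(modes_first, modes_second, is_bimodal)
```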
Sample Problem 1
Let’s find the modal age (the age that appears most often) from this set of ages:
32, 55, 34, 36, 39, 39, 39, 40, 35, 30
Answer: The age 39 shows up three times, more than any other age. So, the modal age is 39.
Sample Problem 2
A survey asked college students about their favorite subjects. Here are the results:
Subject | Number of Students |
---|---|
Algebra and Trigonometry | 59 |
Introductory Economics | 57 |
Philippine History | 60 |
English for Academic Writing | 58 |
Physical Education | 60 |
Earth Science | 64 |
Answer: The subject chosen by the most students (64) is Earth Science. So, Earth Science is the mode of the survey results.
Note: The mode is Earth Science (the most common response), not 64 (the number of times that response appeared).
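When data arrives as a frequency table like this one, the mode is the category with the largest count. A minimal sketch follows, with the table stored in a dictionary (an assumed layout chosen for illustration).

```python
# Survey results: subject -> number of students who chose it
survey_counts = {
    "Algebra and Trigonometry": 59,
    "Introductory Economics": 57,
    "Philippine History": 60,
    "English for Academic Writing": 58,
    "Physical Education": 60,
    "Earth Science": 64,
}

highest_count = max(survey_counts.values())

# Collect every subject tied for the highest count, in case there is more than one mode
modal_subjects = [subject for subject, count in survey_counts.items()
                  if count == highest_count]

print(modal_subjects, highest_count)
```

The mode is the subject name in `modal_subjects`, while `highest_count` is only how often that subject was chosen.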
Sample Problem 3
This graph shows the ages of participants at a science conference:
Answer: The age groups with the most participants are 45-50 and 51-56. Since two age groups tie for the highest count, this is a bimodal distribution.