Imagine you have collected data on the heights of 100 students in a college. The raw data might look like a long list of numbers, such as 160 cm, 165 cm, 170 cm, and so on. While this raw data contains all the information, it is difficult to understand patterns or trends just by looking at it. This is where frequency distribution comes in.
Frequency distribution is a way to organize raw data into a meaningful format by grouping data into classes or intervals and showing how often each group occurs. This makes large data sets easier to interpret, analyze, and use for decision-making.
In this chapter, we will learn how to collect data, classify it, and create frequency distributions along with different types of frequencies and their graphical representations. These skills are essential for competitive exams and real-world data analysis.
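Before organizing data, it is important to understand where the data comes from. Data can be broadly classified into two types: primary data, which the investigator collects first-hand for the study at hand, and secondary data, which is taken from sources that already exist, such as published reports or official records.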
Primary data is usually more reliable for specific studies because it is collected directly. Secondary data is useful when primary data collection is not feasible due to time or cost constraints.
Collecting accurate data is the first step in any statistical analysis. There are several methods to collect data:
- Surveys: questionnaires, interviews
- Experiments: controlled environment, field experiments
- Observation: direct observation, participant observation
- Existing records
Each method has its own advantages and challenges. For example, surveys are quick but may have biased responses, while experiments provide controlled data but can be expensive.
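Once data is collected, it needs to be organized for analysis. This involves two important steps: classification, which groups the data into classes, and tabulation, which arranges the classes and their counts in a table.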
Consider the following raw data of student heights (in cm):
158, 162, 165, 170, 172, 168, 160, 159, 175, 169, 161, 164
We can classify this data into class intervals and count the number of students in each interval:
| Height (cm) | Frequency (Number of Students) |
|---|---|
| 150 - 159 | 2 |
| 160 - 169 | 7 |
| 170 - 179 | 3 |
This table is a simple example of classification and tabulation.
A frequency distribution summarizes data by showing the frequency of each class or category. It helps us understand how data points are spread across different ranges.
There are three important types of frequencies: absolute frequency (the number of observations in a class), relative frequency (the absolute frequency divided by the total number of observations), and cumulative frequency (the running total of absolute frequencies up to and including a class).
Here is a frequency distribution table showing all three types:
| Class Interval (Height in cm) | Absolute Frequency (f) | Relative Frequency (rf) | Cumulative Frequency (CF) |
|---|---|---|---|
| 150 - 159 | 2 | 0.17 | 2 |
| 160 - 169 | 7 | 0.58 | 9 |
| 170 - 179 | 3 | 0.25 | 12 |
In this example, the total number of students is 12. The relative frequency for the first class is \( \frac{2}{12} \approx 0.17 \), meaning about 17% of students fall in the 150-159 cm range.
Cumulative frequency helps us understand how many observations fall below the upper limit of a class. For example, the cumulative frequency of 9 for the second class means 9 students have heights less than or equal to 169 cm.
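The interpretation of cumulative frequency can be checked directly against the raw heights. The following is a minimal sketch, assuming the inclusive class limits 150-159, 160-169, and 170-179 used in this section; the variable names are illustrative only.

```python
from itertools import accumulate

heights = [158, 162, 165, 170, 172, 168, 160, 159, 175, 169, 161, 164]

# Inclusive class limits (assumed): 150-159, 160-169, 170-179.
classes = [(150, 159), (160, 169), (170, 179)]

# Absolute frequency: how many heights fall inside each class.
freq = [sum(lo <= h <= hi for h in heights) for lo, hi in classes]

# Cumulative frequency: running total of the absolute frequencies.
cum_freq = list(accumulate(freq))

# The cumulative frequency of a class should equal the number of
# observations at or below that class's upper limit.
for (lo, hi), cf in zip(classes, cum_freq):
    at_or_below = sum(h <= hi for h in heights)
    print(f"{lo}-{hi}: CF = {cf}, heights <= {hi}: {at_or_below}")
```

For the second class, both numbers come out as 9, which matches the statement that 9 students are 169 cm or shorter.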
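Relative frequency provides a normalized view of data, useful for comparing classes when total observations differ. It is calculated as \( \text{Relative Frequency} = \frac{f_i}{N} \), where \( f_i \) is the frequency of the class and \( N \) is the total number of observations.
Similarly, cumulative frequency is calculated by adding frequencies successively: \( CF_k = f_1 + f_2 + \dots + f_k \).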
Graphs provide a visual way to understand frequency distributions. The three common graphs are:
[Figure: histogram of the frequency distribution of student heights]
Example 1: The following data represent the monthly electricity consumption (in kWh) of 15 households:
120, 135, 150, 145, 160, 155, 140, 130, 125, 170, 165, 150, 135, 140, 155
Organize this data into class intervals of width 15 starting from 120 and find the frequency of each class.
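Step 1: Determine class intervals starting at 120 with width 15: 120 - 134, 135 - 149, 150 - 164, 165 - 179.
Step 2: Count the number of data points in each class: 120 - 134 contains 120, 125, 130 (3 values); 135 - 149 contains 135, 135, 140, 140, 145 (5 values); 150 - 164 contains 150, 150, 155, 155, 160 (5 values); 165 - 179 contains 165, 170 (2 values).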
Step 3: Create the frequency distribution table:
| Electricity Consumption (kWh) | Frequency |
|---|---|
| 120 - 134 | 3 |
| 135 - 149 | 5 |
| 150 - 164 | 5 |
| 165 - 179 | 2 |
Answer: The frequency distribution table is as shown above.
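The three steps above can also be sketched in a few lines of Python. This is only an illustration, assuming whole-number readings so that integer division maps each value to its class; `class_index` is a hypothetical helper name, not part of any library.

```python
from collections import Counter

consumption = [120, 135, 150, 145, 160, 155, 140, 130,
               125, 170, 165, 150, 135, 140, 155]
START, WIDTH = 120, 15

def class_index(value, start=START, width=WIDTH):
    """Return 0 for the first class, 1 for the second, and so on."""
    return (value - start) // width

# Tally how many readings land in each class index.
counts = Counter(class_index(v) for v in consumption)

table = []
for i in range(max(counts) + 1):
    lower = START + i * WIDTH
    upper = lower + WIDTH - 1  # inclusive upper limit, e.g. 134
    table.append((f"{lower} - {upper}", counts.get(i, 0)))

for label, f in table:
    print(f"{label}: {f}")
```

Using `counts.get(i, 0)` keeps a class in the table even when no reading falls in it, so the intervals stay contiguous.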
Example 2: Using the frequency distribution table from Example 1, calculate the relative frequency and cumulative frequency for each class.
Step 1: Find total number of households:
Total frequency \( N = 3 + 5 + 5 + 2 = 15 \)
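Step 2: Calculate relative frequency for each class using \( \text{Relative Frequency} = \frac{f_i}{N} \): \( \frac{3}{15} = 0.20 \), \( \frac{5}{15} \approx 0.33 \), \( \frac{5}{15} \approx 0.33 \), \( \frac{2}{15} \approx 0.13 \).
Step 3: Calculate cumulative frequency by adding frequencies successively: 3, 3 + 5 = 8, 8 + 5 = 13, 13 + 2 = 15.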
Step 4: Complete the table:
| Electricity Consumption (kWh) | Frequency (f) | Relative Frequency (rf) | Cumulative Frequency (CF) |
|---|---|---|---|
| 120 - 134 | 3 | 0.20 | 3 |
| 135 - 149 | 5 | 0.33 | 8 |
| 150 - 164 | 5 | 0.33 | 13 |
| 165 - 179 | 2 | 0.13 | 15 |
Answer: The relative and cumulative frequencies are as shown in the table.
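As a cross-check on the table, the same two calculations can be written as a short Python sketch. The class labels and frequencies are copied from the table; everything else (variable names, two-decimal rounding) is an assumption made for illustration.

```python
from itertools import accumulate

labels = ["120 - 134", "135 - 149", "150 - 164", "165 - 179"]
freq = [3, 5, 5, 2]

n = sum(freq)                        # total frequency N
rel_freq = [f / n for f in freq]     # f_i / N for each class
cum_freq = list(accumulate(freq))    # running totals of the frequencies

for label, f, rf, cf in zip(labels, freq, rel_freq, cum_freq):
    print(f"{label} | {f} | {rf:.2f} | {cf}")

# Two quick checks: the last cumulative frequency equals N, and the
# unrounded relative frequencies add up to 1.
assert cum_freq[-1] == n
assert abs(sum(rel_freq) - 1.0) < 1e-9
```

The rounded relative frequencies in the table add up to 0.99 rather than 1.00; that gap comes from rounding to two decimals, not from an error in the counts.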
Example 3: Using the frequency distribution table from Example 2, draw a histogram representing the electricity consumption of households.
Step 1: Label the x-axis with class intervals: 120-134, 135-149, 150-164, 165-179.
Step 2: Label the y-axis with frequencies from 0 to 6 (since max frequency is 5).
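Step 3: Draw bars for each class interval with heights equal to their frequencies: 3, 5, 5, and 2.
Step 4: Ensure the bars touch each other, because the classes cover a continuous range. With inclusive limits such as 120-134 and 135-149, adjacent bars meet at the class boundaries 134.5, 149.5, and 164.5.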
Answer: The histogram visually shows the distribution of electricity consumption.
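When no plotting tool is at hand, a rough text version of the histogram is enough to see its shape. The sketch below is a stand-in for the drawn chart, not a replacement: it prints horizontal bars instead of vertical ones, and `text_histogram` is a hypothetical helper name.

```python
labels = ["120-134", "135-149", "150-164", "165-179"]
freq = [3, 5, 5, 2]

def text_histogram(labels, freq, symbol="#"):
    """Return one line per class; bar length equals the class frequency."""
    width = max(len(label) for label in labels)
    return [f"{label:<{width}} | {symbol * f} ({f})"
            for label, f in zip(labels, freq)]

lines = text_histogram(labels, freq)
print("\n".join(lines))
```

The two middle classes produce the longest bars, which is the same pattern the drawn histogram shows.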
Example 4: A shopkeeper records daily sales (in INR) over 20 days as follows:
4500, 4700, 4900, 5100, 5300, 5500, 5700, 5900, 6100, 6300, 6500, 6700, 6900, 7100, 7300, 7500, 7700, 7900, 8100, 8300
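Classify the data into intervals of 500 INR starting from 4500 and create a frequency distribution table. Then answer: (a) Which sales range has the highest frequency? (b) On what percentage of days were sales below 6500 INR?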
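Step 1: Define class intervals: 4500 - 4999, 5000 - 5499, 5500 - 5999, 6000 - 6499, 6500 - 6999, 7000 - 7499, 7500 - 7999, 8000 - 8499.
Step 2: Count frequencies by tallying each value into its class. For example, 4500, 4700, and 4900 fall in 4500 - 4999 (3 values), while 5100 and 5300 fall in 5000 - 5499 (2 values).
Step 3: Create the frequency table:
| Sales Range (INR) | Frequency |
|---|---|
| 4500 - 4999 | 3 |
| 5000 - 5499 | 2 |
| 5500 - 5999 | 3 |
| 6000 - 6499 | 2 |
| 6500 - 6999 | 3 |
| 7000 - 7499 | 2 |
| 7500 - 7999 | 3 |
| 8000 - 8499 | 2 |
Step 4: Identify the sales range with highest frequency:
Multiple ranges have frequency 3: 4500-4999, 5500-5999, 6500-6999, 7500-7999. So, these four ranges share the highest frequency.
Step 5: Calculate percentage of days with sales below 6500 INR:
Frequencies below 6500 INR = 3 + 2 + 3 + 2 = 10 days
Total days = 20
Percentage = \( \frac{10}{20} \times 100 = 50\% \)
Answer: The ranges 4500-4999, 5500-5999, 6500-6999, and 7500-7999 share the highest frequency (3 days each), and sales were below 6500 INR on 50% of the days.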
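Both questions in this example can be answered programmatically from the raw sales figures. The following is a minimal sketch, assuming whole-rupee values and classes of width 500 starting at 4500; the variable names are illustrative only.

```python
sales = [4500, 4700, 4900, 5100, 5300, 5500, 5700, 5900, 6100, 6300,
         6500, 6700, 6900, 7100, 7300, 7500, 7700, 7900, 8100, 8300]
START, WIDTH = 4500, 500

# Lower limit of the class each value belongs to (integer data assumed).
lowers = [START + (s - START) // WIDTH * WIDTH for s in sales]

# Count how many days fall in each class, keyed by lower limit.
freq = {}
for lower in lowers:
    freq[lower] = freq.get(lower, 0) + 1

# Every class whose frequency equals the maximum is a modal class.
top = max(freq.values())
modal_classes = [f"{lo} - {lo + WIDTH - 1}"
                 for lo, f in sorted(freq.items()) if f == top]

# Share of days with sales strictly below 6500 INR, as a percentage.
below = sum(s < 6500 for s in sales)
percent_below = below / len(sales) * 100

print(modal_classes)
print(percent_below)
```

Collecting every class that reaches the maximum, rather than stopping at the first one, matters here because several ranges tie for the highest frequency.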
Example 5: The ages (in years) of 30 employees in a company are recorded as follows:
22, 25, 27, 24, 29, 31, 33, 35, 28, 26, 30, 34, 32, 36, 38, 40, 41, 39, 37, 35, 33, 31, 29, 28, 26, 24, 23, 22, 21, 20
Create a grouped frequency distribution table using class intervals of width 5 starting from 20.
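Step 1: Define class intervals: 20 - 24, 25 - 29, 30 - 34, 35 - 39, 40 - 44.
Step 2: Count frequencies: 20 - 24 contains 20, 21, 22, 22, 23, 24, 24 (7 values); 25 - 29 contains 25, 26, 26, 27, 28, 28, 29, 29 (8 values); 30 - 34 contains 30, 31, 31, 32, 33, 33, 34 (7 values); 35 - 39 contains 35, 35, 36, 37, 38, 39 (6 values); 40 - 44 contains 40, 41 (2 values).
Step 3: Create the frequency distribution table:
| Age (years) | Frequency |
|---|---|
| 20 - 24 | 7 |
| 25 - 29 | 8 |
| 30 - 34 | 7 |
| 35 - 39 | 6 |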
| 40 - 44 | 2 |
Answer: The grouped frequency distribution table is as above.
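A hand tally over 30 values is easy to get wrong, so it helps to list the members of each class and confirm that the frequencies add up to the number of observations. The sketch below does this under the assumption of inclusive class limits of width 5 starting at 20; the names are illustrative only.

```python
ages = [22, 25, 27, 24, 29, 31, 33, 35, 28, 26, 30, 34, 32, 36, 38,
        40, 41, 39, 37, 35, 33, 31, 29, 28, 26, 24, 23, 22, 21, 20]

# Inclusive class limits (assumed): width 5, starting at 20.
classes = [(20, 24), (25, 29), (30, 34), (35, 39), (40, 44)]

# Collect the ages that belong to each class, sorted for easy reading.
table = {}
for lo, hi in classes:
    members = sorted(a for a in ages if lo <= a <= hi)
    table[f"{lo} - {hi}"] = members

for label, members in table.items():
    print(f"{label}: f = {len(members)}  {members}")

# Verification: the class frequencies must add up to N.
total = sum(len(m) for m in table.values())
assert total == len(ages), "frequencies must add up to N"
```

Printing the members alongside each count makes it straightforward to spot a value that was placed in the wrong class.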
When to use: When creating or verifying frequency distribution tables.
When to use: When calculating median or quartiles from frequency distribution.
When to use: When interpreting frequency data in percentage terms for better understanding.
When to use: When representing grouped data graphically.
When to use: While organizing raw data into frequency tables.