Frequency Distribution

Introduction to Frequency Distribution

Imagine you have collected data on the heights of 100 students in a college. The raw data might look like a long list of numbers, such as 160 cm, 165 cm, 170 cm, and so on. While this raw data contains all the information, it is difficult to understand patterns or trends just by looking at it. This is where frequency distribution comes in.

Frequency distribution is a way to organize raw data into a meaningful format by grouping data into classes or intervals and showing how often each group occurs. This makes large data sets easier to interpret, analyze, and use for decision-making.

In this chapter, we will learn how to collect data, classify it, and create frequency distributions along with different types of frequencies and their graphical representations. These skills are essential for competitive exams and real-world data analysis.

Primary and Secondary Data

Before organizing data, it is important to understand where the data comes from. Data can be broadly classified into two types:

Primary Data: This is data collected firsthand by the researcher for a specific purpose. For example, measuring the heights of students in your college or conducting a survey on monthly expenses in INR.
Secondary Data: This is data collected by someone else and made available for use. Examples include census reports, government publications, or data from research articles.

Primary data is usually more reliable for specific studies because it is collected directly. Secondary data is useful when primary data collection is not feasible due to time or cost constraints.

Methods of Data Collection

Collecting accurate data is the first step in any statistical analysis. There are several methods to collect data:

Surveys: Questionnaires, Interviews
Experiments: Controlled Environment, Field Experiments
Observation: Direct Observation, Participant Observation
Existing Records

Each method has its own advantages and challenges. For example, surveys are quick but may have biased responses, while experiments provide controlled data but can be expensive.

Classification and Tabulation

Once data is collected, it needs to be organized for analysis. This involves two important steps:

Classification: Grouping data into classes or categories based on common characteristics. For example, grouping student heights into intervals like 150-159 cm, 160-169 cm, etc.
Tabulation: Arranging classified data into tables for clarity and easy reference.

Consider the following raw data of student heights (in cm):

158, 162, 165, 170, 172, 168, 160, 159, 175, 169, 161, 164

We can classify this data into class intervals and count the number of students in each interval:

Height (cm)	Frequency (Number of Students)
150 - 159	2
160 - 169	7
170 - 179	3

This table is a simple example of classification and tabulation.
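The classification and tabulation steps above can be sketched in Python. This is a minimal sketch, assuming inclusive integer class limits; the function name `tabulate` and the variable names are illustrative and not part of the chapter.

```python
# Raw heights (cm) from the list above.
heights = [158, 162, 165, 170, 172, 168, 160, 159, 175, 169, 161, 164]

# Inclusive class limits (lower, upper) for the three height classes.
classes = [(150, 159), (160, 169), (170, 179)]


def tabulate(values, class_limits):
    """Count how many values fall in each inclusive class interval."""
    counts = {limits: 0 for limits in class_limits}
    for value in values:
        for lower, upper in class_limits:
            if lower <= value <= upper:
                counts[(lower, upper)] += 1
                break  # classes are mutually exclusive, so stop at the first match
    return counts


table = tabulate(heights, classes)
for (lower, upper), frequency in table.items():
    print(f"{lower} - {upper}\t{frequency}")
```

Checking that the frequencies add up to the number of observations (12 here) is a quick way to catch a value that was missed or counted twice.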

Frequency Distribution

A frequency distribution summarizes data by showing the frequency of each class or category. It helps us understand how data points are spread across different ranges.

There are three important types of frequencies:

Absolute Frequency (f): The actual count of observations in each class.
Relative Frequency (rf): The ratio of a class's absolute frequency to the total number of observations. It is often expressed as a decimal or percentage.
Cumulative Frequency (CF): The running total of frequencies up to a certain class.

Here is a frequency distribution table showing all three types:

Class Interval (Height in cm)	Absolute Frequency (f)	Relative Frequency (rf)	Cumulative Frequency (CF)
150 - 159	2	0.17	2
160 - 169	7	0.58	9
170 - 179	3	0.25	12

In this example, the total number of students is 12. The relative frequency for the first class is \( \frac{2}{12} \approx 0.17 \), meaning about 17% of students fall in the 150-159 cm range.

Cumulative and Relative Frequency

Cumulative frequency tells us how many observations fall at or below the upper limit of a class. For example, the cumulative frequency of 9 for the second class means 9 students have heights less than or equal to 169 cm.

Relative frequency provides a normalized view of data, useful for comparing classes when total observations differ. It is calculated as:

Relative Frequency

\[\text{Relative Frequency} = \frac{f_i}{N}\]

This is the proportion of each class frequency to the total number of observations, where \(f_i\) = frequency of the ith class and \(N\) = total number of observations.

Similarly, cumulative frequency is calculated by adding frequencies successively:

Cumulative Frequency

\[CF_i = \sum_{j=1}^{i} f_j\]

This is the running total of frequencies up to the ith class, where \(CF_i\) = cumulative frequency up to the ith class and \(f_j\) = frequency of the jth class.
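Both formulas translate directly into code. The following is a minimal sketch; the function names are illustrative, and the input frequencies 3, 5, 5, 2 are the class frequencies from Worked Example 1 in this chapter.

```python
from itertools import accumulate


def relative_frequencies(frequencies):
    """Return f_i / N for each class, where N is the total of all frequencies."""
    total = sum(frequencies)
    return [f / total for f in frequencies]


def cumulative_frequencies(frequencies):
    """Return the running totals CF_i = f_1 + ... + f_i."""
    return list(accumulate(frequencies))


# Class frequencies taken from Worked Example 1 (electricity consumption).
frequencies = [3, 5, 5, 2]
rf = relative_frequencies(frequencies)
cf = cumulative_frequencies(frequencies)

print([round(x, 2) for x in rf])  # [0.2, 0.33, 0.33, 0.13]
print(cf)                         # [3, 8, 13, 15]
```

The last cumulative frequency always equals N, which gives a simple check on the whole column.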

Graphical Representation

Graphs provide a visual way to understand frequency distributions. The three common graphs are:

Histogram: A bar graph representing class intervals on the x-axis and frequencies on the y-axis. Bars touch each other to show continuous data.
Frequency Polygon: A line graph connecting midpoints of class intervals plotted against frequencies.
Ogive: A curve showing cumulative frequency, useful for finding medians and quartiles.
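The x-coordinates needed for these graphs can be computed from the class limits. The sketch below is illustrative only: the terms "class mark" (midpoint) and "class boundary" are standard but are not defined in this chapter, and the function names and the `gap` parameter are assumptions made for the example.

```python
# Inclusive class limits for the height classes used earlier in this chapter.
classes = [(150, 159), (160, 169), (170, 179)]


def class_mark(lower, upper):
    """Midpoint of a class: the x-coordinate used in a frequency polygon."""
    return (lower + upper) / 2


def class_boundaries(lower, upper, gap=1):
    """Extend inclusive limits by half the gap between adjacent classes.

    With limits 150-159 and 160-169 the gap is 1, so the boundaries
    become 149.5-159.5 and 159.5-169.5 and adjacent bars share an edge.
    """
    return (lower - gap / 2, upper + gap / 2)


marks = [class_mark(lo, hi) for lo, hi in classes]
boundaries = [class_boundaries(lo, hi) for lo, hi in classes]

print(marks)       # [154.5, 164.5, 174.5]
print(boundaries)  # [(149.5, 159.5), (159.5, 169.5), (169.5, 179.5)]
```

Histogram bars are drawn between the boundaries, frequency polygon points sit at the class marks, and ogive points are usually plotted at the upper boundaries.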

[Figure: histogram of the student-height frequency distribution, with class intervals on the x-axis and frequencies on the y-axis]

Worked Example 1: Constructing a Frequency Distribution Table (Easy)

Given the following data representing the monthly electricity consumption (in kWh) of 15 households:

120, 135, 150, 145, 160, 155, 140, 130, 125, 170, 165, 150, 135, 140, 155

Organize this data into class intervals of width 15 starting from 120 and find the frequency of each class.

Step 1: Determine class intervals starting at 120 with width 15:

120 - 134
135 - 149
150 - 164
165 - 179

Step 2: Count the number of data points in each class:

120 - 134: 120, 130, 125 -> 3 values
135 - 149: 135, 145, 140, 135, 140 -> 5 values
150 - 164: 150, 160, 155, 155, 150 -> 5 values
165 - 179: 170, 165 -> 2 values

Step 3: Create the frequency distribution table:

Electricity Consumption (kWh)	Frequency
120 - 134	3
135 - 149	5
150 - 164	5
165 - 179	2

Answer: The frequency distribution table is as shown above.
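When the classes have equal width, the class of each value can be found by integer division instead of comparing against every interval. This is a minimal sketch assuming integer data and inclusive limits; `grouped_frequencies` is a hypothetical helper name.

```python
consumption = [120, 135, 150, 145, 160, 155, 140, 130, 125,
               170, 165, 150, 135, 140, 155]


def grouped_frequencies(values, start, width):
    """Group integer values into classes of equal width beginning at `start`.

    Returns a list of ((lower, upper), frequency) pairs with inclusive limits.
    """
    n_classes = (max(values) - start) // width + 1
    counts = [0] * n_classes
    for value in values:
        if value < start:
            raise ValueError(f"{value} lies below the first class")
        # (value - start) // width is the zero-based index of the class.
        counts[(value - start) // width] += 1
    return [((start + i * width, start + (i + 1) * width - 1), counts[i])
            for i in range(n_classes)]


for (lower, upper), frequency in grouped_frequencies(consumption, 120, 15):
    print(f"{lower} - {upper}\t{frequency}")
# 120 - 134	3
# 135 - 149	5
# 150 - 164	5
# 165 - 179	2
```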

Worked Example 2: Calculating Relative and Cumulative Frequencies (Medium)

Using the frequency distribution table from Example 1, calculate the relative frequency and cumulative frequency for each class.

Step 1: Find total number of households:

Total frequency \( N = 3 + 5 + 5 + 2 = 15 \)

Step 2: Calculate relative frequency for each class using \( \text{Relative Frequency} = \frac{f_i}{N} \):

120 - 134: \( \frac{3}{15} = 0.20 \)
135 - 149: \( \frac{5}{15} = 0.33 \)
150 - 164: \( \frac{5}{15} = 0.33 \)
165 - 179: \( \frac{2}{15} = 0.13 \)

Step 3: Calculate cumulative frequency by adding frequencies successively:

120 - 134: 3
135 - 149: 3 + 5 = 8
150 - 164: 8 + 5 = 13
165 - 179: 13 + 2 = 15

Step 4: Complete the table:

Electricity Consumption (kWh)	Frequency (f)	Relative Frequency (rf)	Cumulative Frequency (CF)
120 - 134	3	0.20	3
135 - 149	5	0.33	8
150 - 164	5	0.33	13
165 - 179	2	0.13	15

Answer: The relative and cumulative frequencies are as shown in the table.
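The rounded relative frequencies in the table add up to 0.99 rather than 1. The short check below, a sketch using Python's standard `fractions` module, suggests this is purely a rounding effect: the exact fractions sum to 1.

```python
from fractions import Fraction

frequencies = [3, 5, 5, 2]  # class frequencies from Example 1
total = sum(frequencies)

# Exact relative frequencies: 1/5, 1/3, 1/3, 2/15.
exact = [Fraction(f, total) for f in frequencies]
# Relative frequencies rounded to two decimals, as in the table.
rounded = [round(f / total, 2) for f in frequencies]

print(sum(exact))              # 1
print(round(sum(rounded), 2))  # 0.99
```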

Worked Example 3: Drawing a Histogram from Frequency Data (Medium)

Using the frequency distribution table from Example 2, draw a histogram representing the electricity consumption of households.

Step 1: Label the x-axis with class intervals: 120-134, 135-149, 150-164, 165-179.

Step 2: Label the y-axis with frequencies from 0 to 6 (since max frequency is 5).

Step 3: Draw bars for each class interval with heights equal to their frequencies:

120-134: Height 3
135-149: Height 5
150-164: Height 5
165-179: Height 2

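Step 4: Ensure bars touch each other. Because the data are treated as continuous, each bar extends from the lower class boundary to the upper class boundary (for example, 119.5 to 134.5), so adjacent bars share an edge.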

Answer: The histogram visually shows the distribution of electricity consumption.
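As a rough stand-in for the drawing, the bar heights can be previewed as text. This sketch prints horizontal bars rather than the vertical bars of a real histogram, and `text_histogram` is a hypothetical helper, not a plotting-library function.

```python
# Frequency table from Example 2: inclusive class limits and frequencies.
table = [((120, 134), 3), ((135, 149), 5), ((150, 164), 5), ((165, 179), 2)]


def text_histogram(rows, symbol="#"):
    """Return one line per class, with a bar whose length equals the frequency."""
    lines = []
    for (lower, upper), frequency in rows:
        lines.append(f"{lower}-{upper} | {symbol * frequency} ({frequency})")
    return lines


for line in text_histogram(table):
    print(line)
# 120-134 | ### (3)
# 135-149 | ##### (5)
# 150-164 | ##### (5)
# 165-179 | ## (2)
```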

Worked Example 4: Interpreting Frequency Distribution for Decision Making (Hard)

A shopkeeper records daily sales (in INR) over 20 days as follows:

4500, 4700, 4900, 5100, 5300, 5500, 5700, 5900, 6100, 6300, 6500, 6700, 6900, 7100, 7300, 7500, 7700, 7900, 8100, 8300

Classify the data into intervals of 500 INR starting from 4500 and create a frequency distribution table. Then answer:

Which sales range has the highest frequency?
What percentage of days had sales below 6500 INR?

Step 1: Define class intervals:

4500 - 4999
5000 - 5499
5500 - 5999
6000 - 6499
6500 - 6999
7000 - 7499
7500 - 7999
8000 - 8499

Step 2: Count frequencies:

4500 - 4999: 4500, 4700, 4900 -> 3
5000 - 5499: 5100, 5300 -> 2
5500 - 5999: 5500, 5700, 5900 -> 3
6000 - 6499: 6100, 6300 -> 2
6500 - 6999: 6500, 6700, 6900 -> 3
7000 - 7499: 7100, 7300 -> 2
7500 - 7999: 7500, 7700, 7900 -> 3
8000 - 8499: 8100, 8300 -> 2

Step 3: Create the frequency table:

Sales Range (INR)	Frequency
4500 - 4999	3
5000 - 5499	2
5500 - 5999	3
6000 - 6499	2
6500 - 6999	3
7000 - 7499	2
7500 - 7999	3
8000 - 8499	2

Step 4: Identify the sales range with highest frequency:

Multiple ranges share the highest frequency of 3: 4500-4999, 5500-5999, 6500-6999, 7500-7999.

Step 5: Calculate percentage of days with sales below 6500 INR:

Frequencies below 6500 INR = 3 + 2 + 3 + 2 = 10 days

Total days = 20

Percentage = \( \frac{10}{20} \times 100 = 50\% \)

Answer:

Highest frequency sales ranges: 4500-4999, 5500-5999, 6500-6999, 7500-7999 INR
50% of days had sales below 6500 INR
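Both questions can be cross-checked by counting directly from the raw data. The following is a minimal sketch, assuming equal-width classes that start at 4500; the names `START`, `WIDTH`, `modal_classes`, and `percent_below` are illustrative.

```python
from collections import Counter

sales = [4500, 4700, 4900, 5100, 5300, 5500, 5700, 5900, 6100, 6300,
         6500, 6700, 6900, 7100, 7300, 7500, 7700, 7900, 8100, 8300]
START, WIDTH = 4500, 500

# Map each value to the lower limit of its class, then count per class.
counts = Counter(START + ((s - START) // WIDTH) * WIDTH for s in sales)

# Every class whose frequency equals the maximum (there may be ties).
highest = max(counts.values())
modal_classes = sorted(lower for lower, f in counts.items() if f == highest)

# Share of days with sales strictly below 6500 INR.
percent_below = 100 * sum(s < 6500 for s in sales) / len(sales)

for lower in modal_classes:
    print(f"{lower} - {lower + WIDTH - 1}: {highest}")
print(f"{percent_below:.0f}% of days below 6500 INR")
```

Note that a boundary value such as 5500 belongs to the class that starts at 5500, not to the class that ends at 5499.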

Worked Example 5: Problem on Grouped Data Frequency Distribution (Hard)

The ages (in years) of 30 employees in a company are recorded as follows:

22, 25, 27, 24, 29, 31, 33, 35, 28, 26, 30, 34, 32, 36, 38, 40, 41, 39, 37, 35, 33, 31, 29, 28, 26, 24, 23, 22, 21, 20

Create a grouped frequency distribution table using class intervals of width 5 starting from 20.

Step 1: Define class intervals:

20 - 24
25 - 29
30 - 34
35 - 39
40 - 44

Step 2: Count frequencies:

20 - 24: 22, 24, 24, 23, 22, 21, 20 -> 7
25 - 29: 25, 27, 29, 28, 26, 29, 28, 26 -> 8
30 - 34: 31, 33, 30, 34, 32, 33, 31 -> 7
35 - 39: 35, 36, 38, 39, 37, 35 -> 6
40 - 44: 40, 41 -> 2

Step 3: Create the frequency distribution table:

Age (years)	Frequency
20 - 24	7
25 - 29	8
30 - 34	7
35 - 39	6
40 - 44	2

Check: 7 + 8 + 7 + 6 + 2 = 30, which matches the number of employees.

Answer: The grouped frequency distribution table is as above.

Formula Bank

Relative Frequency

\[ \text{Relative Frequency} = \frac{f_i}{N} \]

where: \( f_i \) = frequency of ith class, \( N \) = total number of observations

Cumulative Frequency

\[ CF_i = \sum_{j=1}^{i} f_j \]

where: \( CF_i \) = cumulative frequency up to ith class, \( f_j \) = frequency of jth class

Class Width

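\[ \text{Class Width} = \text{Upper Class Boundary} - \text{Lower Class Boundary} \]

where: the class boundaries lie halfway between the limits of adjacent classes. For the class 120 - 134, the boundaries are 119.5 and 134.5, so the width is 15. Equivalently, the width is the difference between the lower limits of two consecutive classes (\( 135 - 120 = 15 \)); subtracting the limits of a single inclusive class (\( 134 - 120 = 14 \)) understates it.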

Tips & Tricks

Tip: Always check that class intervals are mutually exclusive and exhaustive.

When to use: When creating or verifying frequency distribution tables.
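This check can be automated. The sketch below is one possible approach, assuming inclusive limits listed in increasing order; `check_classes` is a hypothetical helper, the sample values 12, 20, 29 are made up for illustration, and the two sets of intervals are the ones contrasted under Common Mistakes in this chapter.

```python
def check_classes(class_limits, values):
    """Return a list of problems found; an empty list means the classes are fine.

    `class_limits` holds inclusive (lower, upper) pairs in increasing order.
    """
    problems = []
    # Mutually exclusive: each class must start after the previous one ends.
    for (_, prev_upper), (next_lower, _) in zip(class_limits, class_limits[1:]):
        if next_lower <= prev_upper:
            problems.append(f"overlap at {next_lower}")
    # Exhaustive: every observation must fall in some class.
    for value in values:
        if not any(lower <= value <= upper for lower, upper in class_limits):
            problems.append(f"{value} is not covered")
    return problems


sample = [12, 20, 29]  # hypothetical values, for illustration only
print(check_classes([(10, 19), (20, 29)], sample))  # []
print(check_classes([(10, 20), (20, 30)], sample))  # ['overlap at 20']
```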

Tip: Use cumulative frequency to quickly find the median class in grouped data.

When to use: When calculating median or quartiles from frequency distribution.
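A sketch of this tip follows, assuming the common convention that the median class is the first class whose cumulative frequency reaches N / 2. The function name `median_class` is illustrative, and the frequencies are those of the electricity data from Example 1.

```python
from itertools import accumulate


def median_class(class_limits, frequencies):
    """Return the first class whose cumulative frequency reaches N / 2."""
    total = sum(frequencies)
    for limits, cf in zip(class_limits, accumulate(frequencies)):
        if cf >= total / 2:
            return limits
    raise ValueError("frequencies must not be empty")


classes = [(120, 134), (135, 149), (150, 164), (165, 179)]
frequencies = [3, 5, 5, 2]  # electricity data from Example 1

print(median_class(classes, frequencies))  # (135, 149)
```

Here N = 15, so N / 2 = 7.5. The cumulative frequencies are 3, 8, 13, 15, and 8 is the first to reach 7.5, which points to the class 135 - 149.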

Tip: Relative frequency can be converted to percentage by multiplying by 100.

When to use: When interpreting frequency data in percentage terms for better understanding.

Tip: Draw histograms with class intervals on x-axis and frequency on y-axis for visual clarity.

When to use: When representing grouped data graphically.

Tip: For quick tabulation, tally marks can be used before counting frequencies.

When to use: While organizing raw data into frequency tables.

Common Mistakes to Avoid

❌ Overlapping class intervals causing double counting

✓ Ensure class intervals are mutually exclusive, e.g., 10-19, 20-29, not 10-20, 20-30

Why: Students often forget to make intervals exclusive, leading to errors in frequency counts.

❌ Incorrect calculation of cumulative frequency by not adding frequencies sequentially

✓ Add frequencies cumulatively from the first class upwards without skipping any class

Why: Students may add frequencies randomly or miss a class, resulting in wrong cumulative values.

❌ Confusing relative frequency with cumulative frequency

✓ Remember relative frequency is proportion of each class frequency, cumulative frequency is running total

Why: Terminology similarity leads to mixing up concepts.

❌ Plotting histograms with gaps between bars

✓ Bars in histograms should touch each other as class intervals are continuous

Why: Students sometimes treat histograms like bar charts, causing misrepresentation.

❌ Using inconsistent units or mixing metric and imperial units in examples

✓ Always use metric units consistently as per syllabus requirements

Why: Mixing units confuses students and leads to calculation errors.
