Imagine you are a librarian who needs to fetch books for many readers. If every time a reader requests a book, you have to go to a huge warehouse far away, it will take a lot of time. But if you keep the most popular books on a small shelf near you, you can quickly hand them out without delay. This is exactly how cache memory works in a computer system.
The Central Processing Unit (CPU) is the brain of the computer, performing millions of operations every second. However, the CPU is extremely fast, while the main memory (RAM) is comparatively slower. This speed difference creates a bottleneck, slowing down the overall system.
Cache memory acts as a small, very fast storage located close to the CPU. It stores frequently accessed data and instructions so that the CPU can retrieve them quickly, improving system performance significantly.
In this chapter, we will explore what cache memory is, how it is organized, the techniques used to manage it, and how to measure its performance.
What is Cache Memory?
Cache memory is a small-sized, high-speed memory located between the CPU and the main memory. It temporarily holds copies of data and instructions that the CPU is likely to reuse soon.
Why is Cache Needed?
The CPU operates at speeds much faster than the main memory. Without cache, the CPU would spend a lot of time waiting for data to be fetched from the slower main memory. Cache reduces this waiting time by providing faster access to frequently used data.
Cache vs Main Memory
Benefits of Cache Memory
Cache memory is organized in multiple levels to balance speed, size, and cost:
Each cache level stores data in blocks (also called cache lines). The block size is the amount of data transferred between cache and main memory in one operation.
Choosing the right cache size and block size is a trade-off between cost, speed, and hit rate (how often data is found in cache).
| Cache Level | Typical Size | Access Time (nanoseconds) | Location |
|---|---|---|---|
| L1 Cache | 32 KB - 64 KB | 1 - 3 ns | Inside CPU core |
| L2 Cache | 256 KB - 512 KB | 3 - 10 ns | Inside or near CPU core |
| L3 Cache | 2 MB - 16 MB | 10 - 20 ns | Shared across CPU cores |
When the CPU requests data, the cache must decide where to store or find that data. This is done using cache mapping techniques. There are three main types:
Each block of main memory maps to exactly one cache line. It is like having a fixed shelf slot for each book in the library.
Advantages: Simple and fast mapping.
Disadvantages: Can cause frequent conflicts if multiple blocks map to the same cache line.
A block can be placed in any cache line. The cache searches all lines to find a block.
Advantages: Flexible, fewer conflicts.
Disadvantages: More complex and slower to search.
Combines direct and associative mapping. Cache is divided into sets, each containing multiple lines. A block maps to one set but can be placed in any line within that set.
Advantages: Balances speed and flexibility.
Disadvantages: More complex than direct mapping.
graph TD A[Memory Address] --> B{Mapping Technique} B --> C[Direct Mapping] B --> D[Associative Mapping] B --> E[Set-Associative Mapping] C --> F[One Cache Line per Block] D --> G[Any Cache Line] E --> H[One Set, Multiple Lines]To evaluate cache effectiveness, we use several key metrics:
| Metric | Formula | Description |
|---|---|---|
| Hit Rate | \(\text{Hit Rate} = \frac{\text{Number of Hits}}{\text{Total Memory Accesses}}\) | Proportion of accesses found in cache |
| Miss Rate | \(\text{Miss Rate} = 1 - \text{Hit Rate}\) | Proportion of accesses not found in cache |
| Average Memory Access Time (AMAT) | \(\text{AMAT} = \text{Hit Time} + (\text{Miss Rate} \times \text{Miss Penalty})\) | Average time to access memory including cache misses |
When the CPU writes data, the cache must decide how and when to update main memory. This is controlled by write policies:
Every write to cache is immediately written to main memory. This keeps memory always updated but increases memory traffic.
Writes are made only to cache. Main memory is updated later when the cache block is replaced. This reduces memory traffic but requires more complex control.
When cache is full and a new block must be loaded, a replacement policy decides which block to remove:
These policies impact cache efficiency and complexity.
Step 1: Identify total memory accesses = 1000
Step 2: Number of hits = 800
Step 3: Calculate hit rate using formula:
\(\text{Hit Rate} = \frac{800}{1000} = 0.8\) or 80%
Step 4: Calculate miss rate:
\(\text{Miss Rate} = 1 - 0.8 = 0.2\) or 20%
Answer: Hit Rate = 80%, Miss Rate = 20%
Step 1: Given: Hit Time = 2 ns, Miss Penalty = 50 ns, Hit Rate = 0.9
Step 2: Calculate Miss Rate:
\(\text{Miss Rate} = 1 - 0.9 = 0.1\)
Step 3: Use AMAT formula:
\(\text{AMAT} = 2 + (0.1 \times 50) = 2 + 5 = 7 \text{ ns}\)
Answer: Average Memory Access Time = 7 ns
10110110, determine the tag, index, and block offset. Step 1: Calculate number of bits for block offset:
Block size = 4 bytes = \(2^2\) bytes, so block offset = 2 bits
Step 2: Calculate number of bits for index:
Number of cache lines = 16 = \(2^4\), so index = 4 bits
Step 3: Remaining bits are for tag:
Tag bits = Total bits - (Index bits + Block offset bits) = 8 - (4 + 2) = 2 bits
Step 4: Break down the address 10110110:
10110110Answer: Tag = 10, Index = 1101 (decimal 13), Block Offset = 10
11010110, determine the set number and tag. Step 1: Calculate block offset bits:
Block size = 4 bytes = \(2^2\), so block offset = 2 bits
Step 2: Calculate set index bits:
Number of sets = 8 = \(2^3\), so set index = 3 bits
Step 3: Calculate tag bits:
Tag bits = Total bits - (Set index + Block offset) = 8 - (3 + 2) = 3 bits
Step 4: Break down the address 11010110:
11010110Step 5: Convert set index to decimal:
\(101_2 = 5\)
Answer: Set Number = 5, Tag = 110
graph TD A[Memory Address 11010110] --> B[Tag: 110] A --> C[Set Index: 101 (Set 5)] A --> D[Block Offset: 10] C --> E{Check Set 5} E --> F[Compare Tag with 110 in Set 5] F --> G{Hit or Miss} Step 1: Write-Through Example:
Every time the CPU writes data to cache, it immediately writes to main memory.
If the CPU writes 100 times, there are 100 memory write operations.
Step 2: Write-Back Example:
CPU writes only to cache. Main memory is updated only when the cache block is replaced.
If the CPU writes 100 times to the same block, only 1 write to main memory occurs when the block is replaced.
Step 3: Impact on Performance:
Answer: Write-back policy reduces memory traffic by delaying writes to main memory, unlike write-through which writes immediately, causing more memory operations.
When to use: Quickly calculate one metric if the other is known.
When to use: While breaking down memory addresses for cache mapping problems.
When to use: Solving direct mapped cache problems quickly.
When to use: Mapping addresses in set-associative caches efficiently.
When to use: Evaluating cache write policies in performance questions.
Progress tracking is paywalled — subscribe to mark subtopics as understood and save your streak.
Go to practice →