295. Median Finder
Find Median from Data Stream
A key application of heaps: tracking not just real-time minimum or maximum, but also real-time median in a data stream.
Definition of Data Stream Median
- Length is odd: after sorting, the middle element. Like, median of is
- Length is even: after sorting, the average of two middle elements. Like, median of is
With definition in place, the question becomes:
as new numbers keep arriving, how do we always know the current median?
What Sorting Implies
A sorted sequence can naturally be split into a lower half and an upper half.
"Maximum" of lower half "minimum" of upper half.
So we prepare a min_heap and a max_heap:
min_heap maintains upper half, max_heap maintains lower half.
Two conditions must maintain:
I. Either both heaps have the same size, or min_heap has exactly one more element than max_heap
When the data stream has an odd number of elements, the median lives in min_heap,
which is also when min_heap has one more element.
This is simply my preference to place median in upper half — lower half works too, just be consistent.
II. Maximum of max_heap minimum of min_heap
If this isn't satisfied, keep swapping between two heaps until it is.
When a New Element Arrives, Check Heap Sizes First
-
If both heaps are the same size, push new element into
min_heap. -
If
min_heaphas one more element thanmax_heap, push intomax_heap.
condition I is now satisfied.
Next, check whether heaps need rebalancing:
if maximum of max_heap minimum of min_heap,
promote max_heap's maximum to min_heap,
and demote min_heap's minimum to max_heap.
Keep swapping until:
maximum of max_heap minimum of min_heap
At this point, condition II is satisfied 👌👌
Find the Median: Back to Heap Sizes
A. If min_heap has one more element than max_heap,
the median is minimum of min_heap.
B. If both heaps are the same size, the median is
the average of min_heap's minimum and max_heap's maximum.
Python & C++ Defaults Differ ⚠️
Python's heapq defaults to a min heap.
C++'s priority_queue defaults to a max heap.
For the half that isn't default, remember to negate values: add a negative sign.
So that a default min heap can act as a max heap, and vice versa.
- C++
- Python
#include <queue>
using namespace std;
class MedianFinder // LeetCode Q.295.
{
private:
// When nums count is odd, median goes to min heap.
// Min heap needs to negate value to fit default C++ max heap.
priority_queue<int> minHeap, maxHeap;
public:
MedianFinder() {}
void addNum(int num) {
if (maxHeap.size() == minHeap.size())
minHeap.push(-num); // Min heap negates.
else
maxHeap.push(num);
while (!maxHeap.empty() && !minHeap.empty()) {
if (maxHeap.top() <= -minHeap.top()) // Min heap negates.
break; // No need to adjust.
// Max heap top > -min heap top: mismatch so must switch.
minHeap.push(-maxHeap.top()); // Min heap negates.
maxHeap.push(-minHeap.top()); // Min heap negates.
minHeap.pop();
maxHeap.pop();
}
}
double findMedian() {
if (minHeap.size() > maxHeap.size()) // Min heap negates.
return -minHeap.top();
return (maxHeap.top() - minHeap.top()) * 0.5; // Min heap negates.
}
};
import heapq
class MedianFinder: # LeetCode Q.295.
def __init__(self):
# When nums count is odd, median goes to min heap.
# Max heap needs to negate value to fit default Python min heap.
self.min_heap: list[int] = []
self.max_heap: list[int] = []
def add_num(self, num: int) -> None:
if len(self.max_heap) == len(self.min_heap):
heapq.heappush(self.min_heap, num)
else:
heapq.heappush(self.max_heap, -num) # Max heap negates.
while self.max_heap and self.min_heap:
if -self.max_heap[0] <= self.min_heap[0]: break # No need to adjust.
# -Max heap top > min heap top: mismatch so must switch.
former_max_heap_top = -heapq.heappop(self.max_heap) # Max heap negates.
heapq.heappush(self.min_heap, former_max_heap_top)
former_min_heap_top = heapq.heappop(self.min_heap)
heapq.heappush(self.max_heap, -former_min_heap_top) # Max heap negates.
def find_median(self) -> float:
if len(self.min_heap) > len(self.max_heap):
return self.min_heap[0]
return (self.min_heap[0] - self.max_heap[0]) / 2 # Max heap negates.
LeetCode's addNum and findMedian on the problem page
correspond to my add_num and find_median.
I use 🐫 camelCase in C++ and 🐍 snake_case in Python. Just a personal habit.
Adding a new element takes time to maintain heaps.
Finding median takes . Space is naturally throughout.
I highly recommend Professor Tim Roughgarden's algorithms course.
Beyond median maintenance, it covers many fascinating heap applications and fundamentals of how heaps work.