Skip to main content

295. Median Finder

Find Median from Data Stream

A key application of heaps: tracking not just real-time minimum or maximum, but also real-time median in a data stream.

Definition of Data Stream Median

  • Length nn is odd: after sorting, the middle element. Like, median of [1,2,3][1, 2, 3] is 22
  • Length nn is even: after sorting, the average of two middle elements. Like, median of [1,2,3,4][1, 2, 3, 4] is (2+3)/2=2.5(2 + 3) / 2 = 2.5

With definition in place, the question becomes:

as new numbers keep arriving, how do we always know the current median?

What Sorting Implies

A sorted sequence can naturally be split into a lower half and an upper half.

"Maximum" of lower half \leq "minimum" of upper half.

So we prepare a min_heap and a max_heap:

min_heap maintains upper half, max_heap maintains lower half.

Two conditions must maintain:

I. Either both heaps have the same size, or min_heap has exactly one more element than max_heap

When the data stream has an odd number of elements, the median lives in min_heap,

which is also when min_heap has one more element.

This is simply my preference to place median in upper half — lower half works too, just be consistent.

II. Maximum of max_heap \leq minimum of min_heap

If this isn't satisfied, keep swapping between two heaps until it is.

When a New Element Arrives, Check Heap Sizes First

  • If both heaps are the same size, push new element into min_heap.

  • If min_heap has one more element than max_heap, push into max_heap.

condition I is now satisfied.

Next, check whether heaps need rebalancing:

if maximum of max_heap >> minimum of min_heap,

promote max_heap's maximum to min_heap,

and demote min_heap's minimum to max_heap.

Keep swapping until:

maximum of max_heap \leq minimum of min_heap

At this point, condition II is satisfied 👌👌

Find the Median: Back to Heap Sizes

A. If min_heap has one more element than max_heap, the median is minimum of min_heap.

B. If both heaps are the same size, the median is the average of min_heap's minimum and max_heap's maximum.

Python & C++ Defaults Differ ⚠️

Python's heapq defaults to a min heap.

C++'s priority_queue defaults to a max heap.

For the half that isn't default, remember to negate values: add a negative sign.

So that a default min heap can act as a max heap, and vice versa.

import heapq


class MedianFinder: # LeetCode Q.295.
def __init__(self):
# When nums count is odd, median goes to min heap.
# Max heap needs to negate value to fit default Python min heap.
self.min_heap: list[int] = []
self.max_heap: list[int] = []

def add_num(self, num: int) -> None:
if len(self.max_heap) == len(self.min_heap):
heapq.heappush(self.min_heap, num)

else:
heapq.heappush(self.max_heap, -num) # Max heap negates.

while self.max_heap and self.min_heap:
if -self.max_heap[0] <= self.min_heap[0]: break # No need to adjust.

# -Max heap top > min heap top: mismatch so must switch.
former_max_heap_top = -heapq.heappop(self.max_heap) # Max heap negates.
heapq.heappush(self.min_heap, former_max_heap_top)

former_min_heap_top = heapq.heappop(self.min_heap)
heapq.heappush(self.max_heap, -former_min_heap_top) # Max heap negates.

def find_median(self) -> float:
if len(self.min_heap) > len(self.max_heap):
return self.min_heap[0]

return (self.min_heap[0] - self.max_heap[0]) / 2 # Max heap negates.

LeetCode's addNum and findMedian on the problem page correspond to my add_num and find_median.

I use 🐫 camelCase in C++ and 🐍 snake_case in Python. Just a personal habit.

Two Heaps Efficiency Adding a new element takes O(logn)O(\log n) time to maintain heaps.

Finding median takes O(1)O(1). Space is naturally O(n)O(n) throughout.

I highly recommend Professor Tim Roughgarden's algorithms course.

Beyond median maintenance, it covers many fascinating heap applications and fundamentals of how heaps work.