Find Median from Data Stream

Updated on 27 May, 2025
Find Median from Data Stream header image

Problem Statement

The task is to create a class MedianFinder that dynamically calculates the median of a continuously updating list of integers. The median is defined as the middle value in a list when sorted. If the list's length is even, the median is the average of the two middle numbers.

  • When the integer list like [2,3,4] is provided, the median is the single middle value 3.
  • For an even number of elements, such as in [2,3], the median becomes the average of the middle two numbers, resulting in a median of 2.5.

The MedianFinder class should be able to:

  • Initialize with the MedianFinder() constructor.
  • Add a new number with addNum(int num).
  • Provide the current median of all added numbers with findMedian(), being within 10-5 tolerance of the actual median.

Examples

Example 1

Input

["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [1], [2], [], [3], []]

Output

[null, null, null, 1.5, null, 2.0]

Explanation

MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1); // arr = [1]
medianFinder.addNum(2); // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianFinder.addNum(3); // arr[1, 2, 3]
medianFinder.findMedian(); // return 2.0

Constraints

  • -105 <= num <= 105
  • There will be at least one element in the data structure before calling findMedian.
  • At most 5 * 104 calls will be made to addNum and findMedian.

Approach and Intuition

The main challenge in implementing the MedianFinder class lies in efficiently managing an unsorted stream of data and finding the median in constant or logarithmic time as data accumulates. The class should perform the following operations:

  1. The MedianFinder() constructor initializes the data structure.

  2. addNum(int num):

    • This method handles the addition of each new number num into the structure.
    • To efficiently retrieve the median, we can use two heaps (a max-heap to store the smaller half of the numbers and a min-heap for the larger half). This allows us to balance the heaps whenever a new number is added such that:
      • The max-heap contains the smaller half with the largest values accessible in constant time.
      • The min-heap contains the larger half, again allowing constant time access to the smallest values of this half.
  3. findMedian():

    • If the total count of numbers is odd, the median is located at the top of the larger heap (depending on which heap has more elements).
    • If the count is even, the median is the average of the roots of both heaps.
    • Given the structure of the heaps, access to these roots is constant time.

Example

Let's consider ["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"] with inputs [[], [1], [2], [], [3], []]. The operations would proceed as follows:

  • A new MedianFinder object is instantiated.
  • 1 is added to the data structure. Since it's the first element, it is alone in the max-heap.
  • 2 is added next. Given our two-heaps approach, 1 stays in the max-heap, and 2 moves to the min-heap.
  • Calling findMedian() at this stage computes the median of [1, 2] which is 1.5.
  • After adding 3, the heaps adjust to keep the balance: max-heap [1], min-heap [2, 3].
  • The next findMedian() call calculates the median from [1, 2, 3], which is the root of the min-heap, resulting in 2.0.

Notes on Constraints and Edge Cases

The constraints ensure that:

  • The stream will always have at least one element when findMedian is called, preventing queries on an empty dataset.
  • The range of num is capped at ±105, and the number of operations is bounded by 5 * 104. The heap-based approach is efficient enough to handle this boundary within acceptable computational limits.

Solutions

  • C++
cpp
class MedianCalculator {
    multiset<int> numbers;
    multiset<int>::iterator medianIterator;

public:
    MedianCalculator()
        : medianIterator(numbers.end())
    {
    }

    void insertNum(int value)
    {
        const int count = numbers.size();
        numbers.insert(value);

        if (!count)                                   // inserting the first element
            medianIterator = numbers.begin();
        else if (value < *medianIterator)             // insert value affects the lower half
            medianIterator = (count % 2 ? medianIterator : prev(medianIterator));
        else                                          // insert value affects the upper half
            medianIterator = (count % 2 ? next(medianIterator) : medianIterator);
    }

    double computeMedian()
    {
        const int size = numbers.size();
        return ((double) *medianIterator + *next(medianIterator, size % 2 - 1)) * 0.5;
    }
};

The provided C++ code defines a class MedianCalculator that efficiently calculates the median of a dynamically changing data stream using a multiset to store the numbers and iterators for median management. Here's a concise explanation of how each part functions and the significance of the operations within:

  • Initialization:

    • The constructor initializes medianIterator to the end of an empty multiset, preparing it for the incoming data.
  • Inserting a Number:

    • The insertNum method adds a new number to the multiset and adjusts the medianIterator accordingly.
    • When inserting into an empty data structure, the iterator is set to the beginning.
    • When inserting a number that affects the lower half of the multiset (i.e., the number is less than the median), the iterator may either stay in place or move back one position, depending on the set's current size.
    • For numbers affecting the upper half (i.e., equal to or greater than the median), the iterator may move forward one position or stay put, dependent on the odd or even count of the set.
  • Computing the Median:

    • The computeMedian method determines the median by comparing the size of the multiset.
    • If the set size is odd, it directly returns the value at medianIterator.
    • For even-sized sets, it calculates the average of the elements pointed to by medianIterator and the following element.

This setup is suitable for scenarios where a considerable amount of data is continuously added and median needs constant recalculating, providing an efficient solution by maintaining sorted elements and allowing quick median computation.

Comments

No comments yet.