Problem 14.1

Point of maximum overlap

Suppose that we wish to keep track of a point of maximum overlap in a set of intervals – a point with the largest number of intervals in the set that overlaps it.

Show that there will always be a point of maximum overlap that is an endpoint of one of the segments.

Design a data structure that efficiently supports the operations INTERVAL-INSERT, INTERVAL-DELETE, and FIND-POM, which returns a point of maximum overlap. (Hint: Keep a red-black tree of all the endpoints. Associate a value of +1 with each left endpoint, and associate a value of -1 with each right endpoint. Augment each node of the tree with some extra information to maintain the point of maximum overlap.)

Maximum overlap at an endpoint

This is a bit obvious, so as usual, proving it is tricky.

Let $x$ be a maximum overlap point and $S$ be set of the intervals that overlap around $x$. Let $I_S$ the intersection of all intervals in $S$, that is:

$$ I_S = \bigcup_{X \in S} X $$

If we let $I_S = (a, b)$, that is, for $a$ and $b$ to be the endpoints of the intersection, they are also points of maximum overlap. Furthermore, they are the endpoint of one of the intervals in $S$ (otherwise $I_S$ would not end there).

Data structure

As hinted, the data structure is a red-black tree, where each key is associated with a weight (-1 or 1) depending on whether it's the start or end of an interval. If we represent each set of intervals as a list of -1s and 1s in the order of the endpoints, then the largest number of overlapping interval is the maximum of the sums of each sublist.

If there are duplicate endpoints, we need to order the 1s before the -1s.

A list would not get good performance, however, so we model it as a tree:

The tree is an ordering of 1s and -1s of all the endpoints.
We're looking for a maximum sum of sequential endpoints, and an endpoint that generates it.

Note that each node represent a sublist of the endpoints, that is, there are no elements between the minimum and the maximum of the subtree in the full list of endpoints that are not present in the subtree.

We can calculate this efficiently by storing three properties on each node:

The sum of all the weights in the subtree rooted at a node.
The maximum weight in attainable by a prefix of the tree
The element that creates this maximum weight

The first is obvious, and the third is calculated by the second. So the question is how we keep calculate the maximum overlap at a subtree, given that we have it for its children.

There are essentially three cases:

The maximum overlap is achieved in a suffix of the left subtree
The maximum overlap is achieved by entire left subtree plus the endpoint of the node
The maximum overlap is achieved by entire left subtree plus the endpoint of the node plus a prefix in the right subtree

They result in three different things we need to check, which is implemented in the code below

Python code

import sys, os
sys.path.append(os.path.join(os.path.dirname(__file__), '..', 'misc'))

from augmentable_tree import AugmentableTree

class Endpoint:
    def __init__(self, value, weight): self.value, self.weight = value, weight
    def __lt__(self, other): return (self.value, -self.weight) < (other.value, -other.weight)
    def __eq__(self, other): return (self.value, self.weight) == (other.value, other.weight)
    def isLow(self): return self.weight == 1


def weight(node):
    return node.weight if node else 0


def optimal(node):
    return node.optimal if node else (0, None)


class OverlapTree(AugmentableTree):
    def __init__(self, intervals = []):
        super(OverlapTree, self).__init__()

        for interval in intervals:
            self.insert_interval(interval)

    def augment_node(self, node):
        node.optimal = (1, node.key.value) if node.key.isLow() else (0, None)
        node.weight = node.key.weight
        node.sexp = lambda: self.print_node(node)

    def recalculate_node(self, node):
        node.weight = node.key.weight + weight(node.left) + weight(node.right)

        right_optimal = optimal(node.right)

        candidates = [
            optimal(node.left),
            (weight(node.left) + node.key.weight, node.key.value),
            (weight(node.left) + node.key.weight + right_optimal[0], right_optimal[1] or node.key.value),
        ]

        node.optimal = max(candidates, key=lambda t: t[0])

    def max_overlap(self):
        return optimal(self.root)[1]

    def insert_interval(self, interval):
        self.insert(Endpoint(interval.low, 1))
        self.insert(Endpoint(interval.high, -1))

    def delete_interval(self, interval):
        self.delete(Endpoint(interval.low, 1))
        self.delete(Endpoint(interval.high, -1))