Leapfrog Triejoin: A Simple, Worst-Case Optimal Join Algorithm
In this paper we establish that leapfrog triejoin is also worst-case optimal, up to a log factor, in the sense of NPRR
Author: Todd L. Veldhuizen. 2014.
In Proceedings of the 17th International Conference on Database Theory (ICDT ‘14).
Recent years have seen exciting developments in join algorithms.In 2008, Atserias, Grohe and Marx (henceforth AGM) proved a tight bound on the maximum result size of a full conjunctive query, given constraints on the input relation sizes. In 2012, Ngo, Porat, R«e and Rudra (henceforth NPRR) devised a join algorithm with worst-case running time proportional to the AGM bound [8]. Our commercial database system LogicBlox employs a novel join algorithm, leapfrog triejoin, which compared conspicuously well to the NPRR algorithm in preliminary benchmarks. This spurred us to analyze the complexity of leapfrog triejoin. In this paper we establish that leapfrog triejoin is also worst-case optimal, up to a log factor, in the sense of NPRR. We improve on the results of NPRR by proving that leapfrog triejoin achieves worst-case optimality for finer-grained classes of database instances, such as those defined by constraints on projection cardinalities. We show that NPRR is not worst-case optimal for such classes, giving a counterexample where leapfrog triejoin runs inO(nlogn)time and NPRR runs in Θ(n1.375)time. On a practical note, leapfrog triejoin can be implemented using conventional data structures such as B-trees, and extends naturally to ∃1 queries. We believe our algorithm offers a useful addition to the existing toolbox of join algorithms, being easy to absorb, simple to implement, and having a concise optimality proof.
Read the PDF: Leapfrog Triejoin: A Simple, Worst-Case Optimal Join Algorithm (opens in a new tab)
Related Posts
Hybrid Context-Sensitivity for Points-To Analysis
Context sensitive points-to analysis is valuable for achieving high precision with good performance.The standard flavors of context sensitivity are call site-sensitivity (kCFA) and object-sensitivity. Combining both flavors of context-sensitivity increases precision but at an infeasibly high cost.
Worst-case Optimal Join Algorithms
Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a new algorithm to process these queries optimally in terms of worst-case data complexity. Our result builds on recent work by Atserias, Grohe, and Marx, who gave bounds on the size of a natural join query in terms of the sizes of the individual relations in the body of the query