01 January 2012
less than a minute read
Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a new algorithm to process these queries optimally in terms of worst-case data complexity.
Authors: Hung Q. Ngo, Ely Porat, Christopher Ré, Atri Rudra. 2012.
In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database Systems (PODS ‘12). (Best Paper Award).
Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a new algorithm to process these queries optimally in terms of worst-case data complexity. Our result builds on recent work by Atserias, Grohe, and Marx, who gave bounds on the size of a natural join query in terms of the sizes of the individual relations in the body of the query. These bounds, however, are not constructive: they rely on Shearer’s entropy inequality, which is information-theoretic. Thus, the previous results leave open the question of whether there exist algorithms whose runtimes achieve these optimal bounds. An answer to this question may be interesting to database practice, as we show in this article that any project-join style plans, such as ones typically employed in a relational database management system, are asymptotically slower than the optimal for some queries. We present an algorithm whose runtime is worst-case optimal for all natural join queries. Our result may be of independent interest, as our algorithm also yields a constructive proof of the general fractional cover bound by Atserias, Grohe, and Marx without using Shearer’s inequality. This bound implies two famous inequalities in geometry: the Loomis-Whitney inequality and its generalization, the Bollobás-Thomason inequality. Hence, our results algorithmically prove these inequalities as well. Finally, we discuss how our algorithm can be used to evaluate full conjunctive queries optimally, to compute a relaxed notion of joins and to optimally (in the worst-case) enumerate all induced copies of a fixed subgraph inside of a given large graph.
Molham shares some history of relational databases, trends in modern cloud-native database systems, and the innovations pioneered at RelationalAI to bring deep learning with relations from idea to reality.Read More
This incredible panel of experts gathered to discuss the current state of AI and machine learning workloads inside databases. The panel discussed new techniques, technologies, and recent papers that progress our understanding of what is possible. Q&A among the panel and from the audience concludes this deep and wide ranging conversation.Read More
This talk explores several techniques to improve the runtime performance of machine learning by taking advantage of the underlying structure of relational data. While most data scientists use relational data in their work, the data science tooling that works with relational data is quite lacking today. Let’s explore these new techniques and see how we can drastically improve machine learning through a database-oriented lens.Read More