Research

Building a Named Entity Recognition Model for the Legal Domain

We defined NER in the legal domain and presented our approach to generating ground truth data. In what follows, we go over the state of the art in NER and elaborate on the experiments we ran and the lessons we learned.

Named Entity Recognition in the Legal Domain

Named entity recognition is a difficult challenge to solve, particularly in the legal domain. Extracting ground truth labels from long, hierarchical documents is often slow and prone to error. RelationalAI proposes a new scalable algorithm based on the principles of data-centric AI, designed to meet this challenge and generate high-quality annotations with minimal supervision.

Machine Learning through Database Glasses, NeurIPS 2021

This talk explores several techniques to improve the runtime performance of machine learning by taking advantage of the underlying structure of relational data. While most data scientists use relational data in their work, the data science tooling that works with relational data is quite lacking today. Let’s explore these new techniques and see how we can drastically improve machine learning through a database-oriented lens.

AI workloads inside databases, NeurIPS 2021

This panel of experts gathered to discuss the current state of AI and machine learning workloads inside databases. The panel discussed new techniques, technologies, and recent papers that advance our understanding of what is possible. A Q&A among the panelists and with the audience concludes this deep and wide-ranging conversation.

Deep Learning with Relations, NeurIPS 2021

Molham shares some history of relational databases, trends in modern cloud-native database systems, and the innovations pioneered at RelationalAI to bring deep learning with relations from idea to reality.

Your Wit is my Command

Please join us for this fun and exciting talk by Tony Veale. As an associate professor in the School of Computer Science at University College Dublin (UCD), Ireland, he has worked in AI research for three decades, in academia and in industry, with a special emphasis on humor and linguistic creativity.

Decision Problems in Information Theory

Constraints on entropies are considered to be the laws of information theory. Even though the pursuit of their discovery has been a central theme of research in information theory, the algorithmic aspects of constraints on entropies remain largely unexplored. Here, we initiate an investigation of decision problems about constraints on entropies by placing several different such problems into levels of the arithmetical hierarchy.

Maintaining Triangle Queries under Updates

We consider the problem of incrementally maintaining triangle queries with arbitrary free variables under single-tuple updates to the input relations. We introduce an approach called IVM^ε that exhibits a trade-off between the update time, the space, and the delay for the enumeration of the query result, such that the update time ranges from the square root of the database size to linear in the database size while the delay ranges from constant to linear time. IVM^ε achieves Pareto worst-case optimality in the update-delay space conditioned on the Online Matrix-Vector Multiplication conjecture.

A Principled Approach to Selective Context Sensitivity for Pointer Analysis - TOPLAS

In this work, we present a more principled approach for identifying precision-critical methods, based on general patterns of value flows that explain where most of the imprecision arises in context-insensitive pointer analysis.

Bag Query Containment and Information Theory

The query containment problem is a fundamental algorithmic problem in data management. While this problem is well understood under set semantics, it is by far less understood under bag semantics. In this paper we unveil tight connections between information theory and the conjunctive query containment under bag semantics.

Computer Vision: Deep Dive into Object Segmentation Approaches

Functional Aggregate Queries with Additive Inequalities

Motivated by fundamental applications in databases and relational machine learning, we formulate and study the problem of answering functional aggregate queries (FAQ) in which some of the input factors are defined by a collection of additive inequalities between variables.

Human in the Loop Enrichment of Product Graphs with Probabilistic Soft Logic

Product graphs have emerged as a powerful tool for online retailers to enhance product semantic search, catalog navigation, and recommendations. Their versatility stems from the fact that they can uniformly store and represent different relationships between products, their attributes, concepts, or abstractions in an actionable form.

Learning Models over Relational Data using Sparse Tensors and Functional Dependencies

Integrated solutions for analytics over relational databases are of great practical importance as they avoid the costly repeated loop data scientists have to deal with on a daily basis: select features from data residing in relational databases using feature extraction queries involving joins, projections, and aggregations; export the training dataset defined by such queries; convert this dataset into the format of an external learning tool; and train the desired model using this tool.

A Layered Aggregate Engine for Analytics Workloads

Strictly Declarative Specification of Sophisticated Points-to Analyses

We present the DOOP framework for points-to analysis of Java programs. DOOP builds on the idea of specifying pointer analysis algorithms declaratively, using Datalog: a logic-based language for defining (recursive) relations. We carry the declarative approach further than past work by describing the full end-to-end analysis in Datalog and optimizing aggressively using a novel technique specifically targeting highly recursive Datalog programs.

Counting Triangles under Updates in Worst-Case Optimal Time

We consider the problem of incrementally maintaining the triangle count query under single-tuple updates to the input relations. We introduce an approach that exhibits a space-time tradeoff such that the space-time product is quadratic in the size of the input database and the update time can be as low as the square root of this size.

From the Lab to Production: A Case Study of Session-Based Recommendations in the Home-Improvement Domain

E-commerce applications rely heavily on session-based recommendation algorithms to improve the shopping experience of their customers. Recent progress in session-based recommendation algorithms shows great promise. However, translating that promise to real-world outcomes is a challenging task for several reasons, but mostly due to the large number and varying characteristics of the available models. In this paper, we discuss the approach and lessons learned from the process of identifying and deploying a successful session-based recommendation algorithm for a leading e-commerce application in the home-improvement domain. To this end, we initially evaluate fourteen session-based recommendation algorithms in an offline setting using eight different popular evaluation metrics on three datasets.

Next-Paradigm Programming Languages: What Will They Look Like and What Changes Will They Bring?

I would like to focus on a question with an answer that can be, surprisingly, clearer: what will be the common principles behind next-paradigm, high-productivity programming languages, and how will they change everyday program development?

Optimizing Training Data for Image Classifiers

In this paper, we propose a robust method for outlier removal to improve the performance of image classification. Increasing the size of the training data does not necessarily raise prediction accuracy, because some instances may be poor representatives of their respective classes.

Product Collection Recommendation in Online Retail

Recommender systems are an integral part of eCommerce services, helping to optimize revenue and user satisfaction. Bundle recommendation has recently gained attention from the research community, since behavioral data shows that users often buy more than one product in a single transaction. In most cases, bundle recommendations are of the form “users who bought product A also bought products B, C, and D”. Although such recommendations can be useful, there is no guarantee that products A, B, C, and D are actually related to each other. In this paper, we address the problem of collection recommendation, i.e., recommending a collection of products that share a common theme and can potentially be purchased together in a single transaction.

Algebraic Modeling in Datalog

Datalog is a deductive language tailored for easy database access. We introduce an algebraic modeling language in Datalog for mixed-integer linear optimization models.

Rk-means: Fast Clustering for Relational Data

This RelationalAI Research paper introduces Rk-means, the relational k-means algorithm, for clustering relational data tuples without having to access the full data matrix.

Defensive Points-To Analysis: Effective Soundness via Laziness

Worst-Case Optimal Join Algorithms: Techniques, Results and Open Problems

Worst-case optimal join algorithms are the class of join algorithms whose runtime matches the worst-case output size of a given join query. While the first provably worst-case optimal join algorithm was discovered relatively recently, the techniques and results surrounding these algorithms grow out of decades of research from a wide range of areas, intimately connecting graph theory, algorithms, information theory, constraint satisfaction, database theory, and geometric inequalities.

What Do Shannon-type Inequalities, Submodular Width, and Disjunctive Datalog Have to Do with One Another?

Recent work on bounding the output size of a conjunctive query with functional dependencies and degree bounds has shown a deep connection between fundamental questions in information theory and database theory. This paper connects semantic query optimization, physical query optimization, and cost estimation to information theory with provable bounds.

Comprehensive Survey of Recursive Query Processing and Optimization Techniques using Datalog

In recent years, we have witnessed a revival of the use of recursive queries in a variety of emerging application domains such as data integration and exchange, information extraction, networking, and program analysis. A popular language used for expressing these queries is Datalog.

Functional Aggregate Query (FAQ): Questions Asked Frequently

We define and study the Functional Aggregate Query (FAQ) problem, which encompasses many frequently asked questions in constraint satisfaction, databases, matrix operations, probabilistic graphical models and logic. This is our main conceptual contribution.

Design and Implementation of the LogicBlox System

The LogicBlox system aims to reduce the complexity of software development for modern applications which enhance and automate decision-making and enable their users to evolve their capabilities via a “self-service” model.

Join Processing for Graph Patterns: An Old Dog with New Tricks

Join optimization has been dominated by Selinger-style, pairwise optimizers for decades. But Selinger-style algorithms are asymptotically suboptimal for applications in graph analytics. This suboptimality is one of the reasons that many have advocated supplementing relational engines with specialized graph processing engines.

Leapfrog Triejoin: A Simple, Worst-Case Optimal Join Algorithm

In 2012, Ngo, Porat, Ré and Rudra (henceforth NPRR) devised a join algorithm with worst-case running time proportional to the AGM bound [8]. Our commercial database system LogicBlox employs a novel join algorithm, leapfrog triejoin, which compared conspicuously well to the NPRR algorithm in preliminary benchmarks.

Hybrid Context-Sensitivity for Points-To Analysis

Context-sensitive points-to analysis is valuable for achieving high precision with good performance. The standard flavors of context-sensitivity are call-site-sensitivity (kCFA) and object-sensitivity. Combining both flavors of context-sensitivity increases precision but at an infeasibly high cost.

Worst-case Optimal Join Algorithms

Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a new algorithm to process these queries optimally in terms of worst-case data complexity. Our result builds on recent work by Atserias, Grohe, and Marx, who gave bounds on the size of a natural join query in terms of the sizes of the individual relations in the body of the query.

Pick Your Contexts Well: Understanding Object-Sensitivity

Object-sensitivity has emerged as an excellent context abstraction for points-to analysis in object-oriented languages. Despite its practical success, however, object-sensitivity is poorly understood.
