We defined NER in the legal domain and presented our approach towards generating ground truth data. In what follows, we go over the state-of-the-art in the NER domain and elaborate on the experiments we ran and the lessons we learned.
Named entity recognition is a difficult challenge to solve, particularly in the legal domain. Extracting ground truth labels from long, hierarchical documents is often slow and prone to error. RelationalAI proposes a new scalable algorithm based on the principles of data-centric AI, designed to meet this challenge and generate high-quality annotations with minimal supervision.
This talk explores several techniques to improve the runtime performance of machine learning by taking advantage of the underlying structure of relational data. While most data scientists use relational data in their work, the data science tooling that works with relational data is quite lacking today. Let’s explore these new techniques and see how we can drastically improve machine learning through a database-oriented lens.
This incredible panel of experts gathered to discuss the current state of AI and machine learning workloads inside databases. The panel discussed new techniques, technologies, and recent papers that progress our understanding of what is possible. Q&A among the panel and from the audience concludes this deep and wide ranging conversation.
Molham shares some history of relational databases, trends in modern cloud-native database systems, and the innovations pioneered at RelationalAI to bring deep learning with relations from idea to reality.
Please join us for this fun and exciting talk by Tony Veale. As an associate professor in the School of Computer Science at University College Dublin (UCD), Ireland, he has worked in AI research for three decades, in academia and in industry, with a special emphasis on humor and linguistic creativity.
Constraints on entropies are considered to be the laws of information theory. Even though the pursuit of their discovery has been a central theme of research in information theory, the algorithmic aspects of constraints on entropies remain largely unexplored. Here, we initiate an investigation of decision problems about constraints on entropies by placing several different such problems into levels of the arithmetical hierarchy.
We consider the problem of incrementally maintaining the triangle queries with arbitrary free variables under single-tuple updates to the input relations. We introduce an approach called IVM that exhibits a trade-off between the update time, the space, and the delay for the enumeration of the query result, such that the update time ranges from the square root to linear in the database size while the delay ranges from constant to linear time. IVM achieves Pareto worst-case optimality in the update-delay space conditioned on the Online Matrix-Vector Multiplication conjecture.
In this work, we present a more principled approach for identifying precision-critical methods, based on general patterns of value flows that explain where most of the imprecision arises in context-insensitive pointer analysis.
The query containment problem is a fundamental algorithmic problem in data management. While this problem is well understood under set semantics, it is by far less understood under bag semantics. In this paper we unveil tight connections between information theory and the conjunctive query containment under bag semantics.
Join optimization has been dominated by Selinger-style, pairwise optimizers for decades. But, Selinger-style algorithms are asymptotically suboptimal for applications in graphic analytics. This suboptimality is one of the reasons that many have advocated supplementing relational engines with specialized graph processing engines.
Motivated by fundamental applications in databases and relational machine learning, we formulate and study the problem of answering functional aggregate queries (FAQ) in which some of the input factors are defined by a collection of additive inequalities between variables.
Product graphs have emerged as a powerful tool for online retailers to enhance product semantic search, catalog navigation, and recommendations. Their versatility stems from the fact that they can uniformly store and represent different relationships between products, their attributes, concepts or abstractions etc, in an actionable form.
Integrated solutions for analytics over relational databases are of great practical importance as they avoid the costly repeated loop data scientists have to deal with on a daily basis: select features from data residing in relational databases using feature extraction queries involving joins, projections, and aggregations; export the training dataset defined by such queries; convert this dataset into the format of an external learning tool; and train the desired model using this tool.
Recommender systems are an integral part of eCommerce services, helping to optimize revenue and user satisfaction. Bundle recommendation has recently gained attention by the research community since behavioral data supports that users often buy more than one product in a single transaction. In most cases, bundle recommendations are of the form “users who bought product A also bought products B, C, and D”. Although such recommendations can be useful, there is no guarantee that products A,B,C, and D may actually be related to each other. In this paper, we address the problem of collection recommendation, i.e., recommending a collection of products that share a common theme and can potentially be purchased together in a single transaction.
We present the DOOP framework for points-to analysis of Java programs. DOOP builds on the idea of specifying pointer analysis algorithms declaratively, using Datalog: a logic-based language for defining (recursive) relations. We carry the declarative approach further than past work by describing the full end-to-end analysis in Datalog and optimizing aggressively using a novel technique specifically targeting highly recursive Datalog programs.
We consider the problem of incrementally maintaining the triangle count query under single-tuple updates to the input relations. We introduce an approach that exhibits a space-time tradeoff such that the space-time product is quadratic in the size of the input database and the update time can be as low as the square root of this size.
E-commerce applications rely heavily on session-based recommendation algorithms to improve the shopping experience of their customers. Recent progress in session-based recommendation algorithms shows great promise. However, translating that promise to real-world outcomes is a challenging task for several reasons, but mostly due to the large number and varying characteristics of the available models. In this paper, we discuss the approach and lessons learned from the process of identifying and deploying a successful session-based recommendation algorithm for a leading e-commerce application in the home-improvement domain. To this end, we initially evaluate fourteen session-based recommendation algorithms in an offline setting using eight different popular evaluation metrics on three datasets.
What will be the common principles behind next-paradigm, high-productivity programming languages, and how will they change everyday program development? I would like to focus on a question with an answer that can be, surprisingly, clearer: what will be the common principles behind next-paradigm, high-productivity programming languages, and how will they change everyday program development?
In this paper, we propose a robust method for outlier removal to improve the performance for image classification. Increasing the size of training data does not necessarily raise prediction accuracy, due to instances that may be poor representatives of their respective classes.
Recommender systems are an integral part of eCommerce services, helping to optimize revenue and user satisfaction. Bundle recommendation has recently gained attention by the research community since behavioral data supports that users often buy more than one product in a single transaction. In most cases, bundle recommendations are of the form “users who bought product A also bought products B, C, and D”. Although such recommendations can be useful, there is no guarantee that products A,B,C, and D may actually be related to each other. In this paper, we address the problem of collection recommendation, i.e., recommending a collection of products that share a common theme and can potentially be purchased together in a single transaction.
Datalog is a deductive language tailored for easy database access. We introduce an algebraic modeling language in Datalog for mixed-integer linear optimization models.
This RelationalAI Research paper introduces Rk-means, or relationalk-means algorithm, for clustering relational data tuples without having to access the full data matrix.
In this work, we present a more principled approach for identifying precision-critical methods, based on general patterns of value flows that explain where most of the imprecision arises in context-insensitive pointer analysis.
Worst-case optimal join algorithms are the class of join algorithms whose runtime match the worst-case output size of a given join query. While the first provably worst-case optimal join algorithm was discovered relatively recently, the techniques and results surrounding these algorithms grow out of decades of research from a wide range of areas, intimately connecting graph theory, algorithms, information theory, constraint satisfaction, database theory, and geometric inequalities.
Recent works on bounding the output size of a conjunctive query with functional dependencies and degree bounds have shown a deep connection between fundamental questions in information theory and database theory. This paper connects semantic query optimization, physical query optimization & cost estimation, to information theory with provable bounds.
In recent years, we have witnessed a revival of the use of recursive queries in a variety of emerging application domains such as data integration and exchange, information extraction, networking, and program analysis. A popular language used for expressing these queries is Datalog.
We define and study the Functional Aggregate Query (FAQ) problem, which encompasses many frequently asked questions in constraint satisfaction, databases, matrix operations, probabilistic graphical models and logic. This is our main conceptual contribution.
The LogicBlox system aims to reduce the complexity of software development for modern applications which enhance and automate decision-making and enable their users to evolve their capabilities via a “self-service” model.
Join optimization has been dominated by Selinger-style, pairwise optimizers for decades. But, Selinger-style algorithms are asymptotically suboptimal for applications in graphic analytics. This suboptimality is one of the reasons that many have advocated supplementing relational engines with specialized graph processing engines.
In 2012, Ngo, Porat, R«e and Rudra (henceforth NPRR) devised a join algorithm with worst-case running time proportional to the AGM bound [8]. Our commercial database system LogicBlox employs a novel join algorithm, leapfrog triejoin, which compared conspicuously well to the NPRR algorithm in preliminary benchmarks.
Context sensitive points-to analysis is valuable for achieving high precision with good performance.The standard flavors of context sensitivity are call site-sensitivity (kCFA) and object-sensitivity. Combining both flavors of context-sensitivity increases precision but at an infeasibly high cost.
Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a new algorithm to process these queries optimally in terms of worst-case data complexity. Our result builds on recent work by Atserias, Grohe, and Marx, who gave bounds on the size of a natural join query in terms of the sizes of the individual relations in the body of the query
Object-sensitivity has emerged as an excellent context abstraction for points-to analysis in object-oriented languages. Despite its practical success, however, object-sensitivity is poorly understood.
Start your journey with RelationalAI today! Sign up to receive our newsletter, invitations to exclusive events, and customer case studies.
The information you provide will be used in accordance with the terms of our Privacy Policy. By submitting this form, you consent to allow RelationalAI to store and process the personal information submitted above to provide you the content requested.