Regular Expression Matching and Capture Groups in Rel
The Rel library has included basic regular expression support for a while. For example, regex_match tests whether a string matches a regular expression:
regex_match("^.*@.*$", "some@example.com")
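For readers who know Python better than Rel, the same check can be sketched with Python's standard `re` module. This is only an analogue, not how Rel implements regex_match. The helper name `matches` is hypothetical, and because this post does not say whether Rel anchors the pattern implicitly, the sketch relies on the explicit `^` and `$` in the pattern.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern matches somewhere in the text (hypothetical helper)."""
    return re.search(pattern, text) is not None

print(matches(r"^.*@.*$", "some@example.com"))  # True
print(matches(r"^.*@.*$", "no-at-sign"))        # False
```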
The relation string_replace also supports regular expressions; for example, string_trim is defined using a regular expression:
def string_trim[s] = string_replace[
s, regex_compile["^\\s+|\\s+$"], ""]
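The trimming pattern itself is not specific to Rel. As a rough illustration, here is the same idea in Python, assuming the standard `re` module; the function below is a hypothetical stand-in, not the Rel definition.

```python
import re

# Same pattern as in the Rel definition: leading or trailing whitespace.
TRIM_PATTERN = re.compile(r"^\s+|\s+$")

def string_trim(s: str) -> str:
    """Remove leading and trailing whitespace via regex replacement."""
    return TRIM_PATTERN.sub("", s)

print(repr(string_trim("  hello world \n")))  # 'hello world'
```

The alternation matters: each branch is anchored, so whitespace between words is left untouched.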
Until now, it was not possible to extract matching substrings using regular expressions. We are excited to announce that we have now added support for this.
The relation regex_match_all finds all substrings in a string that match the regular expression. Each result includes the matched substring as well as its offset in the string.
// read query
def output = regex_match_all["(cat|dog)s?", "cats are not dogs"]
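To make the "substring plus offset" shape concrete, here is a small Python analogue of the query above. It is a sketch under assumptions: `match_all` is a hypothetical helper built on `re.finditer`, and it reports Python's 0-based offsets, which may differ from the offsets Rel reports.

```python
import re

def match_all(pattern: str, text: str) -> list[tuple[int, str]]:
    """Return (offset, substring) for every non-overlapping match, left to right."""
    return [(m.start(), m.group(0)) for m in re.finditer(pattern, text)]

print(match_all(r"(cat|dog)s?", "cats are not dogs"))
# [(0, 'cats'), (13, 'dogs')]
```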
We also introduce the capture_group_by_index relation, which captures the substrings matched by the groups in a regular expression. This relation searches for matches in an input string starting from a given offset.
Each group in the regular expression is automatically given a unique number starting with 1.
// read query
def email = "john.doe@example.com"
def pattern = "^(.*)@(.*)\\.com$"
def output = email, capture_group_by_index[pattern, email, 1]
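Group numbering works the same way in most regex engines, so a Python analogue may help show what "group 1" and "group 2" refer to. The helper `capture_groups_by_index` below is hypothetical and only mimics the idea of mapping group numbers to captured substrings. Its `start` parameter is a 0-based Python position, not a Rel offset, and the pattern escapes the dot before `com` so that it matches a literal period.

```python
import re

def capture_groups_by_index(pattern: str, text: str, start: int = 0) -> dict[int, str]:
    """Map each group number (starting at 1) to its captured substring.

    Searches from the 0-based position `start`; returns {} if nothing matches.
    """
    m = re.compile(pattern).search(text, start)
    if m is None:
        return {}
    # Skip groups that did not participate in the match.
    return {i: m.group(i) for i in range(1, m.re.groups + 1) if m.group(i) is not None}

print(capture_groups_by_index(r"^(.*)@(.*)\.com$", "john.doe@example.com"))
# {1: 'john.doe', 2: 'example'}
```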
In addition to numeric indexes, Rel supports regular expressions with named capture groups. The capture_group_by_name relation includes the captured substring for the corresponding group name.
// read query
def my_string = "Meeting is at 11:45 AM"
def pattern = "(?<hour>\\d+):(?<minute>\\d+)"
def output = capture_group_by_name[pattern, my_string, 1]
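Named groups can be illustrated the same way in Python, with one syntactic difference worth flagging: Python's `re` module spells them `(?P<name>...)` rather than `(?<name>...)`. The helper `capture_groups_by_name` is a hypothetical analogue that considers only the first match; it is not the Rel relation.

```python
import re

def capture_groups_by_name(pattern: str, text: str) -> dict[str, str]:
    """Map each named group to its captured substring for the first match."""
    m = re.search(pattern, text)
    if m is None:
        return {}
    # Skip named groups that did not participate in the match.
    return {name: value for name, value in m.groupdict().items() if value is not None}

# Python spells named groups (?P<name>...) rather than (?<name>...).
pattern = r"(?P<hour>\d+):(?P<minute>\d+)"
print(capture_groups_by_name(pattern, "Meeting is at 11:45 AM"))
# {'hour': '11', 'minute': '45'}
```

Looking up a single name in the returned dictionary, for example `["hour"]`, picks out one group.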
The regular expression capabilities are implemented using the foreign function interface, but these relations are designed to be used like any other relation. For example, when only a specific capture group is needed, it can be selected by name, as illustrated in this example:
// read query
def my_group = capture_group_by_name[
"^.*@(?<domain>.*)\\.com$", "foo@example.com", 1]
def output = my_group["domain"]
With the new regular expression features, we expect to cover more of the common data engineering use cases. We’re excited to learn how you are using Rel, so please let us know about any future features you’d like to see.