
Refine semantic assertions through contextual matching
refine() incrementally stabilizes semantic assertions through
deterministic contextual matching rules while preserving row
cardinality and the original observational universe.
Arguments
- x
A data frame or tibble.
- target
Name of the target column to refine.
- rules
A rule table or compiled rulebook.
- by
Optional grouping variables used during refinement.
- assertion
Optional assertion text recorded in provenance.
- comment
Optional comment attached to the refinement step.
- match
Matching strategy. Defaults to "first".
Details
The function is designed for lightweight semantic refinement workflows where semantic interpretations mature gradually through ordinary tidyverse operations.
Matching observations are identified through configurable matching semantics applied to one or more observational variables.
Supported matching semantics include:
"exact": relational equality;
"starts_with": hierarchical prefix matching;
"ends_with": suffix matching;
"contains": substring detection.
Matching positions in the target vector are replaced by refined semantic assertions.
Unmatched values remain unchanged.
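The four matching semantics and the in-place replacement can be approximated in base R. This is a minimal sketch, not the package's implementation: match_values() and refine_sketch() are hypothetical helpers introduced only for illustration, and the sketch assumes that conditions on several variables are combined with a logical AND.

```r
# Hypothetical helper: evaluate one matching mode on a character vector.
match_values <- function(values, pattern, mode) {
  switch(
    mode,
    exact       = values == pattern,
    starts_with = startsWith(values, pattern),
    ends_with   = endsWith(values, pattern),
    contains    = grepl(pattern, values, fixed = TRUE),
    stop("Unknown matching mode: ", mode)
  )
}

# Hypothetical sketch: replace matching positions, leave the rest untouched.
# Assumption: conditions on several variables are combined with logical AND.
refine_sketch <- function(data, target, patterns, modes, assertion) {
  hits <- Reduce(
    `&`,
    Map(
      function(column, pattern, mode) match_values(data[[column]], pattern, mode),
      names(patterns), patterns, modes
    )
  )
  target[hits] <- assertion
  target
}

files <- data.frame(
  filename  = c("filmA.png", "filmB.png", "film.xlsx", "fill.png"),
  extension = c("png", "png", "xlsx", "png"),
  stringsAsFactors = FALSE
)

result <- refine_sketch(
  data      = files,
  target    = rep("unresolved", nrow(files)),
  patterns  = list(filename = "film", extension = "png"),
  modes     = c("starts_with", "exact"),
  assertion = "film_visualisation"
)
```

Only the first two rows satisfy both conditions, so only those positions receive the assertion; the returned vector has the same length as the input.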
refine() intentionally never:
removes rows;
reshapes tables;
modifies unrelated observations.
This makes refinement stages auditable, reversible, and compatible with iterative semantic stabilization workflows.
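Because a stage only replaces values in place, its effect can be audited and undone by comparing the target vector before and after the stage. The sketch below is illustrative only: the two vectors stand in for a target before and after a refinement step, no package function is called, and it assumes the caller has kept the earlier vector.

```r
# Target vector before and after a hypothetical refinement stage.
before <- c("unresolved", "unresolved", "unresolved", "unresolved")
after  <- c("film_visualisation", "film_visualisation", "unresolved", "unresolved")

# Row cardinality is preserved, so the vectors can be compared element-wise.
stopifnot(length(before) == length(after))

# Audit: which positions did the stage change, and how?
changed <- which(before != after)
audit <- data.frame(
  position = changed,
  old      = before[changed],
  new      = after[changed],
  stringsAsFactors = FALSE
)

# Reversal: restore the previous values at the changed positions.
restored <- after
restored[changed] <- audit$old
```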
refine() operates on semantic operationalisations produced
through workflows such as:
previous refinement stages.
Rather than enforcing formally complete ontology semantics, the function provides a lightweight operational mechanism for progressively stabilizing semantic interpretations inside ordinary analytical workflows.
Multiple refinement stages may later mature into:
controlled vocabularies;
labelled::labelled() vectors;
dataset::defined() vectors;
semantically enriched datasets compatible with iterative semantic workflows.
The function does not attempt to construct formally complete semantic graphs or enforce ontology-level consistency.
Instead, it provides a lightweight operational mechanism for progressively stabilizing semantic interpretations inside ordinary tidyverse workflows.
This approach is particularly useful when working with:
partially harmonised datasets;
inconsistent coding systems;
ambiguous metadata;
hierarchical filesystem structures;
exploratory semantic reconstruction workflows.
Multiple refinement stages may later mature into:
controlled vocabularies;
formally defined semantic vectors;
semantically enriched datasets;
or graph-based semantic representations.
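As one possible illustration of the first item, a refined character vector can be promoted to a controlled vocabulary with a base R factor. This is a sketch under stated assumptions: the vocabulary terms are hypothetical, and the package may offer or recommend a different route.

```r
# Refined assertions after one or more hypothetical refinement stages.
refined <- c("film_visualisation", "film_visualisation", "unresolved", "unresolved")

# Assumed controlled vocabulary; these terms are illustrative only.
vocabulary <- c("film_visualisation", "spreadsheet", "unresolved")

# Values outside the vocabulary would become NA, so check before converting.
outside <- setdiff(unique(refined), vocabulary)
stopifnot(length(outside) == 0)

controlled <- factor(refined, levels = vocabulary)
table(controlled)
```

Fixing the levels up front means that a later stage introducing an unexpected term is caught by the check instead of silently producing missing values.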
Examples
# Observational variables describing four files.
files <- tibble::tibble(
  filename  = c("filmA.png", "filmB.png", "film.xlsx", "fill.png"),
  extension = c("png", "png", "xlsx", "png")
)

# Assert "film_visualisation" where filename starts with "film"
# and extension is exactly "png"; other positions stay "unresolved".
out <- refine(
  x         = files,
  target    = rep("unresolved", nrow(files)),
  rules     = tibble::tibble(filename = "film", extension = "png"),
  by        = c("filename", "extension"),
  match     = c("starts_with", "exact"),
  assertion = "film_visualisation"
)

out
#> [1] "film_visualisation" "film_visualisation" "unresolved"
#> [4] "unresolved"