
Create a contextual Record Set projection from observational resources
Constructs a lightweight contextual Record Set projection from an observational resource table.
Record Set is a contextual aggregation concept defined by the
International Council on Archives (ICA) Records in Contexts
standard (RiC).
In operational terms, a Record Set may represent:
a project workspace;
a synchronized cloud folder;
a repository inventory;
a digitisation batch;
a web archive collection;
a reconstruction corpus;
or another contextual grouping of related digital resources.
The function is designed as an operational bridge between:
filesystem observations;
web archive inventories;
digitised heritage collections;
repository inventories;
and later semantically enriched Record Set representations.
The returned object is intentionally a plain tibble rather than a
semantically enriched recordset_df.
This allows:
efficient tidyverse workflows;
exploratory analytical pipelines;
lightweight contextual reconstruction;
deferred semantic stabilisation;
provenance-aware iterative enrichment.
More information:
ICA Records in Contexts overview: https://www.ica.org/ica-network/expert-groups/egad/records-in-contexts-ric/
RiC-O ontology repository: https://github.com/ica-egad/ric-o
In RiC-aligned operational terminology:
rows typically represent observed or derived Record Resources, Instantiations, or other operational resource proxies;
the resulting tibble represents a contextual Record Set projection constructed from deterministic operational rules;
the function does not create authoritative archival arrangement, fonds hierarchy, or curatorial description;
Record Set semantics remain analytical and operational unless later stabilised through curatorial or semantic workflows.
Typical use cases include:
grouping filesystem observations into project-level Record Sets;
constructing analytical corpora from repository structures;
creating candidate archival aggregations;
preparing Heritage Digital Twin analytical spaces;
contextualising WARC/WACZ collections;
constructing enrichment workspaces for knowledge-graph workflows.
The function deliberately separates:
operational contextualisation (create_record_set())
from:
semantic publication and metadata enrichment (as_recordset_df()).
This mirrors the package philosophy used throughout the observational pipeline:
observe first;
contextualise second;
interpret later.
Usage
create_record_set(
  x,
  record_set_id,
  resource_id,
  construction_rule,
  locator_path = NULL,
  resource_title = NULL,
  resource_type = NULL
)
Arguments
- x
A data.frame or tibble containing observational or derived resource rows.
- record_set_id
Character scalar or existing column name defining the contextual Record Set membership of each resource.
Typical examples include:
structural filesystem groupings;
repository roots;
WARC collection identifiers;
digitisation batches;
curatorial aggregation identifiers.
- resource_id
Character scalar or existing column name defining the operational identity of each resource within the Record Set.
In many filesystem workflows, resource_id will often correspond to what users informally think of as a "file" or "file name".
However, the identifier intentionally represents an operational or contextual resource approximation rather than an authoritative or permanent file identity.
This distinction matters because digital resources frequently evolve over time:
filenames and paths may change;
synchronized copies may diverge;
local and cloud versions may coexist;
files may be copied, renamed, or reorganised;
multiple observations may refer to evolving versions of the same underlying resource.
For example:
the same digital resource ("file") may exist in multiple locations;
a synchronized cloud copy may differ from a local working copy;
a renamed file may still represent the continuation of the same evolving digital resource.
Typical examples include:
storage_path_id (storage-scoped filesystem resource approximation);
URI identifiers;
WARC record identifiers;
repository-relative identifiers;
IIIF resource identifiers.
- construction_rule
Character description documenting the deterministic operational rule used to construct the contextual Record Set projection.
Examples:
"filtered_project_roots|structural_group"
"warc_collection|domain_partition"
"iiif_manifest|folder_batch"
The construction rule is stored as lightweight provenance metadata attached to the resulting tibble.
- locator_path
Optional character scalar or existing column name providing a human-readable operational locator associated with the resource.
Examples include:
filesystem paths;
repository-relative paths;
URIs;
WARC locators;
IIIF resource paths.
- resource_title
Optional character scalar or existing column name containing a human-readable resource title or label.
- resource_type
Optional character scalar or existing column name describing the operational resource type.
Examples:
"file"
"warc_record"
"iiif_canvas"
"rdf_resource"
"digitised_page"
Value
A tibble representing a contextual operational Record Set projection.
The resulting tibble contains:
record_set_id
resource_id
optional contextual resource variables
together with lightweight provenance attributes:
construction_rule
created_by
record_set_created_at
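The provenance attributes listed above can be pictured as ordinary R attributes attached to the returned table. The sketch below is a minimal illustration in base R, not the package's implementation: it assumes the attributes are stored with attr(), uses a plain data.frame as a stand-in for the returned tibble, and the value given for created_by is purely illustrative.

```r
# Stand-in for a returned projection (a plain data.frame is used here so the
# sketch runs without the tibble package).
projection <- data.frame(
  record_set_id = c("heritage_collection", "digitisation_batch"),
  resource_id = c(
    "laptop01::scans/photo_001.tif",
    "archive01::reports/summary.qmd"
  ),
  stringsAsFactors = FALSE
)

# Assumption: provenance is kept as regular attributes on the table.
attr(projection, "construction_rule") <- "filtered_project_roots|structural_group"
attr(projection, "created_by") <- "create_record_set"  # illustrative value
attr(projection, "record_set_created_at") <- Sys.time()

# Read the provenance back as a named list.
provenance <- attributes(projection)[
  c("construction_rule", "created_by", "record_set_created_at")
]
```

If the attributes are stored this way, attr(result, "construction_rule") would be enough to check which rule produced a given projection before enriching it further.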
Details
The function intentionally performs only lightweight contextual projection and validation.
It does not:
infer authoritative documentary hierarchy;
enforce archival arrangement;
construct RiC-complete semantic graphs;
perform provenance reasoning;
stabilise resource identity across time.
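The last point can be illustrated with a small base R sketch. It assumes storage-scoped identifiers of the form "storage::relative/path", as used in the Examples section; the storage name cloud01 is hypothetical. Two observations of what a user would call the same file stay distinct resources, because nothing reconciles them.

```r
# Two observations of the "same" file on different storages
# (cloud01 is a hypothetical storage name).
observations <- data.frame(
  storage_path_id = c(
    "laptop01::scans/photo_001.tif",
    "cloud01::scans/photo_001.tif"
  ),
  stringsAsFactors = FALSE
)

# Drop the assumed "storage::" prefix to recover the relative path.
observations$rel_path <- sub("^[^:]+::", "", observations$storage_path_id)

# Distinct operational identifiers versus distinct relative paths.
n_resource_ids <- length(unique(observations$storage_path_id))
n_rel_paths <- length(unique(observations$rel_path))
```

Here n_resource_ids is 2 while n_rel_paths is 1: deciding whether the two rows are versions of one underlying resource is left to later curatorial or semantic workflows.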
Semantic enrichment and publication-oriented metadata are intended
to be added later via as_recordset_df().
This staged architecture supports:
efficient analytical workflows;
iterative reconstruction;
provenance-aware contextualisation;
future alignment with RiC-O and RiC-CM.
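Several arguments are documented as accepting either a character scalar or an existing column name. One plausible reading, consistent with the example below where "structural_group" names a column and "file" does not, is sketched here in base R. The helper resolve_arg() is hypothetical and is not part of the package; the actual resolution rules may differ.

```r
# Hypothetical helper: resolve an argument that may either name an existing
# column of x or supply a constant value for every row.
resolve_arg <- function(x, arg) {
  stopifnot(is.data.frame(x), is.character(arg), length(arg) == 1L)
  if (arg %in% names(x)) {
    x[[arg]]            # existing column: use its values
  } else {
    rep(arg, nrow(x))   # otherwise: recycle the scalar across all rows
  }
}

observations <- data.frame(
  structural_group = c("heritage_collection", "digitisation_batch"),
  stringsAsFactors = FALSE
)

group_values <- resolve_arg(observations, "structural_group")  # column lookup
type_values <- resolve_arg(observations, "file")               # constant value
```

Under this reading, a scalar that happens to match a column name is always treated as the column, so constant values should be chosen to avoid clashing with existing column names.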
Examples
# Toy observational resource table: one row per observed file
toy_record_set <- tibble::tibble(
  structural_group = c(
    "heritage_collection",
    "heritage_collection",
    "digitisation_batch"
  ),
  storage_path_id = c(
    "laptop01::scans/photo_001.tif",
    "laptop01::ocr/photo_001.txt",
    "archive01::reports/summary.qmd"
  ),
  rel_root_path = c(
    "scans/photo_001.tif",
    "ocr/photo_001.txt",
    "reports/summary.qmd"
  )
)

# Project the observations into Record Sets keyed by structural_group
toy_record_set <- create_record_set(
  toy_record_set,
  record_set_id = "structural_group",
  resource_id = "storage_path_id",
  locator_path = "rel_root_path",
  construction_rule =
    "filtered_project_roots|structural_group",
  resource_type = "file"
)
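Because the result is a plain table with record_set_id and resource_id columns, it can be summarised with ordinary tools. The sketch below does not call the package: it uses a hand-built data.frame that mimics the documented output columns for the toy data, and counts resources per Record Set in base R.

```r
# Stand-in for the documented output columns (assumed, not produced by
# create_record_set() in this sketch).
projected <- data.frame(
  record_set_id = c(
    "heritage_collection",
    "heritage_collection",
    "digitisation_batch"
  ),
  resource_id = c(
    "laptop01::scans/photo_001.tif",
    "laptop01::ocr/photo_001.txt",
    "archive01::reports/summary.qmd"
  ),
  stringsAsFactors = FALSE
)

# Number of resources in each contextual Record Set.
resources_per_set <- table(projected$record_set_id)
```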