
Derive operational reuse metrics from contextual resources
derive_reuse_metrics.Rd
Derives lightweight operational reuse and reconstruction metrics from contextual resource observations.
The function summarizes how frequently contextual resources appear:
across observations;
across Record Sets;
across storage locations;
and across time.
The resulting metrics support:
duplication analysis;
reconstruction workflows;
synchronized workspace inspection;
cross-project reuse detection;
provenance-aware reporting;
and forensic review workflows.
In many filesystem workflows, the resulting metrics approximate how digital resources ("files") evolve, move, synchronize, and reappear across operational environments.
The function is designed to work together with:
as part of layered provenance-aware reconstruction workflows.
Usage
derive_reuse_metrics(
x,
resource_id = "resource_id",
record_set_id = "record_set_id",
storage_path_id = "storage_path_id",
timestamp = "mtime",
location = "full_path"
)
Arguments
- x
A data.frame or tibble containing contextual resource observations.
- resource_id
Character scalar identifying the column representing contextual resource identity. Defaults to "resource_id".
- record_set_id
Character scalar identifying the contextual Record Set membership column. Defaults to "record_set_id".
- storage_path_id
Character scalar identifying the storage-scoped path identifier column. Defaults to "storage_path_id".
- timestamp
Character scalar identifying the timestamp column used for temporal reconstruction. Defaults to "mtime".
- location
Character scalar identifying the human-readable location column. Defaults to "full_path".
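Because every argument after x is a character scalar naming a column, it can help to confirm that the named columns exist before calling the function. The following is a minimal sketch only: missing_columns() is a hypothetical helper that is not part of the package, and the column names "hash" and the contents of `observations` are illustrative assumptions.

```r
# Hypothetical helper (not part of the package): return the requested
# column names that are absent from the input data frame.
missing_columns <- function(x, columns) {
  unname(columns[!columns %in% names(x)])
}

# Illustrative observations whose identity column is called "hash".
observations <- data.frame(
  hash = c("a1", "a1"),
  record_set_id = c("project_a", "project_b"),
  stringsAsFactors = FALSE
)

# Column names the caller intends to pass as arguments (assumed values).
wanted <- c(
  resource_id = "hash",
  record_set_id = "record_set_id",
  timestamp = "mtime"
)

# Names listed here would need to be added or remapped before the call.
absent <- missing_columns(observations, wanted)
absent
```

Any name the helper returns points to an argument that should be remapped to an existing column, or to a column that has to be added to the input first.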
Value
A tibble containing operational reuse metrics.
Typical output variables include:
n_observations
n_record_sets
n_paths
first_seen
last_seen
locations
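To make the output variables concrete, the sketch below shows one way such per-resource metrics could be computed in base R. It is not the package's implementation: sketch_reuse_metrics() is a hypothetical function, and both the definition of each metric (distinct counts, minimum and maximum timestamp) and the "; " separator used for locations are assumptions made for illustration.

```r
# Hypothetical sketch, NOT the package's implementation.
sketch_reuse_metrics <- function(x,
                                 resource_id = "resource_id",
                                 record_set_id = "record_set_id",
                                 storage_path_id = "storage_path_id",
                                 timestamp = "mtime",
                                 location = "full_path") {
  # One data frame of observations per resource identifier.
  groups <- split(x, x[[resource_id]], drop = TRUE)

  # Summarise each group into a single row of metrics.
  rows <- lapply(names(groups), function(id) {
    g <- groups[[id]]
    data.frame(
      resource_id = id,
      n_observations = nrow(g),
      n_record_sets = length(unique(g[[record_set_id]])),
      n_paths = length(unique(g[[storage_path_id]])),
      first_seen = min(g[[timestamp]]),
      last_seen = max(g[[timestamp]]),
      # Assumption: distinct locations are collapsed into one string.
      locations = paste(unique(g[[location]]), collapse = "; "),
      stringsAsFactors = FALSE
    )
  })

  do.call(rbind, rows)
}

# Small illustrative input using the default column names.
obs <- data.frame(
  resource_id = c("res_001", "res_001", "res_002"),
  record_set_id = c("project_a", "project_b", "project_a"),
  storage_path_id = c("laptop::analysis.R", "backup::analysis.R",
                      "laptop::report.qmd"),
  mtime = as.POSIXct(c("2025-01-01", "2025-01-03", "2025-01-02"), tz = "UTC"),
  full_path = c("D:/project/analysis.R", "E:/backup/analysis.R",
                "D:/project/report.qmd"),
  stringsAsFactors = FALSE
)

m <- sketch_reuse_metrics(obs)
m
```

The sketch returns one row per resource, which matches the shape of the documented output; the real function may order rows, name columns, or format locations differently.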
Details
The function intentionally derives lightweight operational metrics only.
It does not:
infer authoritative identity;
reconcile evolving resources;
perform provenance reasoning;
determine archival significance;
replace curatorial interpretation.
Metrics are derived from contextual operational observations and should be interpreted as analytical indicators rather than authoritative documentary assertions.
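As an illustration of treating the metrics as indicators, the sketch below flags resources observed in more than one Record Set as candidates for manual review. The `metrics` data frame is built by hand to mirror the documented output columns, and the threshold of more than one Record Set is an assumption chosen for the example, not a rule defined by the package.

```r
# Metrics shaped like the documented output; values are illustrative.
metrics <- data.frame(
  resource_id = c("res_001", "res_002"),
  n_observations = c(2L, 1L),
  n_record_sets = c(2L, 1L),
  n_paths = c(2L, 1L),
  stringsAsFactors = FALSE
)

# Resources seen in more than one Record Set are candidates for review,
# not confirmed instances of cross-project reuse.
review_candidates <- metrics[metrics$n_record_sets > 1L, , drop = FALSE]
review_candidates$resource_id
```

A flagged resource still needs curatorial judgment: the same identifier in two Record Sets may reflect a backup, a synchronized copy, or deliberate reuse.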
Examples
toy_resources <- tibble::tibble(
  resource_id = c(
    "res_001",
    "res_001",
    "res_002"
  ),
  record_set_id = c(
    "project_a",
    "project_b",
    "project_a"
  ),
  storage_path_id = c(
    "laptop::analysis.R",
    "backup::analysis.R",
    "laptop::report.qmd"
  ),
  mtime = as.POSIXct(c(
    "2025-01-01",
    "2025-01-03",
    "2025-01-02"
  )),
  full_path = c(
    "D:/project/analysis.R",
    "E:/backup/analysis.R",
    "D:/project/report.qmd"
  )
)

derive_reuse_metrics(
  toy_resources
)
#> # A tibble: 2 × 7
#>   resource_id n_observations n_record_sets n_paths first_seen
#>   <chr>                <int>         <int>   <int> <dttm>
#> 1 res_001                  2             2       2 2025-01-01 00:00:00
#> 2 res_002                  1             1       1 2025-01-02 00:00:00
#> # ℹ 2 more variables: last_seen <dttm>, locations <chr>