
Detect operationally generated or low-priority artifacts
detect_generated_artifacts.Rd
Detects generated, transient, synchronized, or operationally low-priority resources commonly encountered in filesystem-based reconstruction and preservation workflows.
The function is designed for provenance-aware analytical pipelines where generated artifacts may otherwise:
inflate duplication metrics;
obscure meaningful reconstruction signals;
introduce synchronization noise;
or reduce review efficiency.
Typical examples include:
generated website assets;
transient editor files;
synchronization metadata;
cached rendering artifacts;
font and frontend dependencies;
local workspace history files.
The function intentionally performs lightweight operational classification rather than authoritative preservation appraisal.
It is designed to work together with companion functions as part of layered provenance-aware reconstruction workflows.
Arguments
- x
A data.frame or tibble containing observed resources.
- filename
Character scalar identifying the column containing filenames. Defaults to "filename".
- extension
Character scalar identifying the column containing file extensions. Defaults to "extension".
- ignored_names
Character vector of filenames commonly treated as generated, synchronized, transient, or operational noise.
- ignored_extensions
Character vector of file extensions commonly associated with generated or low-priority artifacts.
Value
A logical vector indicating whether each resource is likely to represent a generated or operationally low-priority artifact.
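Because the return value is a plain logical vector with one element per row, it can be used directly for subsetting. The sketch below shows only that downstream step. The `is_generated` vector is a hand-written stand-in for the function's output, and the `resources` data and its values are hypothetical.

```r
# Hypothetical inventory of observed resources (illustrative data only)
resources <- data.frame(
  filename  = c("report.docx", ".DS_Store", "site.woff2", "notes.txt"),
  extension = c("docx", "", "woff2", "txt"),
  stringsAsFactors = FALSE
)

# Stand-in for the logical vector the function would return:
# one TRUE/FALSE per row of `resources` (values assumed for illustration)
is_generated <- c(FALSE, TRUE, TRUE, FALSE)
stopifnot(length(is_generated) == nrow(resources))

# Keep only rows that were not flagged, e.g. before computing
# duplication metrics or building a review queue
review_queue <- resources[!is_generated, , drop = FALSE]

# Count how many resources were flagged
n_flagged <- sum(is_generated)
```

Keeping the flag as a separate vector, rather than dropping rows immediately, leaves the original inventory intact so that flagged items can still be reviewed later.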
Details
The function intentionally uses lightweight operational heuristics.
It does not:
inspect file contents;
infer preservation value;
determine archival significance;
perform semantic interpretation;
replace curatorial review.
Classification is based primarily on:
filename heuristics;
extension heuristics;
operational workflow conventions.
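The filename and extension heuristics described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the package's actual implementation: the function name `sketch_detect_generated`, the default ignore lists, the case-insensitive matching, and the stripping of a leading dot from extensions are all assumptions made for illustration.

```r
# Hypothetical re-implementation for illustration only;
# the real function's defaults and matching rules may differ.
sketch_detect_generated <- function(x,
                                    filename = "filename",
                                    extension = "extension",
                                    ignored_names = c(".DS_Store", "Thumbs.db", "desktop.ini"),
                                    ignored_extensions = c("tmp", "bak", "woff", "woff2")) {
  stopifnot(is.data.frame(x), filename %in% names(x), extension %in% names(x))

  # Normalise case; strip a leading dot so ".TMP" and "tmp" compare equal
  names_lc <- tolower(as.character(x[[filename]]))
  ext_lc   <- sub("^\\.", "", tolower(as.character(x[[extension]])))

  # Filename heuristic: exact match against the ignore list
  name_hit <- names_lc %in% tolower(ignored_names)
  # Extension heuristic: exact match against the ignored extensions
  ext_hit  <- ext_lc %in% tolower(ignored_extensions)

  # A resource is flagged if either heuristic matches
  name_hit | ext_hit
}

resources <- data.frame(
  filename  = c("report.docx", ".DS_Store", "site.woff2", "notes.txt", "draft.TMP"),
  extension = c("docx", "", "woff2", "txt", "TMP"),
  stringsAsFactors = FALSE
)

flags <- sketch_detect_generated(resources)
```

Neither heuristic opens the files: both operate only on the name and extension columns, which is consistent with the statement that file contents are not inspected.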
Future versions may support:
workflow-specific profiles;
preservation-oriented review vocabularies;
institution-specific ignore registries;
synchronized workspace heuristics.