Detect operationally generated or low-priority artifacts — detect_generated

Detects generated, transient, synchronized, or operationally low-priority resources commonly encountered in filesystem-based reconstruction and preservation workflows.

The function is designed for provenance-aware analytical pipelines where generated artifacts may otherwise:

inflate duplication metrics;
obscure meaningful reconstruction signals;
introduce synchronization noise;
or reduce review efficiency.

Typical examples include:

generated website assets;
transient editor files;
synchronization metadata;
cached rendering artifacts;
font and frontend dependencies;
local workspace history files.

The function intentionally performs lightweight operational classification rather than authoritative preservation appraisal.

It is designed to work together with:

as part of layered provenance-aware reconstruction workflows.

Usage

detect_generated_artifacts(
  x,
  filename = "filename",
  extension = "extension",
  ignored_names = c(".Rhistory", ".RData", ".gitignore", ".DS_Store", "dir.c9r",
    "masterkey.cryptomator", "vault.cryptomator"),
  ignored_extensions = c("css", "js", "map", "woff", "woff2", "ttf", "c9r")
)

Arguments

x

A data.frame or tibble containing observed resources.

filename

Character scalar identifying the column containing filenames.

Defaults to "filename".

extension

Character scalar identifying the column containing file extensions.

Defaults to "extension".

ignored_names

Character vector of filenames commonly treated as generated, synchronized, transient, or operational noise.

ignored_extensions

Character vector of file extensions commonly associated with generated or low-priority artifacts.

Value

A logical vector indicating whether each resource is likely to represent a generated or operationally low-priority artifact.

Details

The function intentionally uses lightweight operational heuristics.

It does not:

inspect file contents;
infer preservation value;
determine archival significance;
perform semantic interpretation;
replace curatorial review.

Classification is based primarily on:

filename heuristics;
extension heuristics;
operational workflow conventions.

Future versions may support:

workflow-specific profiles;
preservation-oriented review vocabularies;
institution-specific ignore registries;
synchronized workspace heuristics.

Examples

toy_files <- tibble::tibble(
  filename = c(
    ".Rhistory",
    "app.css",
    "analysis.R",
    "font.woff2"
  ),
  extension = c(
    "",
    "css",
    "R",
    "woff2"
  )
)

detect_generated_artifacts(
  toy_files
)
#> [1]  TRUE  TRUE FALSE  TRUE