Detects generated, transient, synchronized, or operationally low-priority resources commonly encountered in filesystem-based reconstruction and preservation workflows.

The function is designed for provenance-aware analytical pipelines where generated artifacts may otherwise:

  • inflate duplication metrics;

  • obscure meaningful reconstruction signals;

  • introduce synchronization noise;

  • or reduce review efficiency.

Typical examples include:

  • generated website assets;

  • transient editor files;

  • synchronization metadata;

  • cached rendering artifacts;

  • font and frontend dependencies;

  • local workspace history files.

The function intentionally performs lightweight operational classification rather than authoritative preservation appraisal.

It is designed to work together with related functions as part of layered provenance-aware reconstruction workflows.

Usage

detect_generated_artifacts(
  x,
  filename = "filename",
  extension = "extension",
  ignored_names = c(".Rhistory", ".RData", ".gitignore", ".DS_Store", "dir.c9r",
    "masterkey.cryptomator", "vault.cryptomator"),
  ignored_extensions = c("css", "js", "map", "woff", "woff2", "ttf", "c9r")
)

Arguments

x

A data.frame or tibble containing observed resources.

filename

Character scalar identifying the column containing filenames.

Defaults to "filename".

extension

Character scalar identifying the column containing file extensions.

Defaults to "extension".

ignored_names

Character vector of filenames commonly treated as generated, synchronized, transient, or operational noise.

ignored_extensions

Character vector of file extensions commonly associated with generated or low-priority artifacts.
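
The extension argument assumes that x already carries a column of file extensions. Where a listing only has filenames, one possible way to derive that column is base R's tools::file_ext(). The following is a minimal sketch; the files object is hypothetical illustration data, not part of the package.

```r
# Hypothetical listing with filenames only (illustrative data).
files <- data.frame(
  filename = c(".Rhistory", "app.css", "analysis.R", "font.woff2"),
  stringsAsFactors = FALSE
)

# tools::file_ext() returns the alphanumeric suffix after the last dot,
# or "" when the name has no such suffix.
files$extension <- tools::file_ext(files$filename)

files$extension
```

For dotfiles such as .Rhistory this yields "Rhistory" rather than the empty string used in the Examples; such files would presumably still be caught through ignored_names.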

Value

A logical vector with one element per row of x. TRUE indicates that the resource is likely to be a generated or operationally low-priority artifact.

Details

The function intentionally uses lightweight operational heuristics.

It does not:

  • inspect file contents;

  • infer preservation value;

  • determine archival significance;

  • perform semantic interpretation;

  • replace curatorial review.

Classification is based primarily on:

  • filename heuristics;

  • extension heuristics;

  • operational workflow conventions.
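
The filename and extension heuristics listed above can be approximated as plain set membership. The sketch below is an assumption about how such a check could look, not the package's implementation: sketch_detect() is a hypothetical name, its shortened defaults are illustrative, and it uses exact, case-sensitive matching, which may differ from the real function.

```r
# Illustrative sketch only: NOT the package's implementation.
# sketch_detect() is a hypothetical helper. It flags a row when its filename
# or its extension appears in the corresponding ignore vector (exact match).
sketch_detect <- function(x,
                          filename = "filename",
                          extension = "extension",
                          ignored_names = c(".Rhistory", ".DS_Store"),
                          ignored_extensions = c("css", "js", "woff2")) {
  # Basic input checks: a data frame containing both named columns.
  stopifnot(
    is.data.frame(x),
    filename %in% names(x),
    extension %in% names(x)
  )
  x[[filename]] %in% ignored_names | x[[extension]] %in% ignored_extensions
}

toy <- data.frame(
  filename = c(".Rhistory", "app.css", "analysis.R", "font.woff2"),
  extension = c("", "css", "R", "woff2"),
  stringsAsFactors = FALSE
)

result <- sketch_detect(toy)
result
#> [1]  TRUE  TRUE FALSE  TRUE
```

Because the check looks only at names and extensions, a hand-written stylesheet named app.css would be flagged just like a generated one, which is why the result is a review aid rather than an appraisal.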

Future versions may support:

  • workflow-specific profiles;

  • preservation-oriented review vocabularies;

  • institution-specific ignore registries;

  • synchronized workspace heuristics.

Examples

toy_files <- tibble::tibble(
  filename = c(
    ".Rhistory",
    "app.css",
    "analysis.R",
    "font.woff2"
  ),
  extension = c(
    "",
    "css",
    "R",
    "woff2"
  )
)

detect_generated_artifacts(
  toy_files
)
#> [1]  TRUE  TRUE FALSE  TRUE
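
Since the result has one element per resource, it can be used directly to subset the input. A minimal sketch follows, with the output above hard-coded as is_generated (a hypothetical variable name) so that the snippet runs without the package installed.

```r
toy_files <- data.frame(
  filename = c(".Rhistory", "app.css", "analysis.R", "font.woff2"),
  extension = c("", "css", "R", "woff2"),
  stringsAsFactors = FALSE
)

# Hard-coded copy of the output shown above; in practice this would come
# from detect_generated_artifacts(toy_files).
is_generated <- c(TRUE, TRUE, FALSE, TRUE)

# Keep only the resources that were not flagged.
to_review <- toy_files[!is_generated, , drop = FALSE]
to_review$filename
#> [1] "analysis.R"
```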