# Evaluation pipeline

Aggregates raw per-run measurements into figures (PDF + tikz) and a pgfkeys
`values.tex` for the paper.

## Run

With Docker (reproducible toolchain):

    docker compose up --build

Outputs land in `out/`:
- `out/values.tex` -- all `\val{...}` keys for the paper
- `out/<experiment>/*.pdf` -- per-experiment figures
- `out/<experiment>/*.tex` -- tikz versions (via `make tikz`)

Without Docker, you need R (with renv), Python 3.12+, uv, and LaTeX (used by
tikzDevice to measure font metrics). Then run:

    make all

## Targets

- `make all` -- `make figures` + `make values`
- `make figures` -- PDFs for all experiments
- `make tikz` -- tikz `.tex` for all experiments
- `make values` -- regenerate `out/values.tex` only
- `make derive` -- aggregate raw data into `derived/<experiment>/*.csv`
- `make sanity` -- shape + NaN + solution-coverage checks on derived CSVs
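The shape and NaN parts of `make sanity` can be pictured roughly as below. This is a minimal sketch, not the repo's checker: `check_csv` and its `expected_columns` / `min_rows` parameters are hypothetical, and the solution-coverage check is left out because its exact rule isn't described here.

```python
# Hypothetical sketch of shape + NaN checks on one derived CSV.
# Not the repo's implementation; names and parameters are illustrative.
import csv
import io
import math


def _is_nan(cell):
    """True if the cell parses as a float NaN (e.g. 'nan', 'NaN')."""
    try:
        return math.isnan(float(cell))
    except ValueError:
        return False


def check_csv(text, expected_columns, min_rows=1):
    """Return a list of problem strings; an empty list means the CSV passed."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []

    # Shape: all expected columns present and at least `min_rows` rows.
    missing = [c for c in expected_columns if c not in header]
    if missing:
        problems.append(f"missing columns: {missing}")
    rows = list(reader)
    if len(rows) < min_rows:
        problems.append(f"expected >= {min_rows} rows, got {len(rows)}")

    # NaN / empty cells, only for columns that actually exist.
    present = [c for c in expected_columns if c in header]
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for col in present:
            cell = row.get(col)
            if cell is None or cell == "" or _is_nan(cell):
                problems.append(f"line {line_no}: {col} is empty/NaN")
    return problems
```

Returning a list of problems rather than raising on the first one lets a single run report every bad cell across all derived CSVs.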

## Configuration

`RAW_DATA_ROOT` sets the directory the aggregator reads the raw per-run data from.
Precedence (highest first):

1. command line: `make RAW_DATA_ROOT=/mnt/data ...`
2. environment: `RAW_DATA_ROOT=/mnt/data make ...`
3. `local.mk` (copy from `local.mk.example`)
4. Makefile default: `../raw_data`
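The precedence list boils down to "the first source that sets the variable wins". The following is a minimal Python model of that documented order, not how make itself resolves variables; `resolve_raw_data_root` and `read_local_mk` are hypothetical helpers, and the regex assumes `local.mk` contains a simple one-line `RAW_DATA_ROOT = path` (or `?=` / `:=`) assignment.

```python
# Hypothetical model of the documented RAW_DATA_ROOT precedence.
# It mirrors the list order; it is not part of the repo.
import os
import re

DEFAULT = "../raw_data"  # Makefile default from the list


def read_local_mk(text):
    """Return the RAW_DATA_ROOT value assigned in local.mk text, if any."""
    # Assumes a single-line `RAW_DATA_ROOT =`, `?=`, or `:=` assignment
    # with a path that contains no spaces.
    m = re.search(r"^\s*RAW_DATA_ROOT\s*[:?]?=\s*(\S+)", text, re.MULTILINE)
    return m.group(1) if m else None


def resolve_raw_data_root(cli=None, env=None, local_mk_text=""):
    """Pick the first value set: command line, environment, local.mk, default."""
    env = os.environ if env is None else env
    for candidate in (cli, env.get("RAW_DATA_ROOT"), read_local_mk(local_mk_text)):
        if candidate:
            return candidate
    return DEFAULT
```

In GNU make, an environment variable only beats a makefile assignment when that assignment uses `?=` (or make runs with `-e`), so the order above presumably relies on `local.mk` and the Makefile default using `?=`.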

For Docker, the same variable picks the host path that gets bind-mounted as
`/raw_data` inside the container:

    RAW_DATA_ROOT=/mnt/data docker compose up

## Layout

    analysis/         Python: aggregation, sanity, gen_values, plugins
      values/         one plugin per metric family (cpu, rtt, idt, ...)
    figures/          R: ggplot scripts, common.R, renv.lock
    out/              generated (gitignored)
    derived/          intermediate CSVs (gitignored)

## Plugins

Each module in `analysis/values/` exposes `compute(derived) -> (keys, sources)`.
Keys are pgfkeys paths (e.g. `datacenter-fq/sender-cpu/cake/mean-pct`);
`gen_values.py` merges the keys from all plugins, wraps each number in
`\qty{}{}` or `\num{}` depending on the key's suffix, and writes a single
`values.tex`. To add a new metric, drop a new plugin into `analysis/values/`.