# Evaluation pipeline

Aggregates raw per-run measurements into figures (PDF + tikz) and a pgfkeys `values.tex` for the paper.

## Run

With Docker (reproducible toolchain):

    docker compose up --build

Outputs land in `out/`:

- `out/values.tex` -- all `\val{...}` keys for the paper
- `out//*.pdf` -- per-experiment figures
- `out//*.tex` -- tikz versions (via `make tikz`)

Without Docker, requires R (with renv), Python 3.12+, uv, LaTeX (for tikzDevice's metric probe). Then:

    make all

## Targets

- `make all` (= `figures values`)
- `make figures` -- PDFs for all experiments
- `make tikz` -- tikz `.tex` for all experiments
- `make values` -- regenerate `out/values.tex` only
- `make derive` -- aggregate raw data into `derived//*.csv`
- `make sanity` -- shape + NaN + solution-coverage checks on derived CSVs

## Configuration

`RAW_DATA_ROOT` controls where the aggregator reads raw aggregates from. Precedence (highest first):

1. command line: `make RAW_DATA_ROOT=/mnt/data ...`
2. environment: `RAW_DATA_ROOT=/mnt/data make ...`
3. `local.mk` (copy from `local.mk.example`)
4. Makefile default: `../raw_data`

For Docker, the same variable picks the host path that gets bind-mounted as `/raw_data` inside the container:

    RAW_DATA_ROOT=/mnt/data docker compose up

## Layout

    analysis/     Python: aggregation, sanity, gen_values, plugins
      values/     one plugin per metric family (cpu, rtt, idt, ...)
    figures/      R: ggplot scripts, common.R, renv.lock
    out/          generated (gitignored)
    derived/      intermediate CSVs (gitignored)

## Plugins

`analysis/values/*.py` each expose `compute(derived) -> (keys, sources)`. Keys are pgfkeys paths (e.g. `datacenter-fq/sender-cpu/cake/mean-pct`); `gen_values.py` merges them, wraps numbers in `\qty{}{}` / `\num{}` by suffix, and writes one `values.tex`.

Add a new metric by dropping a new plugin into `analysis/values/`.
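
As a rough illustration of the plugin contract, a new plugin might look like the sketch below. This README does not say what `derived` is or what `sources` contains, so the sketch assumes `derived` is the path to the `derived/` directory and `sources` maps each key to the CSV it came from. The file name `sender_cpu.csv` and the columns `solution` and `cpu_pct` are hypothetical; check an existing plugin for the real conventions.

```python
"""Hypothetical plugin sketch, e.g. saved as analysis/values/example_cpu.py."""
import csv
from pathlib import Path
from statistics import mean


def compute(derived):
    """Return (keys, sources) for one metric family.

    Assumptions (not confirmed by the README): `derived` is the path to the
    derived/ directory; the CSV name and column names are made up.
    """
    csv_path = Path(derived) / "datacenter-fq" / "sender_cpu.csv"  # hypothetical file

    # Group the per-run CPU samples by solution (e.g. "cake").
    by_solution = {}
    with csv_path.open(newline="") as fh:
        for row in csv.DictReader(fh):
            by_solution.setdefault(row["solution"], []).append(float(row["cpu_pct"]))

    # Emit one pgfkeys path per solution, plus the file each value came from.
    keys, sources = {}, {}
    for solution, samples in sorted(by_solution.items()):
        key = f"datacenter-fq/sender-cpu/{solution}/mean-pct"
        keys[key] = mean(samples)
        sources[key] = str(csv_path)
    return keys, sources
```

The plugin returns plain numbers and leaves the `\qty{}{}` / `\num{}` wrapping to `gen_values.py`, which selects it from the key suffix (here `mean-pct`).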