Evaluation pipeline

Aggregates raw per-run measurements into figures (PDF + tikz) and a pgfkeys values.tex for the paper.

Run

With Docker (reproducible toolchain):

docker compose up --build

Outputs land in out/:

out/values.tex -- all \val{...} keys for the paper
out/<experiment>/*.pdf -- per-experiment figures
out/<experiment>/*.tex -- tikz versions (via make tikz)

Without Docker, you need R (with renv), Python 3.12+, uv, and LaTeX (for tikzDevice's metric probe). Then run:

make all

Targets

make all -- figures and values (same as make figures values)
make figures -- PDFs for all experiments
make tikz -- tikz .tex for all experiments
make values -- regenerate out/values.tex only
make derive -- aggregate raw data into derived/<experiment>/*.csv
make sanity -- shape + NaN + solution-coverage checks on derived CSVs
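
As an illustration of what the three sanity checks could look like, here is a minimal sketch that uses only the Python standard library. It is not the project's actual implementation: the function name check_csv, the "solution" column name, and the treatment of empty, NA and NaN cells as missing are all assumptions made for this example.

```python
import csv
import io

# Cell contents treated as missing values (an assumption for this sketch).
MISSING_MARKERS = {"", "na", "nan"}


def check_csv(text, expected_solutions, solution_column="solution"):
    """Return a list of problems found in one derived CSV (empty list = OK)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["empty file"]
    header, body = rows[0], rows[1:]
    if solution_column not in header:
        return [f"missing column {solution_column!r}"]

    problems = []
    seen = set()
    sol_idx = header.index(solution_column)
    for lineno, row in enumerate(body, start=2):
        # Shape check: every row must have as many fields as the header.
        if len(row) != len(header):
            problems.append(
                f"line {lineno}: expected {len(header)} fields, got {len(row)}"
            )
            continue
        seen.add(row[sol_idx])
        # NaN check: flag empty, NA, or NaN cells.
        for name, cell in zip(header, row):
            if cell.strip().lower() in MISSING_MARKERS:
                problems.append(f"line {lineno}: missing value in {name!r}")

    # Coverage check: every expected solution needs at least one valid row.
    for solution in sorted(set(expected_solutions) - seen):
        problems.append(f"no rows for solution {solution!r}")
    return problems
```

Calling check_csv(path.read_text(), ["cake", "fq"]) on each file under derived/ and exiting non-zero when any list is non-empty would give make a failing target to stop on.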

Configuration

RAW_DATA_ROOT controls where the aggregator reads the raw per-run measurements from. Precedence (highest first):

command line: make RAW_DATA_ROOT=/mnt/data ...
environment: RAW_DATA_ROOT=/mnt/data make ...
local.mk: RAW_DATA_ROOT set in local.mk (copy from local.mk.example)
Makefile default: ../raw_data

For Docker, the same variable picks the host path that gets bind-mounted as /raw_data inside the container:

RAW_DATA_ROOT=/mnt/data docker compose up

Layout

analysis/         Python: aggregation, sanity, gen_values, plugins
  values/         one plugin per metric family (cpu, rtt, idt, ...)
figures/          R: ggplot scripts, common.R, renv.lock
out/              generated (gitignored)
derived/          intermediate CSVs (gitignored)

Plugins

Each module in analysis/values/*.py exposes compute(derived) -> (keys, sources). Keys are pgfkeys paths (e.g. datacenter-fq/sender-cpu/cake/mean-pct). gen_values.py merges the keys from all plugins, wraps each number in \qty{}{} or \num{} depending on the key's suffix, and writes a single values.tex. To add a new metric, drop a new plugin into analysis/values/.
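
A minimal sketch of a plugin and of suffix-based wrapping, under stated assumptions. Only the compute(derived) -> (keys, sources) signature and the key shape come from this README. Everything else is hypothetical: that derived is the path to the derived/ directory, that keys maps pgfkeys paths to numbers, that sources maps each key to the CSV it came from, the sender_cpu.csv file and its columns, the suffix-to-unit table, and the one-decimal formatting.

```python
import csv
import statistics
from pathlib import Path

# Hypothetical input: derived/<experiment>/sender_cpu.csv with the
# columns "solution" and "cpu_pct".
EXPERIMENT = "datacenter-fq"
CSV_NAME = "sender_cpu.csv"

# Hypothetical suffix-to-unit table; the real mapping lives in gen_values.py.
UNIT_BY_SUFFIX = {"-pct": r"\percent", "-ms": r"\milli\second"}


def compute(derived):
    """Return (keys, sources) with the mean sender CPU per solution."""
    csv_path = Path(derived) / EXPERIMENT / CSV_NAME
    samples = {}
    with csv_path.open(newline="") as fh:
        for row in csv.DictReader(fh):
            samples.setdefault(row["solution"], []).append(float(row["cpu_pct"]))

    keys, sources = {}, {}
    for solution, values in sorted(samples.items()):
        key = f"{EXPERIMENT}/sender-cpu/{solution}/mean-pct"
        keys[key] = statistics.fmean(values)
        sources[key] = str(csv_path)  # records which file the number came from
    return keys, sources


def wrap(key, value):
    """Wrap a number in \\qty{}{} if the key suffix implies a unit, else \\num{}."""
    for suffix, unit in UNIT_BY_SUFFIX.items():
        if key.endswith(suffix):
            return rf"\qty{{{value:.1f}}}{{{unit}}}"
    return rf"\num{{{value:.1f}}}"
```

With two cake rows of 10.0 and 20.0, compute yields datacenter-fq/sender-cpu/cake/mean-pct = 15.0, which wrap renders as \qty{15.0}{\percent}.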