Evaluation pipeline
Aggregates raw per-run measurements into figures (PDF + tikz) and a pgfkeys
values.tex for the paper.
Run
With Docker (reproducible toolchain):
docker compose up --build
Outputs land in out/:
- out/values.tex -- all \val{...} keys for the paper
- out/<experiment>/*.pdf -- per-experiment figures
- out/<experiment>/*.tex -- tikz versions (via make tikz)
Without Docker, you need R (with renv), Python 3.12+, uv, and LaTeX (for tikzDevice's metric probe). Then:
make all
Targets
- make all (= figures values)
- make figures -- PDFs for all experiments
- make tikz -- tikz .tex for all experiments
- make values -- regenerate out/values.tex only
- make derive -- aggregate raw data into derived/<experiment>/*.csv
- make sanity -- shape + NaN + solution-coverage checks on derived CSVs
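As a rough illustration of the kind of shape and NaN checks that make sanity runs, here is a minimal stdlib-only sketch. It is not the pipeline's actual sanity module: the function names check_csv and is_nan are hypothetical, treating empty cells as missing is an assumption, and the solution-coverage check is left out because its rules are not described here.

```python
import csv
import io
import math


def is_nan(cell):
    """True if the cell parses as a float NaN (e.g. "nan", "NaN")."""
    try:
        return math.isnan(float(cell))
    except ValueError:
        return False  # non-numeric text such as a solution name


def check_csv(text):
    """Return a list of problems: ragged rows (shape) and NaN or empty cells."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["empty file"]
    header = rows[0]
    problems = []
    for lineno, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"line {lineno}: expected {len(header)} fields, got {len(row)}"
            )
            continue
        for name, cell in zip(header, row):
            # Assumption: an empty cell counts as missing, like NaN.
            if cell.strip() == "" or is_nan(cell):
                problems.append(f"line {lineno}: column {name!r} is empty or NaN")
    return problems
```

An empty result means the file passed both checks; any other result lists one message per offending row or cell.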
Configuration
RAW_DATA_ROOT controls where the aggregator reads raw aggregates from.
Precedence (highest first):
- command line: make RAW_DATA_ROOT=/mnt/data ...
- environment: RAW_DATA_ROOT=/mnt/data make ...
- local.mk (copy from local.mk.example)
- Makefile default: ../raw_data
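The precedence order above amounts to "first source that sets the variable wins". The sketch below models that rule in Python purely for illustration; the real resolution happens inside make, and the function name resolve_raw_data_root and its parameters are hypothetical.

```python
import os


def resolve_raw_data_root(cli=None, environ=None, local_mk=None,
                          default="../raw_data"):
    """Return the first value that is set, highest precedence first."""
    environ = os.environ if environ is None else environ
    candidates = [
        cli,                              # make RAW_DATA_ROOT=... on the command line
        environ.get("RAW_DATA_ROOT"),     # exported in the environment
        local_mk,                         # value assigned in local.mk
        default,                          # Makefile default
    ]
    for value in candidates:
        if value is not None:
            return value
    return default
```

For example, with the environment set to /mnt/data and no command-line override, the sketch returns /mnt/data regardless of what local.mk contains.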
For Docker, the same variable picks the host path that gets bind-mounted as
/raw_data inside the container:
RAW_DATA_ROOT=/mnt/data docker compose up
Layout
analysis/       Python: aggregation, sanity, gen_values, plugins
  values/       one plugin per metric family (cpu, rtt, idt, ...)
figures/        R: ggplot scripts, common.R, renv.lock
out/            generated (gitignored)
derived/        intermediate CSVs (gitignored)
Plugins
Each analysis/values/*.py plugin exposes compute(derived) -> (keys, sources).
Keys are pgfkeys paths (e.g. datacenter-fq/sender-cpu/cake/mean-pct);
gen_values.py merges them, wraps numbers in \qty{}{} / \num{} by
suffix, and writes one values.tex. Add a new metric by dropping a new
plugin into analysis/values/.
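A minimal sketch of what a plugin and the merge step could look like, under stated assumptions. Only the compute(derived) -> (keys, sources) signature and the example key come from this README. Everything else is hypothetical: the shape of derived (a dict of experiment name to rows), the column names solution and cpu_pct, the file name cpu.csv, the rule that a -pct suffix maps to \qty with \percent, the one-decimal formatting, and the use of \pgfkeyssetvalue as the output line format.

```python
from statistics import mean


def compute(derived):
    """Hypothetical plugin: mean sender CPU per solution for one experiment."""
    experiment = "datacenter-fq"
    by_solution = {}
    for row in derived.get(experiment, []):
        by_solution.setdefault(row["solution"], []).append(float(row["cpu_pct"]))
    keys, sources = {}, {}
    for solution, samples in sorted(by_solution.items()):
        key = f"{experiment}/sender-cpu/{solution}/mean-pct"
        keys[key] = mean(samples)
        sources[key] = f"derived/{experiment}/cpu.csv"  # assumed file name
    return keys, sources


def wrap(key, value):
    """Pick a siunitx wrapper from the key suffix (assumed suffix rule)."""
    if key.endswith("-pct"):
        return rf"\qty{{{value:.1f}}}{{\percent}}"
    return rf"\num{{{value:.1f}}}"


def render(plugins, derived):
    """Merge keys from all plugins and render one line per key."""
    merged = {}
    for plugin in plugins:
        keys, _sources = plugin(derived)
        overlap = merged.keys() & keys.keys()
        if overlap:
            raise ValueError(f"duplicate keys: {sorted(overlap)}")
        merged.update(keys)
    lines = [
        rf"\pgfkeyssetvalue{{/{key}}}{{{wrap(key, value)}}}"
        for key, value in sorted(merged.items())
    ]
    return "\n".join(lines) + "\n"
```

Rejecting duplicate keys during the merge is a design choice of this sketch, not a documented behavior of gen_values.py; it keeps two plugins from silently overwriting each other's values.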