Two shorter threads in the paper: the diagnostic tooling that makes compiler-level benchmarking possible, and the economic pressures that make cross-platform benchmarking urgent.
XProf: end-to-end profiling
XProf works across JAX, TensorFlow, and PyTorch/XLA. Three components get direct use in the methodology:
- Trace Viewer — host/device execution timeline; identifies communication gaps and idle periods.
- HLO Op Stats — highlights time-consuming operations; reports GFLOPS/s and rematerialization overhead.
- Memory Profile Viewer — monitors HBM usage; surfaces peak heap consumption and potential stack exhaustion.
hlo-opt: isolated pass measurement
The hlo-opt tool executes individual compiler passes independently of the full pipeline. This isolation is what makes the paper’s optimization-transferability measurement possible: you can run just AlgebraicSimplifier or HloRematerialization on a given input module and attribute a performance delta to that specific pass on that specific backend.
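The attribution step described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's metric: the `PassTiming` record, the `deltas_by_backend` helper, and every number in `example` are hypothetical placeholders. It assumes the module has already been timed on each backend with and without the single pass applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassTiming:
    """Timing of one module on one backend, before and after a single pass."""
    backend: str          # e.g. "cpu", "gpu", "tpu"
    pass_name: str        # e.g. "AlgebraicSimplifier"
    baseline_ms: float    # runtime of the unmodified module
    with_pass_ms: float   # runtime after applying only this pass

    @property
    def relative_delta(self) -> float:
        """Fractional runtime change; negative means the pass made it faster."""
        return (self.with_pass_ms - self.baseline_ms) / self.baseline_ms


def deltas_by_backend(timings):
    """Group relative deltas as {pass_name: {backend: delta}}."""
    table = {}
    for t in timings:
        table.setdefault(t.pass_name, {})[t.backend] = t.relative_delta
    return table


# Placeholder numbers for illustration only; they are not measurements.
example = [
    PassTiming("cpu", "AlgebraicSimplifier", baseline_ms=10.0, with_pass_ms=9.0),
    PassTiming("gpu", "AlgebraicSimplifier", baseline_ms=4.0, with_pass_ms=3.0),
    PassTiming("tpu", "AlgebraicSimplifier", baseline_ms=2.0, with_pass_ms=2.0),
]

result = deltas_by_backend(example)
for backend, delta in result["AlgebraicSimplifier"].items():
    print(f"{backend}: {delta:+.1%}")
```

Keeping one row per (pass, backend) pair makes it easy to see where a pass helps on one platform and does nothing, or hurts, on another.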
The diagnostic toolkit
| Tool | Primary Use Case | Target Platform |
|---|---|---|
| hlo-opt | Pass development and IR conversion | CPU, GPU, TPU |
| run_hlo_module | Microbenchmarking HLO snippets | CPU, GPU, TPU |
| xprof | End-to-end execution profiling | GPU, TPU |
| multihost_hlo_runner | SPMD and multi-node benchmarking | Distributed |
Why cross-platform benchmarking is urgent now
SemiAnalysis projects that by 2030, inference will consume 75% of all AI compute. At that scale, platform economics become decisive for infrastructure planning.
Vendor benchmarks suggest large cost differentials:
| Metric | TPU v6e | NVIDIA H200 | Advantage |
|---|---|---|---|
| Cost per Hour | ~$1.38 | ~$2.50+ | TPU (45% cheaper) |
| Inference Perf. / $ | 4× baseline | Baseline | TPU |
| Power Efficiency | 60–65% less power | Baseline | TPU |
| Framework Maturity | JAX (native) | CUDA (universal) | NVIDIA |
But these figures come from vendor-controlled configurations and may not generalize. Independent validation, which the 3×3 methodology is designed to provide, is needed to substantiate or qualify them.
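As a quick sanity check on the table's arithmetic, the sketch below recomputes the hourly saving from the two listed prices and asks what per-hour throughput ratio the 4× performance-per-dollar figure would imply. It assumes, without confirmation from the source, that the performance-per-dollar claim was computed against the same hourly prices. The function and variable names are illustrative only.

```python
def fraction_cheaper(price: float, reference_price: float) -> float:
    """Return how much cheaper `price` is than `reference_price`, as a fraction."""
    return 1.0 - price / reference_price


TPU_V6E_PER_HOUR = 1.38   # approximate hourly price from the table
H200_PER_HOUR = 2.50      # lower bound; the table lists "~$2.50+"
PERF_PER_DOLLAR_RATIO = 4.0  # vendor-reported TPU advantage from the table

# Hourly saving implied by the two list prices.
saving = fraction_cheaper(TPU_V6E_PER_HOUR, H200_PER_HOUR)
print(f"Hourly saving: {saving:.0%}")

# If perf/$ = throughput / price, then the throughput ratio is
# (perf/$ ratio) * (price ratio). ASSUMPTION: same prices as above.
implied_throughput_ratio = PERF_PER_DOLLAR_RATIO * TPU_V6E_PER_HOUR / H200_PER_HOUR
print(f"Implied per-hour throughput ratio: {implied_throughput_ratio:.2f}x")
```

Under that assumption, price alone accounts for well under half of the claimed 4× advantage. The rest would have to come from higher per-hour throughput, which is exactly the kind of claim that depends on the benchmark configuration.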
Full details in the paper, Sections 6–7.