OpenXLA Benchmark

Diagnostic Tools and the Economics of Cross-Platform Deployment

17 Apr 2026

Two shorter threads in the paper: the diagnostic tooling that makes compiler-level benchmarking possible, and the economic pressures that make cross-platform benchmarking urgent.

XProf: end-to-end profiling

XProf works across JAX, TensorFlow, and PyTorch/XLA. Three components of the diagnostic stack see direct use in the paper's methodology:

hlo-opt: isolated pass measurement

The hlo-opt tool executes individual compiler passes independently of the full pipeline. This isolation is what makes the paper’s optimization-transferability measurement possible: you can run just AlgebraicSimplifier or HloRematerialization on a given input module and attribute a performance delta to that specific pass on that specific backend.
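The attribution logic can be sketched in plain Python. This is a toy model of the measurement, not XLA's actual API: `run_module`, `algebraic_simplify`, and the op-tuple "module" representation are all illustrative stand-ins, and the simulated cost model replaces real hardware execution.

```python
import time

def algebraic_simplify(module):
    """Toy 'pass': drop multiply-by-one ops from a list of ops.

    Illustrative stand-in for XLA's AlgebraicSimplifier, not its API.
    """
    return [op for op in module if op != ("mul_by_one",)]

def run_module(module):
    """Pretend execution: wall-clock cost proportional to op count."""
    t0 = time.perf_counter()
    for _ in range(len(module) * 10_000):
        pass  # simulated per-op work
    return time.perf_counter() - t0

def pass_delta(module, compiler_pass):
    """Attribute a runtime delta to one pass on one input module:
    time the module as-is, time it after the pass, report the difference."""
    baseline = run_module(module)
    optimized = run_module(compiler_pass(module))
    return baseline - optimized

module = [("add",), ("mul_by_one",), ("mul_by_one",), ("dot",)]
delta = pass_delta(module, algebraic_simplify)
print(f"pass removed {len(module) - len(algebraic_simplify(module))} ops, "
      f"saved {delta:.6f}s")
```

The key property being modeled is isolation: because only one pass runs between the two timings, the delta is attributable to that pass on that backend, rather than being folded into a full-pipeline result.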

The diagnostic toolkit

| Tool | Primary Use Case | Target Platform |
|---|---|---|
| `hlo-opt` | Pass development and IR conversion | CPU, GPU, TPU |
| `run_hlo_module` | Microbenchmarking HLO snippets | CPU, GPU, TPU |
| `xprof` | End-to-end execution profiling | GPU, TPU |
| `multihost_hlo_runner` | SPMD and multi-node benchmarking | Distributed |

Why cross-platform benchmarking is urgent now

SemiAnalysis projects that by 2030, inference will consume 75% of all AI compute. At that scale, platform economics become decisive for infrastructure planning.

Vendor benchmarks suggest large cost differentials:

| Metric | TPU v6e | NVIDIA H200 | Advantage |
|---|---|---|---|
| Cost per hour | ~$1.38 | ~$2.50+ | TPU (~45% cheaper) |
| Inference perf. / $ | 4× baseline | Baseline | TPU |
| Power efficiency | 60–65% less power | Baseline | TPU |
| Framework maturity | JAX (native) | CUDA (universal) | NVIDIA |
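The "45% cheaper" figure follows directly from the quoted hourly rates; a quick check, treating $2.50 as the H200 price (the lower bound of the "$2.50+" figure):

```python
tpu_v6e_hourly = 1.38  # approximate on-demand $/hour, per vendor figures
h200_hourly = 2.50     # approximate $/hour, lower bound of "$2.50+"

# TPU cost advantage as a percentage of the H200 rate.
savings_pct = (h200_hourly - tpu_v6e_hourly) / h200_hourly * 100
print(f"TPU v6e is ~{savings_pct:.0f}% cheaper per hour")  # ~45%
```

Since $2.50 is a lower bound on the H200 rate, 45% is the conservative end of the claimed differential.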

But these are vendor-controlled configurations and may not generalize. Independent validation—which the 3×3 methodology is designed to provide—is necessary to substantiate or qualify the claims.

Full details in the paper, Sections 6–7.