The paper “Technical Architecture and Systematic Benchmarking of the OpenXLA Ecosystem” is a proposed research protocol: the architectural analysis and benchmarking methodology are complete, and experimental results are forthcoming pending hardware access.
The problem
The proliferation of frontend frameworks (JAX, PyTorch, TensorFlow) combined with an increasingly heterogeneous hardware landscape (GPUs, TPUs, custom ASICs) has created a fragmentation problem that impedes portable and efficient model deployment. OpenXLA—developed jointly by Google, AMD, Intel, NVIDIA, and AWS—addresses this through a unified compiler ecosystem built on StableHLO as a portability layer and PJRT as a pluggable hardware interface.
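The StableHLO portability layer described above can be observed directly from JAX: `jax.jit(...).lower(...)` emits a StableHLO module, which XLA then compiles for whichever PJRT backend is loaded. A minimal sketch (an illustration of the pipeline, not code from the paper):

```python
import jax
import jax.numpy as jnp

def predict(w, x):
    # A toy model: one dense layer with a tanh nonlinearity.
    return jnp.tanh(x @ w)

w = jnp.ones((4, 2))
x = jnp.ones((3, 4))

# Lower to StableHLO: this is the portable artifact that any
# PJRT backend (TPU, CUDA, ROCm, CPU) can consume.
lowered = jax.jit(predict).lower(w, x)
print(lowered.as_text())  # MLIR module in the stablehlo dialect
```

The same lowered artifact compiles unchanged on any available backend via `lowered.compile()`, which is exactly the decoupling OpenXLA is built around.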
What the paper delivers
- An architectural analysis of the OpenXLA compiler pipeline, informed by direct contributions to the OpenXLA codebase.
- A detailed characterization of three contemporary accelerator families—TPU v6e (Trillium), NVIDIA H200 (Hopper), and AMD MI300X—and their interaction with the XLA compiler.
- A comparative economic analysis of cross-platform deployment costs and energy efficiency.
- A formal 3×3 benchmarking methodology: three representative workloads (LLM inference, dense training, sparse embedding training) evaluated across all three backends.
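The 3×3 design can be written down as a simple experiment grid. The workload and backend labels below are illustrative placeholders, not necessarily the paper's exact identifiers:

```python
from itertools import product

# Illustrative names; the paper's exact labels may differ.
workloads = ["llm_inference", "dense_training", "sparse_embedding_training"]
backends = ["tpu_v6e", "nvidia_h200", "amd_mi300x"]

# Every (workload, backend) pair is one benchmarked cell of the matrix.
grid = [
    {"workload": w, "backend": b}
    for w, b in product(workloads, backends)
]
print(len(grid))  # 9 cells in the 3x3 matrix
```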
Five ways this differs from existing benchmarks
- Benchmarks the compiler, not just the hardware. Uses hlo-opt ablation to isolate the effects of fusion, common-subexpression elimination (CSE), and algebraic simplification.
- Measures the portability tax. Compares the OpenXLA path (JAX → StableHLO → XLA → hardware) against native paths (vLLM/CUDA, PyTorch/ROCm, direct HLO on TPU).
- Quantifies optimization transferability. Measures whether the same compiler passes deliver comparable gains on systolic-array (TPU), SIMT (NVIDIA), and CDNA (AMD) architectures.
- Exposes compiler version sensitivity. All experiments pin a specific XLA commit hash.
- Covers three accelerator paradigms rather than the two-platform designs typical of prior work.
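One way to probe the compiler rather than the hardware, sketched here as an idea rather than as the paper's hlo-opt harness, is to compare the portable program against the backend-optimized HLO that XLA produces for it:

```python
import jax
import jax.numpy as jnp

def f(x):
    # Elementwise chain that XLA's fusion pass typically collapses
    # into a single fused computation.
    return jnp.sum(jnp.tanh(x) * 2.0 + 1.0)

x = jnp.ones((128, 128))
lowered = jax.jit(f).lower(x)

before = lowered.as_text()           # portable StableHLO, pre-optimization
after = lowered.compile().as_text()  # HLO after backend-specific optimization

# Diffing `before` against `after` (or disabling individual passes,
# e.g. via XLA's xla_disable_hlo_passes flag or hlo-opt) shows what
# each optimization contributed on a given backend.
print(len(before), len(after))
```

Because the pre-optimization module is identical across backends, any difference in the optimized HLO is attributable to the compiler, which is the quantity the ablation methodology targets.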
Read the paper
- paper1.pdf
- paper1.tex (LaTeX source)