OpenXLA Benchmark

Paper Overview: The OpenXLA Ecosystem and a 3×3 Benchmarking Protocol

01 Apr 2026

The paper “Technical Architecture and Systematic Benchmarking of the OpenXLA Ecosystem” is a research protocol rather than a completed study: its architectural analysis and benchmarking methodology are finished, while the experimental results are forthcoming pending hardware access.

The problem

The proliferation of frontend frameworks (JAX, PyTorch, TensorFlow) combined with an increasingly heterogeneous hardware landscape (GPUs, TPUs, custom ASICs) has created a fragmentation problem that impedes portable and efficient model deployment. OpenXLA—developed jointly by Google, AMD, Intel, NVIDIA, and AWS—addresses this through a unified compiler ecosystem built on StableHLO as a portability layer and PJRT as a pluggable hardware interface.
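The layering described above can be pictured with a small sketch. This is pure Python with hypothetical names, not the real OpenXLA or PJRT API: the point is only that every frontend lowers to one shared StableHLO-like IR, and each hardware backend registers a PJRT-style plugin that compiles that IR without knowing which frontend produced it.

```python
# Conceptual sketch of the OpenXLA layering (hypothetical names, NOT real APIs).
# Frontends lower to a shared StableHLO-like IR; backends plug in through a
# PJRT-style registry and consume that IR independently of the frontend.

def lower_to_stablehlo(framework: str, model: str) -> str:
    """Stand-in for a frontend (JAX/PyTorch/TensorFlow) lowering step."""
    return f"stablehlo.module(source={framework}, body={model})"

# PJRT-style plugin registry: backend name -> compile function.
PJRT_PLUGINS = {}

def register_pjrt_plugin(name, compile_fn):
    PJRT_PLUGINS[name] = compile_fn

def compile_for(backend: str, stablehlo_ir: str) -> str:
    """Dispatch the shared IR to whichever backend plugin is registered."""
    return PJRT_PLUGINS[backend](stablehlo_ir)

# Each hardware vendor supplies its own plugin; the IR stays the same.
register_pjrt_plugin("tpu_v6e", lambda ir: f"tpu-exec({ir})")
register_pjrt_plugin("nvidia_h200", lambda ir: f"cuda-exec({ir})")
register_pjrt_plugin("amd_mi300x", lambda ir: f"rocm-exec({ir})")

ir = lower_to_stablehlo("jax", "mlp")
executable = compile_for("amd_mi300x", ir)
```

The decoupling is the whole story: adding a fourth backend means registering one more plugin, with no change to any frontend's lowering path.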

What the paper delivers

  1. An architectural analysis of the OpenXLA compiler pipeline, informed by direct contributions to the OpenXLA codebase.
  2. A detailed characterization of three contemporary accelerator families—TPU v6e (Trillium), NVIDIA H200 (Hopper), and AMD MI300X—and their interaction with the XLA compiler.
  3. A comparative economic analysis of cross-platform deployment costs and energy efficiency.
  4. A formal 3×3 benchmarking methodology: three representative workloads (LLM inference, dense training, sparse embedding training) evaluated across all three backends.
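The 3×3 protocol is just the full cross product of the three workloads and the three backends, giving nine benchmark cells. A minimal sketch of enumerating those configurations (the identifier strings are illustrative, not taken from the paper):

```python
# Enumerate the paper's 3x3 benchmark matrix: every workload runs on every
# backend, yielding nine (workload, backend) cells. Names are illustrative.
from itertools import product

workloads = ["llm_inference", "dense_training", "sparse_embedding_training"]
backends = ["tpu_v6e", "nvidia_h200", "amd_mi300x"]

benchmark_cells = [
    {"workload": w, "backend": b} for w, b in product(workloads, backends)
]

assert len(benchmark_cells) == 9  # 3 workloads x 3 backends
```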

Five ways this differs from existing benchmarks