OpenXLA Benchmark

About OpenXLA Benchmark

This site accompanies the paper “Technical Architecture and Systematic Benchmarking of the OpenXLA Ecosystem: A Cross-Platform Analysis of Modern ML Compilers” (DRAFT v2 – April 2026).

Status: This paper presents a proposed research protocol. The architectural analysis and benchmarking methodology are complete; experimental results are forthcoming pending hardware access. Feedback on the methodology, hypotheses, and experimental design is welcome.

Abstract

The fragmentation of machine learning (ML) frameworks and hardware backends presents a critical barrier to portable, cost-effective model deployment. OpenXLA addresses this challenge through a unified compiler ecosystem built on StableHLO as a portability layer and PJRT as a pluggable hardware interface.
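As a concrete illustration of the portability layer (our own sketch, not drawn from the paper), a JAX program can be lowered to a StableHLO module before any backend-specific compilation; the toy function and inputs here are hypothetical:

```python
import jax
import jax.numpy as jnp

# A hypothetical toy computation; any jittable function works.
def scaled_add(x, y):
    return 2.0 * x + y

x = jnp.ones((4,))
y = jnp.arange(4.0)

# jax.jit(...).lower(...) yields a hardware-independent StableHLO module;
# a PJRT plugin (CPU, GPU, or TPU) then compiles that same module.
stablehlo_text = jax.jit(scaled_add).lower(x, y).as_text()
print(stablehlo_text)
```

The printed MLIR text contains StableHLO dialect ops (e.g. stablehlo.multiply, stablehlo.add), which is the representation handed off across the PJRT boundary.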

This paper provides an architectural analysis of the OpenXLA compiler pipeline—from its intermediate representation hierarchy (CHLO, StableHLO, VHLO) through target-independent optimization passes to hardware-specific code generation via LLVM—informed by direct contributions to the OpenXLA codebase. We characterize three contemporary accelerator families: Google Cloud TPU v6e (Trillium), NVIDIA H200 (Hopper), and AMD Instinct MI300X, examining how each interacts with the XLA compiler’s fusion, buffer analysis, and partitioning strategies. We propose a systematic 3×3 benchmarking methodology—evaluating three representative workloads (LLM inference, dense model training, and sparse embedding training) across all three hardware backends—and formalize testable hypotheses regarding abstraction overhead, optimization transferability, and SPMD scaling efficiency.
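The 3×3 design described above can be sketched as a simple configuration matrix. The workload and backend names follow the paper; the helper structure itself is only an illustration:

```python
from itertools import product

# Workloads and backends as named in the paper's 3x3 methodology.
WORKLOADS = ("llm_inference", "dense_training", "sparse_embedding_training")
BACKENDS = ("tpu_v6e", "nvidia_h200", "amd_mi300x")

def benchmark_matrix():
    """Enumerate all nine (workload, backend) configurations."""
    return [
        {"workload": w, "backend": b}
        for w, b in product(WORKLOADS, BACKENDS)
    ]

configs = benchmark_matrix()
print(len(configs))  # 3 workloads x 3 backends = 9 configurations
```

Each of the nine configurations would be run under the same compiler settings, which is what makes cross-backend comparisons of fusion, buffer analysis, and partitioning behavior meaningful.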

Contributions

  1. An architectural analysis of the OpenXLA compiler pipeline, informed by direct contributions to the OpenXLA codebase.
  2. A detailed characterization of three contemporary accelerator families—TPU v6e, NVIDIA H200, and AMD MI300X—and their interaction with the XLA compiler.
  3. A comparative economic analysis of cross-platform deployment costs and energy efficiency.
  4. A formal 3×3 benchmarking methodology for systematic, reproducible evaluation of ML compiler performance across heterogeneous hardware.

How this work differs from existing benchmarks

Paper