OpenXLA Benchmark

Technical Architecture and Systematic Benchmarking of the OpenXLA Ecosystem

A cross-platform analysis of modern ML compilers: an architectural study of the OpenXLA pipeline (CHLO → StableHLO → XLA → LLVM) paired with a proposed 3×3 benchmarking protocol evaluating three workloads (LLM inference, dense training, sparse embedding training) across TPU v6e, NVIDIA H200, and AMD MI300X.

Status: Proposed research protocol. Architectural analysis and methodology are complete; experimental results are forthcoming pending hardware access.

Read the paper (PDF)  ·  LaTeX source  ·  About this work

Posts

Diagnostic Tools and the Economics of Cross-Platform Deployment

Two shorter threads in the paper: the diagnostic tooling that makes compiler-level benchmarking possible, and the economic pressures that make cross-platform benchmarking urgent.

The 3×3 Benchmarking Methodology and Testable Hypotheses

The paper formalizes a 3×3 experimental design: three representative ML workloads evaluated across three hardware backends, yielding nine distinct cells, each with its own defined key performance indicators (KPIs).
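As a sketch, the nine cells fall out of a simple cross-product. The workload and backend names below follow the paper; the KPI list is illustrative only, not the paper's exact per-cell metric set:

```python
from itertools import product

# Workloads and backends as named in the paper.
WORKLOADS = ["llm_inference", "dense_training", "sparse_embedding_training"]
BACKENDS = ["tpu_v6e", "nvidia_h200", "amd_mi300x"]

# Illustrative KPIs only; the paper defines KPIs per cell.
KPIS = ["throughput", "latency", "utilization"]

def experiment_cells():
    """Enumerate the 3x3 grid of (workload, backend) cells."""
    return [
        {"workload": w, "backend": b, "kpis": list(KPIS)}
        for w, b in product(WORKLOADS, BACKENDS)
    ]

cells = experiment_cells()
assert len(cells) == 9  # 3 workloads x 3 backends
```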

Three Accelerator Paradigms: TPU v6e, NVIDIA H200, and AMD MI300X

The 3×3 study spans three distinct accelerator paradigms. Each interacts differently with XLA’s fusion, buffer-assignment, and partitioning strategies.

PJRT: The Pluggable Just-in-Time Runtime

To deliver the “run anywhere” half of OpenXLA’s promise, the ecosystem ships PJRT—a hardware- and framework-independent interface for ML compilers and runtimes. PJRT simplifies new hardware integration by exposing a stable C API that abstracts device management, memory allocation, and the loading and execution of compiled executables.
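JAX targets PJRT directly, so the devices exposed through the plugin interface can be enumerated from Python. A minimal sketch; which backend appears depends on the PJRT plugin installed (CPU by default on a machine with no accelerators):

```python
import jax

# jax.devices() returns the devices provided by the active PJRT plugin;
# each device reports its platform (e.g. "cpu", "gpu", "tpu") and an id.
devices = jax.devices()
for d in devices:
    print(d.platform, d.id)
```

Swapping in a different PJRT plugin (GPU, TPU, or a vendor plugin) changes what this loop prints without changing any user code—which is exactly the portability the C API is designed to provide.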

The XLA Compiler Pipeline: Target-Independent Passes and LLVM Lowering

The XLA compiler splits cleanly into target-independent analysis passes and target-specific code generation. This separation lets high-level optimizations benefit every backend while still exploiting the microarchitectural features of specific hardware.
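This split is directly observable from JAX: lowering a jitted function yields the target-independent StableHLO module, while compiling it runs the backend-specific pipeline and yields optimized HLO for the local device. A minimal sketch; the exact text varies with JAX version and backend:

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.tanh(x).sum()

x = jnp.ones((8, 8))

# Target-independent stage: StableHLO emitted by the frontend.
lowered = jax.jit(f).lower(x)
stablehlo_text = lowered.as_text()

# Target-specific stage: XLA's optimized HLO for the local backend,
# after fusion, layout assignment, and the rest of the pass pipeline.
compiled = lowered.compile()
optimized_hlo = compiled.as_text()

print(stablehlo_text.splitlines()[0])
print(optimized_hlo.splitlines()[0])
```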

The OpenXLA IR Hierarchy: StableHLO, CHLO, and VHLO

OpenXLA’s effectiveness as a portability layer hinges on a multi-level dialect hierarchy within the MLIR framework. This post summarizes how the hierarchy is organized and why the design choices matter.