Technical Architecture and Systematic Benchmarking of the OpenXLA Ecosystem
DRAFT v2 — April 2026
A cross-platform analysis of modern ML compilers: an architectural study of
the OpenXLA pipeline (CHLO → StableHLO → XLA → LLVM) paired
with a proposed 3×3 benchmarking protocol evaluating three workloads
(LLM inference, dense training, sparse embedding training) across TPU v6e,
NVIDIA H200, and AMD MI300X.
Status: Proposed research protocol. Architectural analysis
and methodology are complete; experimental results are forthcoming pending
hardware access.
Read the paper (PDF) · LaTeX source · About this work
17 Apr 2026
Two shorter threads in the paper: the diagnostic tooling that makes compiler-level benchmarking possible, and the economic pressures that make cross-platform benchmarking urgent.
15 Apr 2026
The paper formalizes a 3×3 experimental design: three representative ML workloads evaluated across three hardware backends, yielding nine distinct cells, each with its own defined KPIs.
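The grid itself can be sketched directly. The workload and backend names below come from the abstract above; the KPI names are placeholders for illustration only (the paper defines the actual per-cell metrics):

```python
from itertools import product

# Workloads and hardware backends named in the abstract.
WORKLOADS = ("llm_inference", "dense_training", "sparse_embedding_training")
BACKENDS = ("tpu_v6e", "nvidia_h200", "amd_mi300x")

# Placeholder KPI names -- the paper defines the real per-cell metrics.
KPIS = ("throughput", "step_time", "peak_memory")

def make_grid():
    """Enumerate the nine (workload, backend) cells with empty KPI slots."""
    return {
        (w, b): dict.fromkeys(KPIS)
        for w, b in product(WORKLOADS, BACKENDS)
    }

grid = make_grid()
print(len(grid))  # 9 cells: three workloads x three backends
```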
12 Apr 2026
The 3×3 study spans three distinct accelerator paradigms. Each interacts with XLA’s fusion, buffer analysis, and partitioning strategies differently.
09 Apr 2026
To deliver the “run anywhere” half of OpenXLA’s promise, the ecosystem ships PJRT—a hardware- and framework-independent interface for ML compilers and runtimes. PJRT simplifies new hardware integration by exposing a stable C API that abstracts device management, memory allocation, and executable loading and execution.
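The division of responsibility is easier to see in miniature. The sketch below is a hypothetical Python rendering of the concerns the PJRT C API covers (device management, memory allocation, executable execution); none of these class or method names are actual PJRT symbols:

```python
from typing import Protocol, Sequence

class Buffer(Protocol):
    """Opaque handle to device memory (hypothetical)."""

class Executable(Protocol):
    """Compiled artifact ready to run on a device (hypothetical)."""
    def execute(self, args: Sequence[Buffer]) -> Sequence[Buffer]: ...

class PluginClient(Protocol):
    """The concerns a PJRT-style plugin implements so that any framework
    can drive any backend through one stable surface."""
    def devices(self) -> Sequence[str]: ...                              # device management
    def buffer_from_host(self, data: bytes, device: str) -> Buffer: ...  # memory allocation
    def compile(self, stablehlo_text: str) -> Executable: ...            # executable execution

class ToyCpuClient:
    """In-process stand-in: bytes act as 'device buffers', and every
    compiled executable is the identity function."""
    def devices(self):
        return ["cpu:0"]
    def buffer_from_host(self, data, device):
        return data
    def compile(self, stablehlo_text):
        class _Identity:
            def execute(self, args):
                return list(args)
        return _Identity()

client = ToyCpuClient()
exe = client.compile("module { }")
out = exe.execute([client.buffer_from_host(b"\x2a", "cpu:0")])
print(out)  # [b'*']
```

A real plugin would back these methods with driver calls, but the framework-facing shape stays the same, which is the point of the abstraction.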
06 Apr 2026
The XLA compiler splits cleanly into target-independent analysis passes and target-specific code generation. This separation lets high-level optimizations benefit every backend while still exploiting the microarchitectural features of specific hardware.
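JAX exposes both halves of this pipeline, which makes the split easy to inspect on whatever backend is installed (CPU in this sketch): `lower()` stops at the target-independent StableHLO module, while `compile()` runs the backend-specific passes.

```python
import jax
import jax.numpy as jnp

@jax.jit
def f(x):
    return jnp.tanh(x) * 2.0 + 1.0

x = jnp.ones((8,), jnp.float32)

# Target-independent half: the StableHLO module handed to XLA,
# identical regardless of which backend will eventually run it.
lowered = f.lower(x)
print(lowered.as_text())

# Target-specific half: HLO after the attached backend's optimization
# pipeline (fusion and layout decisions differ per device).
compiled = lowered.compile()
print(compiled.as_text())
```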
04 Apr 2026
OpenXLA’s effectiveness as a portability layer hinges on a multi-level dialect hierarchy within the MLIR framework. This post summarizes how the hierarchy is organized and why the design choices matter.
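As a taste of the middle of that hierarchy, below is an illustrative StableHLO module (hand-written for this post, not tool-generated): frameworks lower their ops into this versioned dialect, and every backend consumes it from there.

```mlir
// Illustrative StableHLO: element-wise ops on a statically shaped tensor.
module @example {
  func.func @scale_add(%x: tensor<4xf32>, %y: tensor<4xf32>) -> tensor<4xf32> {
    %0 = stablehlo.multiply %x, %y : tensor<4xf32>
    %1 = stablehlo.add %0, %y : tensor<4xf32>
    return %1 : tensor<4xf32>
  }
}
```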