ASTRA-sim · Collective Scaling

About ASTRA-sim · Collective Scaling

This site documents a simulation study of collective-communication scaling for distributed ML, built on ASTRA-sim. It models AllReduce and AllGather across torus and switch topologies on the analytical and ns-3 backends, comparing latency-bound vs bandwidth-bound behavior as a function of message size and node count.
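The latency-bound versus bandwidth-bound distinction can be illustrated with the textbook alpha-beta cost model for a ring AllReduce. This is a minimal sketch and not ASTRA-sim's internal model. The function names and the link parameters (1 µs per-step latency, 100 GB/s per link) are hypothetical values chosen only for illustration.

```python
def ring_allreduce_terms(message_bytes, nodes, alpha_s, bytes_per_s):
    """Return (latency_term, bandwidth_term) in seconds for a ring AllReduce.

    Textbook alpha-beta estimate: 2 * (nodes - 1) steps, each step moving a
    chunk of message_bytes / nodes and paying a fixed per-step latency alpha_s.
    """
    if nodes < 2:
        return 0.0, 0.0  # a single node has nothing to communicate
    steps = 2 * (nodes - 1)  # reduce-scatter phase plus all-gather phase
    latency_term = steps * alpha_s
    bandwidth_term = steps * (message_bytes / nodes) / bytes_per_s
    return latency_term, bandwidth_term


def regime(message_bytes, nodes, alpha_s, bytes_per_s):
    """Classify a configuration by whichever term dominates the estimate."""
    latency_term, bandwidth_term = ring_allreduce_terms(
        message_bytes, nodes, alpha_s, bytes_per_s
    )
    return "latency-bound" if latency_term > bandwidth_term else "bandwidth-bound"


if __name__ == "__main__":
    ALPHA_S = 1e-6        # hypothetical per-step latency: 1 microsecond
    BYTES_PER_S = 100e9   # hypothetical link bandwidth: 100 GB/s
    for size in (1024, 2**20, 2**30):
        for nodes in (8, 64):
            print(size, nodes, regime(size, nodes, ALPHA_S, BYTES_PER_S))
```

Under this model the two terms are equal when the message size is alpha_s * bytes_per_s * nodes, so the crossover point moves to larger messages as the node count grows.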

The full harness (a Docker build, a bytes-based Chakra workload generator, the sweep runner, and the plotting code) is open source and reproducible: github.com/kredd2506/Astro.

— Manish Reddy