August 25, 2025
in Performance
30 min read

Data-Tiling Walkthrough

Data-tiling changes the data layout of the operands of certain operations, such as matrix multiplication, that prefer specific layouts. These layout preferences depend on both the operation and the target hardware. For example, a matrix multiplication may need to use hardware matrix-multiplication instructions that perform best with a specific matrix data layout.

Layout changes may also be motivated by memory access performance: data-tiling can improve the locality of memory accesses, reduce the number of cache lines accessed, and generally produce simpler memory access patterns that the memory system is more likely to handle efficiently.
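The locality argument can be illustrated with a small, self-contained sketch. It is not taken from any compiler implementation: the helper names (`pack_tiled`, `cache_lines_touched`), the 4x4 tile shape, and the 16-element cache line are all assumptions chosen for illustration. The sketch packs a row-major matrix into a tiled layout and counts how many cache lines one tile spans in each layout.

```python
def pack_tiled(matrix, tile_rows, tile_cols):
    """Return the elements of a row-major 2-D list in tiled order.

    Tiles are visited in row-major order, and the elements inside each tile
    are also stored row-major, so every tile occupies one contiguous run.
    Assumes the matrix dimensions are exact multiples of the tile shape.
    """
    rows, cols = len(matrix), len(matrix[0])
    assert rows % tile_rows == 0 and cols % tile_cols == 0
    packed = []
    for tile_r in range(0, rows, tile_rows):
        for tile_c in range(0, cols, tile_cols):
            for r in range(tile_r, tile_r + tile_rows):
                packed.extend(matrix[r][tile_c:tile_c + tile_cols])
    return packed


def cache_lines_touched(offsets, line_elems):
    """Count the distinct cache lines covering the given element offsets."""
    return len({off // line_elems for off in offsets})


ROWS, COLS = 16, 16
TILE = 4          # hypothetical 4x4 tile shape
LINE_ELEMS = 16   # hypothetical cache line holding 16 elements

# Each element's value is its own row-major offset, which makes the
# reordering easy to inspect.
matrix = [[r * COLS + c for c in range(COLS)] for r in range(ROWS)]
packed = pack_tiled(matrix, TILE, TILE)

# Linear offsets of the top-left 4x4 tile in each layout.
row_major_offsets = [r * COLS + c for r in range(TILE) for c in range(TILE)]
tiled_offsets = list(range(TILE * TILE))

row_major_lines = cache_lines_touched(row_major_offsets, LINE_ELEMS)
tiled_lines = cache_lines_touched(tiled_offsets, LINE_ELEMS)

print("cache lines for one tile, row-major layout:", row_major_lines)
print("cache lines for one tile, tiled layout:", tiled_lines)
```

Under these assumed sizes, the tile's four rows sit on four different cache lines in the row-major layout, while the tiled layout stores the same 16 elements contiguously on a single line. Real tile shapes and cache-line sizes depend on the operation and the target hardware.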

These layout changes can be propagated as far as possible across the workload, so that the entire workload uses the updated layouts rather than performing layout transformations at runtime. This may involve fusions or constant evaluation that amortize or remove layout-transformation overheads.

The main conceptual difficulty in modeling this in tensor-level MLIR is that tensors don't have layouts: tensors are higher-level, abstract arrays. This is addressed by the concept of tensor encodings, which this document will explain.