Skip to content


IREE / MLIR / Linalg tutorial


This tutorial is simultaneously about IREE, MLIR, and specifically the MLIR Linalg dialect.

What is MLIR?

MLIR is a programming language, but MLIR in itself is almost just an empty shell. What it really provides is a framework allowing to define MLIR dialects which are where the features come from.

The "IR" part of the MLIR name stands for "intermediate representation". It means that MLIR is meant to be primarily for compiler-internal representations of code. But MLIR is actually fairly nice for humans to work with, and it's not hard to hand-author some MLIR programs from scratch. That is exactly the topic of this tutorial.

Exploring CPU microkernels on a matmul example

Basic setup, command lines

Source file: matmul.mlir:

func.func @matmul_dynamic(%lhs: tensor<?x?xf32>, %rhs: tensor<?x?xf32>, %acc: tensor<?x?xf32>) -> tensor<?x?xf32> {
  %result = linalg.matmul ins(%lhs, %rhs: tensor<?x?xf32>, tensor<?x?xf32>) outs(%acc: tensor<?x?xf32>) -> tensor<?x?xf32>
  return %result: tensor<?x?xf32>

Basic compilation command line:

$ iree-compile matmul.mlir -o /tmp/matmul.vmfb \
  --iree-hal-target-backends=llvm-cpu \
  --iree-llvmcpu-target-cpu=znver4 \

This creates a IREE bytecode module:

$ ls -l /tmp/matmul.vmfb

-rw-rw-r-- 1 2884 Jan 22 10:37 /tmp/matmul.vmfb

Matrix Multiplication with MMT4D


Matrix multiplication (matmul) is an important operation in ML workloads that poses specific challenges to code generation. For example, matmul makes repeated accesses to the same data, which makes locality of reference a top concern.

Moreover, modern CPUs instruction set architectures (ISAs) offer specialized SIMD instructions that the matmul implementation needs to use to achieve optimal performance, and these instructions expect data to be in a particular layout.

This article is about an in-development MLIR operation, linalg.mmt4d, offering a compilation path for linalg.matmul that is designed from the ground up for these efficiency considerations.