'iree_cpu' Dialect

A dialect for common functionality used by CPU-focused IREE code generation.

This dialect provides operations and attributes to aid in code generation for CPU targets. The functionality in this dialect can be hardware-specific, but is intended to be independent of the lowering target. Late lowerings to LLVM are handled separately.

'iree_cpu' Dialect
- Attributes
- Enums
  - LoweringPipeline
    - Cases:
  - MMAIntrinsic
    - Cases:

Attributes

CPUEncodingResolverAttr

The encoding layout attribute for CPU backends.

Syntax:

#iree_cpu.cpu_encoding_resolver<
  DictionaryAttr   # configuration
>

This attribute can implement any layout interface methods for encoding serialization and/or materialization, e.g., Encoding::LayoutMaterializerAttr, Codegen::PackedLayoutMaterializerAttr, etc. They are implemented through the external model mechanism; see the implementation in compiler/Codegen/ExternalInterfaces/*.

Parameters:

Parameter	C++ type	Description
configuration	`DictionaryAttr`	Executable target configuration. It is expected to be used within a pass scope, not in the final IR output.

DataTiledMMAAttr

Syntax:

#iree_cpu.data_tiled_mma_layout<
  `None` | `MMA_X86_AVX2_FMA_1x8x1_F32_F32` | `MMA_X86_AVX2_FMA_8x1x1_F32_F32` | `MMA_X86_AVX512_1x8x1_F64_F64` | `MMA_X86_AVX512_8x1x1_F64_F64` | `MMA_X86_AVX512_1x16x1_F32_F32` | `MMA_X86_AVX512_16x1x1_F32_F32` | `MMA_X86_AVX512_1x16x1_F32_F16_CASTF32` | `MMA_X86_AVX512_16x1x1_F32_F16_CASTF32` | `MMA_X86_AVX512FP16_1x32x1_F16_F16` | `MMA_X86_AVX512FP16_32x1x1_F16_F16` | `MMA_X86_AVX512BF16_1x16x2_F32_BF16` | `MMA_X86_AVX512BF16_16x1x2_F32_BF16` | `MMA_X86_AVX512_1x16x2_I32_I16` | `MMA_X86_AVX512_16x1x2_I32_I16` | `MMA_X86_AVX512VNNI_1x16x2_I32_I16` | `MMA_X86_AVX512VNNI_16x1x2_I32_I16` | `MMA_X86_AVX512_1x16x2_I32_I8_CASTI16` | `MMA_X86_AVX512_16x1x2_I32_I8_CASTI16` | `MMA_X86_AVX512VNNI_1x16x2_I32_I8_CASTI16` | `MMA_X86_AVX512VNNI_16x1x2_I32_I8_CASTI16` | `MMA_X86_AVX512VNNI_1x16x4_I32_UI8_I8` | `MMA_X86_AVX512VNNI_16x1x4_I32_I8_UI8` | `MMA_X86_AVX512VNNI_16x16x2_I32_I8_CASTI16` | `MMA_ARM_SVE_FMLA_1x4VLx1_F32_F32` | `MMA_ARM_SVE_FMLA_4VLx1x1_F32_F32` | `MMA_GENERIC_SCALAR_1x1x1_REG8` | `MMA_GENERIC_SCALAR_1x1x1_REG16`,   # intrinsic
  int64_t,   # intrinsics_m
  int64_t,   # intrinsics_n
  int64_t,   # intrinsics_k
  ::mlir::Type,   # lhs_type
  ::mlir::Type,   # rhs_type
  ::mlir::Type   # acc_type
>

CPU analogue of IREEGPU_DataTiledMMAAttr, for use with iree_codegen.inner_tiled. As in the GPU case, this wraps an intrinsic enum and the intrinsics_{m,n,k} unrolling factors. Unlike the GPU case, there is no thread distribution, no concept of subgroups, and no interleaving of the intrinsics' layouts.
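
As a rough mental model, the MxNxK token in an intrinsic name and the intrinsics_{m,n,k} counts combine multiplicatively into the tile covered by one data-tiled MMA. The sketch below assumes exactly that (no interleaving, as stated in the paragraph above). The helper names are hypothetical and do not correspond to IREE functions, and scalable Arm SVE shapes such as 1x4VLx1 are deliberately left out.

```python
import re

# Hypothetical helper (not an IREE API): pulls the fixed "MxNxK" shape token out of
# an MMAIntrinsic enum name, e.g. "MMA_X86_AVX512_1x16x1_F32_F32" -> (1, 16, 1).
_SHAPE_RE = re.compile(r"_(\d+)x(\d+)x(\d+)(?:_|$)")


def intrinsic_shape(intrinsic: str) -> tuple:
    match = _SHAPE_RE.search(intrinsic)
    if match is None:
        # Scalable shapes such as the Arm SVE "1x4VLx1" depend on the runtime
        # vector length and are out of scope for this sketch.
        raise ValueError(f"no fixed MxNxK shape in {intrinsic!r}")
    return tuple(int(g) for g in match.groups())


def data_tiled_tile_shape(intrinsic: str, intrinsics_m: int,
                          intrinsics_n: int, intrinsics_k: int) -> tuple:
    # Assumption: the data-tiled tile is the intrinsic shape scaled by the
    # unrolling counts, with no interleaving (unlike the GPU attribute).
    m, n, k = intrinsic_shape(intrinsic)
    return (m * intrinsics_m, n * intrinsics_n, k * intrinsics_k)


# Eight 1x16x1 intrinsics stacked along M cover an 8x16x1 tile.
print(data_tiled_tile_shape("MMA_X86_AVX512_1x16x1_F32_F32", 8, 1, 1))
```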

Each non-square hardware MMA appears as two MMAIntrinsic enum values, one per orientation (e.g., MMA_X86_AVX512_1x16x1_F32_F32 and its M↔N-swapped sibling MMA_X86_AVX512_16x1x1_F32_F32). The cost model treats them as distinct candidates and picks whichever fits the matmul shape better.
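
To illustrate why keeping both orientations helps, the sketch below scores each sibling by how much padding it would add when the M x N result is rounded up to whole tiles. This padding-waste metric is an illustrative assumption chosen for the example; it is not a description of IREE's actual cost model, and the function names are hypothetical.

```python
import math


def padding_waste(m: int, n: int, tile_m: int, tile_n: int) -> int:
    """Extra result elements created by rounding an m x n result up to whole tiles."""
    padded_m = math.ceil(m / tile_m) * tile_m
    padded_n = math.ceil(n / tile_n) * tile_n
    return padded_m * padded_n - m * n


def pick_orientation(m: int, n: int, candidates: dict) -> str:
    """Return the candidate name (name -> (tile_m, tile_n)) with the least padding."""
    return min(candidates, key=lambda name: padding_waste(m, n, *candidates[name]))


siblings = {
    "MMA_X86_AVX512_1x16x1_F32_F32": (1, 16),
    "MMA_X86_AVX512_16x1x1_F32_F32": (16, 1),
}
# A matrix-vector-like shape (M = 1) favors the 1x16 orientation.
print(pick_orientation(1, 1000, siblings))
```

With M = 1, the 16x1 orientation would pad M from 1 to 16 and waste almost the whole tile, which is exactly the situation the M↔N-swapped sibling avoids.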

For most intrinsic values, the (LHS, RHS, ACC) element types are baked into the enum, and lhs_type / rhs_type / acc_type are unused. The one exception is the MMA_GENERIC_SCALAR_1x1x1 family (MMA_GENERIC_SCALAR_1x1x1_REG8 and MMA_GENERIC_SCALAR_1x1x1_REG16): it is a type-polymorphic fallback used when no element-type-specific intrinsic matches the target, so it carries its element types in those three optional parameters instead. This deliberately breaks the otherwise strong invariant that an MMAIntrinsic enum value pins down a specific element type triple, in exchange for not having to add one enum value per supported (LHS, RHS, ACC) combination.
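
The type-resolution rule can be sketched as follows. Types are modeled as plain strings rather than ::mlir::Type, and the two-entry _PINNED_TYPES table is a hypothetical, illustrative stand-in for the element types that the non-generic enum values pin down.

```python
# Illustrative subset only: (lhs, rhs, acc) element types pinned by two enum values.
_PINNED_TYPES = {
    "MMA_X86_AVX512_1x16x1_F32_F32": ("f32", "f32", "f32"),
    "MMA_X86_AVX512BF16_1x16x2_F32_BF16": ("bf16", "bf16", "f32"),
}

GENERIC_SCALAR_PREFIX = "MMA_GENERIC_SCALAR_1x1x1"


def resolve_element_types(intrinsic, lhs_type=None, rhs_type=None, acc_type=None):
    """Return the (lhs, rhs, acc) element types for a data-tiled MMA (sketch)."""
    if intrinsic.startswith(GENERIC_SCALAR_PREFIX):
        # Type-polymorphic fallback: the types must come from the optional parameters.
        if None in (lhs_type, rhs_type, acc_type):
            raise ValueError(f"{intrinsic} requires lhs_type, rhs_type and acc_type")
        return (lhs_type, rhs_type, acc_type)
    # Every other intrinsic pins its types; the optional parameters are ignored.
    return _PINNED_TYPES[intrinsic]


print(resolve_element_types("MMA_GENERIC_SCALAR_1x1x1_REG16", "f16", "f16", "f32"))
print(resolve_element_types("MMA_X86_AVX512BF16_1x16x2_F32_BF16"))
```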

Some GPU-specific methods of IREECodegen_InnerTileDescAttrInterface are still present here but are unused on CPU.

Parameters:

Parameter	C++ type	Description
intrinsic	`::mlir::iree_compiler::IREE::CPU::MMAIntrinsic`	an enum of type MMAIntrinsic
intrinsics_m	`int64_t`	Intrinsic count along the M dimension.
intrinsics_n	`int64_t`	Intrinsic count along the N dimension.
intrinsics_k	`int64_t`	Intrinsic count along the K dimension.
lhs_type	`::mlir::Type`	LHS element type, used only by type-polymorphic intrinsics such as MMA_GENERIC_SCALAR_1x1x1.
rhs_type	`::mlir::Type`	RHS element type, used only by type-polymorphic intrinsics such as MMA_GENERIC_SCALAR_1x1x1.
acc_type	`::mlir::Type`	ACC element type, used only by type-polymorphic intrinsics such as MMA_GENERIC_SCALAR_1x1x1.

InnerTiledSemanticsAttr

Syntax: #iree_cpu.mma_semantics

Attribute describing aspects of inner-tiled MMA semantics that are orthogonal to the data_tiled_mma_layout kind. On CPU, tiles are always undistributed (no thread distribution) and always expanded (opaque = false), so the attribute currently has no parameters and is, for now, a unit attribute. It could evolve in the future to look more like IREEGPU_InnerTiledSemanticsAttr.

LoweringConfigAttr

Drives lowering of an operation for CPU compilation.

CPU-specific implementation of a lowering config. It carries only a dictionary attribute that stores the relevant fields. This is the simplest form of a lowering config, offering flexibility at the cost of structure.

Some key entries, e.g., distribution and the other tiling levels, must be IREE::Codegen::LoweringConfigTilingLevelAttr values: lists of tile sizes with an optional scalable representation, as in vector types. For example:

#iree_cpu.lowering_config<distribution = [128, 128, 0], cache_parallel = [64, 64, 0], cache_reduction = [0, 0, 16], vector_common_parallel = [[4], [4], 0], vector_reduction = [0, 0, [4]], vector_inner_parallel = [0, 0, 0]>

For more details, see the implementation in IREECPUAttrs.cpp.

Note that the behavior is undefined if more than one vector tiling level sets a value on the same dimension; the vector tiling levels are expected to be disjoint. This is intentionally not enforced in the verifier, to keep flexibility when something is wrong in a lowering config: for example, some transformations still work even if the levels are not disjoint.
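
Because the verifier does not check disjointness, a standalone lint-style check can help when debugging hand-written configs. The sketch below is hypothetical and not part of IREE: it models the config dictionary as a Python dict, writes a scalable size such as [4] as a one-element list, and assumes that a tile size of 0 means the level does not tile that dimension.

```python
VECTOR_LEVELS = ("vector_common_parallel", "vector_reduction", "vector_inner_parallel")


def tile_size(entry):
    """Return (size, scalable); a one-element list such as [4] marks a scalable size."""
    if isinstance(entry, list):
        (size,) = entry
        return size, True
    return entry, False


def overlapping_dims(config: dict) -> list:
    """Dimensions on which more than one vector tiling level sets a non-zero size."""
    levels = [config[name] for name in VECTOR_LEVELS if name in config]
    if not levels:
        return []
    rank = len(levels[0])
    return [
        dim for dim in range(rank)
        if sum(1 for level in levels if tile_size(level[dim])[0] != 0) > 1
    ]


# The example config from the text above, as a plain dict.
example_config = {
    "distribution": [128, 128, 0],
    "cache_parallel": [64, 64, 0],
    "cache_reduction": [0, 0, 16],
    "vector_common_parallel": [[4], [4], 0],
    "vector_reduction": [0, 0, [4]],
    "vector_inner_parallel": [0, 0, 0],
}
print(overlapping_dims(example_config))
```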

Parameters:

Parameter	C++ type	Description
config	`DictionaryAttr`	The configured fields, including tiling levels.

MMAIntrinsicAttr

Descriptor for different MMA intrinsics

Syntax:

#iree_cpu.mma_intrinsic<
  `None` | `MMA_X86_AVX2_FMA_1x8x1_F32_F32` | `MMA_X86_AVX2_FMA_8x1x1_F32_F32` | `MMA_X86_AVX512_1x8x1_F64_F64` | `MMA_X86_AVX512_8x1x1_F64_F64` | `MMA_X86_AVX512_1x16x1_F32_F32` | `MMA_X86_AVX512_16x1x1_F32_F32` | `MMA_X86_AVX512_1x16x1_F32_F16_CASTF32` | `MMA_X86_AVX512_16x1x1_F32_F16_CASTF32` | `MMA_X86_AVX512FP16_1x32x1_F16_F16` | `MMA_X86_AVX512FP16_32x1x1_F16_F16` | `MMA_X86_AVX512BF16_1x16x2_F32_BF16` | `MMA_X86_AVX512BF16_16x1x2_F32_BF16` | `MMA_X86_AVX512_1x16x2_I32_I16` | `MMA_X86_AVX512_16x1x2_I32_I16` | `MMA_X86_AVX512VNNI_1x16x2_I32_I16` | `MMA_X86_AVX512VNNI_16x1x2_I32_I16` | `MMA_X86_AVX512_1x16x2_I32_I8_CASTI16` | `MMA_X86_AVX512_16x1x2_I32_I8_CASTI16` | `MMA_X86_AVX512VNNI_1x16x2_I32_I8_CASTI16` | `MMA_X86_AVX512VNNI_16x1x2_I32_I8_CASTI16` | `MMA_X86_AVX512VNNI_1x16x4_I32_UI8_I8` | `MMA_X86_AVX512VNNI_16x1x4_I32_I8_UI8` | `MMA_X86_AVX512VNNI_16x16x2_I32_I8_CASTI16` | `MMA_ARM_SVE_FMLA_1x4VLx1_F32_F32` | `MMA_ARM_SVE_FMLA_4VLx1x1_F32_F32` | `MMA_GENERIC_SCALAR_1x1x1_REG8` | `MMA_GENERIC_SCALAR_1x1x1_REG16`   # value
>

Parameters:

Parameter	C++ type	Description
value	`::mlir::iree_compiler::IREE::CPU::MMAIntrinsic`	an enum of type MMAIntrinsic

PipelineAttr

CPU lowering pipeline identifier.

Syntax:

#iree_cpu.pipeline<
  `Default` | `DoubleTilingExpert` | `ConvTileAndDecomposeExpert` | `Mmt4dTilingExpert` | `BufferOpsTileAndVectorize` | `DataTiling` | `LinalgExtTileAndVectorize`   # value
>

Identifies a CPU lowering pipeline. Implements PipelineAttrInterface by delegating to a builder callback registered via registerCPUPipelineBuilder(). The builder must handle all LoweringPipeline enum values.
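
The delegation pattern can be sketched in Python as follows. The LoweringPipeline values mirror the enum table in this document; register_cpu_pipeline_builder and build_pipeline are hypothetical stand-ins for registerCPUPipelineBuilder() and the attribute's interface method, whose real C++ signatures are not shown here. Modeling the builder as a table lets the sketch reject, at registration time, a builder that does not handle every enum value.

```python
from enum import IntEnum


class LoweringPipeline(IntEnum):
    """Mirror of the LoweringPipeline cases listed in the Enums section."""
    Default = 0
    DoubleTilingExpert = 1
    ConvTileAndDecomposeExpert = 2
    Mmt4dTilingExpert = 3
    BufferOpsTileAndVectorize = 4
    DataTiling = 5
    LinalgExtTileAndVectorize = 6


# Hypothetical stand-in for the callback installed by registerCPUPipelineBuilder().
_registered_builder = None


def register_cpu_pipeline_builder(table: dict) -> None:
    """Install a builder given as {LoweringPipeline: fn(pass_list)}.

    Rejects tables that miss an enum value, mirroring the documented
    requirement that the builder handle all LoweringPipeline values.
    """
    global _registered_builder
    missing = [p.name for p in LoweringPipeline if p not in table]
    if missing:
        raise ValueError(f"builder does not handle: {missing}")
    _registered_builder = table


def build_pipeline(pipeline: LoweringPipeline, pass_list: list) -> None:
    """What a pipeline attribute does in this sketch: delegate to the builder."""
    if _registered_builder is None:
        raise RuntimeError("no CPU pipeline builder registered")
    _registered_builder[pipeline](pass_list)


# Usage: a toy builder that records which pipeline was requested.
passes = []
register_cpu_pipeline_builder(
    {p: (lambda pl, name=p.name: pl.append(name)) for p in LoweringPipeline})
build_pipeline(LoweringPipeline.DataTiling, passes)
print(passes)
```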

Parameters:

Parameter	C++ type	Description
value	`::mlir::iree_compiler::IREE::CPU::LoweringPipeline`	an enum of type LoweringPipeline

UKernelProviderAttr

Provides built-in C-bitcode ukernel implementations for the LLVMCPU target backend.

Syntax: #iree_cpu.ukernel_provider

Implements the UKernelProviderInterface for the LLVMCPU target. createAndReplaceWithUkernelOp rewrites an iree_codegen.inner_tiled op carrying an iree_cpu.data_tiled_mma_layout and an iree_codegen.ukernel = "<name>" descriptor into an iree_codegen.ukernel.generic call, and attaches the matching ukernel bitcode as a hal.executable_object on the dispatch's executable variant.

Bitcode is resolved by name: first against any hal.executable.objects already attached above the op (so user-supplied bitcode wins), then against ukernels embedded into iree-compile at LLVMCPU plugin init.
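
The two-step lookup order can be sketched as below. Everything here is a hypothetical model: attached hal.executable.objects are represented as a list of {name: bytes} dicts walked from the innermost enclosing scope outward (the walk order is an assumption of this sketch), and the embedded ukernels as a single dict.

```python
def resolve_ukernel_bitcode(name: str, attached_scopes: list, embedded: dict):
    """Return (origin, bitcode) for the ukernel called `name`.

    attached_scopes: hal.executable.objects found above the op, modeled as
        {name: bytes} dicts ordered from the innermost scope outward.
    embedded: {name: bytes} ukernels built into the compiler at plugin init.
    """
    # 1. User-supplied bitcode attached in the IR wins.
    for scope in attached_scopes:
        if name in scope:
            return "attached", scope[name]
    # 2. Otherwise fall back to the bitcode embedded in the compiler.
    if name in embedded:
        return "embedded", embedded[name]
    raise KeyError(f"no bitcode found for ukernel {name!r}")


# "my_ukernel" is a made-up name used only for this example.
print(resolve_ukernel_bitcode("my_ukernel", [], {"my_ukernel": b"builtin"}))
```

Checking the IR first means a user can override a built-in ukernel simply by attaching bitcode under the same name, without rebuilding the compiler.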

VMVXEncodingResolverAttr

The encoding layout attribute for the VMVX backend.

Syntax:

#iree_cpu.vmvx_encoding_resolver<
  DictionaryAttr   # configuration
>

This attribute can implement any layout interface methods for encoding serialization and/or materialization, e.g., Encoding::LayoutMaterializerAttr, Codegen::PackedLayoutMaterializerAttr, etc. They are implemented through the external model mechanism; see the implementation in compiler/Codegen/ExternalInterfaces/*.

Parameters:

Parameter	C++ type	Description
configuration	`DictionaryAttr`	Executable target configuration. It is expected to be used within a pass scope, not in the final IR output.

Enums

LoweringPipeline

LLVMCPU lowering pipeline identifier

Cases:

Symbol	Value	String
Default	`0`	Default
DoubleTilingExpert	`1`	DoubleTilingExpert
ConvTileAndDecomposeExpert	`2`	ConvTileAndDecomposeExpert
Mmt4dTilingExpert	`3`	Mmt4dTilingExpert
BufferOpsTileAndVectorize	`4`	BufferOpsTileAndVectorize
DataTiling	`5`	DataTiling
LinalgExtTileAndVectorize	`6`	LinalgExtTileAndVectorize

MMAIntrinsic

Descriptor for different MMA intrinsics

Cases:

Symbol	Value	String
None	`0`	None
MMA_X86_AVX2_FMA_1x8x1_F32_F32	`4624`	MMA_X86_AVX2_FMA_1x8x1_F32_F32
MMA_X86_AVX2_FMA_8x1x1_F32_F32	`4625`	MMA_X86_AVX2_FMA_8x1x1_F32_F32
MMA_X86_AVX512_1x8x1_F64_F64	`4864`	MMA_X86_AVX512_1x8x1_F64_F64
MMA_X86_AVX512_8x1x1_F64_F64	`4865`	MMA_X86_AVX512_8x1x1_F64_F64
MMA_X86_AVX512_1x16x1_F32_F32	`4880`	MMA_X86_AVX512_1x16x1_F32_F32
MMA_X86_AVX512_16x1x1_F32_F32	`4881`	MMA_X86_AVX512_16x1x1_F32_F32
MMA_X86_AVX512_1x16x1_F32_F16_CASTF32	`4896`	MMA_X86_AVX512_1x16x1_F32_F16_CASTF32
MMA_X86_AVX512_16x1x1_F32_F16_CASTF32	`4897`	MMA_X86_AVX512_16x1x1_F32_F16_CASTF32
MMA_X86_AVX512FP16_1x32x1_F16_F16	`4898`	MMA_X86_AVX512FP16_1x32x1_F16_F16
MMA_X86_AVX512FP16_32x1x1_F16_F16	`4899`	MMA_X86_AVX512FP16_32x1x1_F16_F16
MMA_X86_AVX512BF16_1x16x2_F32_BF16	`4912`	MMA_X86_AVX512BF16_1x16x2_F32_BF16
MMA_X86_AVX512BF16_16x1x2_F32_BF16	`4913`	MMA_X86_AVX512BF16_16x1x2_F32_BF16
MMA_X86_AVX512_1x16x2_I32_I16	`5024`	MMA_X86_AVX512_1x16x2_I32_I16
MMA_X86_AVX512_16x1x2_I32_I16	`5025`	MMA_X86_AVX512_16x1x2_I32_I16
MMA_X86_AVX512VNNI_1x16x2_I32_I16	`5026`	MMA_X86_AVX512VNNI_1x16x2_I32_I16
MMA_X86_AVX512VNNI_16x1x2_I32_I16	`5027`	MMA_X86_AVX512VNNI_16x1x2_I32_I16
MMA_X86_AVX512_1x16x2_I32_I8_CASTI16	`5056`	MMA_X86_AVX512_1x16x2_I32_I8_CASTI16
MMA_X86_AVX512_16x1x2_I32_I8_CASTI16	`5057`	MMA_X86_AVX512_16x1x2_I32_I8_CASTI16
MMA_X86_AVX512VNNI_1x16x2_I32_I8_CASTI16	`5058`	MMA_X86_AVX512VNNI_1x16x2_I32_I8_CASTI16
MMA_X86_AVX512VNNI_16x1x2_I32_I8_CASTI16	`5059`	MMA_X86_AVX512VNNI_16x1x2_I32_I8_CASTI16
MMA_X86_AVX512VNNI_1x16x4_I32_UI8_I8	`5060`	MMA_X86_AVX512VNNI_1x16x4_I32_UI8_I8
MMA_X86_AVX512VNNI_16x1x4_I32_I8_UI8	`5061`	MMA_X86_AVX512VNNI_16x1x4_I32_I8_UI8
MMA_X86_AVX512VNNI_16x16x2_I32_I8_CASTI16	`5062`	MMA_X86_AVX512VNNI_16x16x2_I32_I8_CASTI16
MMA_ARM_SVE_FMLA_1x4VLx1_F32_F32	`8720`	MMA_ARM_SVE_FMLA_1x4VLx1_F32_F32
MMA_ARM_SVE_FMLA_4VLx1x1_F32_F32	`8721`	MMA_ARM_SVE_FMLA_4VLx1x1_F32_F32
MMA_GENERIC_SCALAR_1x1x1_REG8	`61448`	MMA_GENERIC_SCALAR_1x1x1_REG8
MMA_GENERIC_SCALAR_1x1x1_REG16	`61456`	MMA_GENERIC_SCALAR_1x1x1_REG16
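
Reading the values above in hexadecimal suggests some structure: the top hex digit separates x86 (0x1...), Arm (0x2...) and generic (0xF...) intrinsics, the two orientations of each non-square intrinsic differ only in the lowest bit, and the two generic scalar values end in 0x08 and 0x10. This is an observation from this table, not a documented encoding, so code should not rely on it. The sketch below only makes the pattern easy to inspect; the helper names are hypothetical.

```python
# Values copied from the MMAIntrinsic table above.
MMA_INTRINSIC_VALUES = {
    "None": 0,
    "MMA_X86_AVX2_FMA_1x8x1_F32_F32": 4624,
    "MMA_X86_AVX2_FMA_8x1x1_F32_F32": 4625,
    "MMA_X86_AVX512_1x8x1_F64_F64": 4864,
    "MMA_X86_AVX512_8x1x1_F64_F64": 4865,
    "MMA_X86_AVX512_1x16x1_F32_F32": 4880,
    "MMA_X86_AVX512_16x1x1_F32_F32": 4881,
    "MMA_X86_AVX512_1x16x1_F32_F16_CASTF32": 4896,
    "MMA_X86_AVX512_16x1x1_F32_F16_CASTF32": 4897,
    "MMA_X86_AVX512FP16_1x32x1_F16_F16": 4898,
    "MMA_X86_AVX512FP16_32x1x1_F16_F16": 4899,
    "MMA_X86_AVX512BF16_1x16x2_F32_BF16": 4912,
    "MMA_X86_AVX512BF16_16x1x2_F32_BF16": 4913,
    "MMA_X86_AVX512_1x16x2_I32_I16": 5024,
    "MMA_X86_AVX512_16x1x2_I32_I16": 5025,
    "MMA_X86_AVX512VNNI_1x16x2_I32_I16": 5026,
    "MMA_X86_AVX512VNNI_16x1x2_I32_I16": 5027,
    "MMA_X86_AVX512_1x16x2_I32_I8_CASTI16": 5056,
    "MMA_X86_AVX512_16x1x2_I32_I8_CASTI16": 5057,
    "MMA_X86_AVX512VNNI_1x16x2_I32_I8_CASTI16": 5058,
    "MMA_X86_AVX512VNNI_16x1x2_I32_I8_CASTI16": 5059,
    "MMA_X86_AVX512VNNI_1x16x4_I32_UI8_I8": 5060,
    "MMA_X86_AVX512VNNI_16x1x4_I32_I8_UI8": 5061,
    "MMA_X86_AVX512VNNI_16x16x2_I32_I8_CASTI16": 5062,
    "MMA_ARM_SVE_FMLA_1x4VLx1_F32_F32": 8720,
    "MMA_ARM_SVE_FMLA_4VLx1x1_F32_F32": 8721,
    "MMA_GENERIC_SCALAR_1x1x1_REG8": 61448,
    "MMA_GENERIC_SCALAR_1x1x1_REG16": 61456,
}
_NAME_BY_VALUE = {value: name for name, value in MMA_INTRINSIC_VALUES.items()}


def top_hex_digit(name: str) -> int:
    """Observed grouping only: 0x1 for x86, 0x2 for Arm, 0xF for generic."""
    return MMA_INTRINSIC_VALUES[name] >> 12


def orientation_sibling(name: str):
    """Name whose value differs only in the lowest bit, or None if there is none."""
    return _NAME_BY_VALUE.get(MMA_INTRINSIC_VALUES[name] ^ 1)


for name, value in MMA_INTRINSIC_VALUES.items():
    print(f"{value:#06x}  {name}")
```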