'iree_codegen' Dialectlink
A dialect for common functionality used by IREE code generation.
This dialect is primarily meant to hold attributes that carry the
state of the compilation when lowered to scalar code for an
architecture. Typically, a backend starts by analyzing the entry
point functions within the hal.executable.variant and deciding
which compilation pipeline to chose. During this, even the values
for parameters such as tile sizes, etc. are also decided. The rest
of the compilation flow does not make any heuristic decisions,
rather just looks at the values of the decision specified in
attributes that belong to this dialect. This allows an external
search to easily override the heuristics that are hard-coded
within a backend.
- 'iree_codegen' Dialect
- Operations
- iree_codegen.dispatch_config (Codegen::DispatchConfigOp)
- iree_codegen.extract_strided_metadata (Codegen::ExtractStridedMetadataOp)
- iree_codegen.fusion_barrier (Codegen::FusionBarrierOp)
- iree_codegen.index_hint (Codegen::IndexHintOp)
- iree_codegen.inner_tiled (Codegen::InnerTiledOp)
- Attributes
- Examples
- iree_codegen.load_from_buffer (Codegen::LoadFromBufferOp)
- iree_codegen.null_pointer (Codegen::NullPointerOp)
- iree_codegen.query_tile_sizes (Codegen::QueryTileSizesOp)
- iree_codegen.smt.assert (Codegen::AssertOp)
- iree_codegen.smt.constraints (Codegen::ConstraintsOp)
- iree_codegen.smt.knob (Codegen::KnobOp)
- iree_codegen.smt.lookup (Codegen::LookupOp)
- iree_codegen.store_to_buffer (Codegen::StoreToBufferOp)
- iree_codegen.swizzle_hint (Codegen::SwizzleHintOp)
- iree_codegen.workgroup_count_hint (Codegen::WorkgroupCountHintOp)
- iree_codegen.yield (Codegen::YieldOp)
- Attributes
- DenormalFpMathAttr
- DispatchLoweringPassPipelineAttr
- CompilationInfoAttr
- ExportConfigAttr
- IntKnobAttr
- LoweringConfigAttr
- LoweringConfigTilingLevelAttr
- LoweringConfigTilingLevelsAttr
- OneOfKnobAttr
- PassPipelineAttr
- RootOpAttr
- RotateRowsAttr
- SimpleTargetAttr
- SymbolicUKernelProviderAttr
- TranslationInfoAttr
- UKernelDescriptorAttr
- XORShuffleAttr
- LocalMappingAttr
- WorkgroupMappingAttr
- WorkgroupScopeAttr
- Types
- Enums
- Operations
Operationslink
iree_codegen.dispatch_config (Codegen::DispatchConfigOp)link
Holds the workgroup count computation for a dispatch.
Syntax:
operation ::= `iree_codegen.dispatch_config` $function_ref
(`workgroup_size` `=` $workgroup_size^)?
(`subgroup_size` `=` $subgroup_size^)?
$body attr-dict
A module-level op that captures the workgroup count computation and
dispatch metadata for a function. The region computes the workgroup
count (x, y, z) from workload values passed as block arguments.
The terminator must yield exactly 3 index values, matching the
contract of hal.executable.export's workgroup count region.
The function_ref is a symbol reference to the corresponding
func.func. The op itself is not a Symbol because it lives
alongside the func.func of the same name in a builtin.module
symbol table, and two ops cannot define the same symbol.
Example:
iree_codegen.dispatch_config @matmul
workgroup_size = [64, 16, 1] subgroup_size = 64 {
^bb0(%w0: index, %w1: index, %w2: index, %w3: index):
%0 = affine.apply affine_map<()[s0] -> (s0 ceildiv 256)>()[%w2]
%c1 = arith.constant 1 : index
iree_codegen.yield %0, %c1, %c1 : index, index, index
}
Traits: IsolatedFromAbove, SingleBlockImplicitTerminator<YieldOp>, SingleBlock
Attributes:link
| Attribute | MLIR Type | Description |
|---|---|---|
function_ref | ::mlir::FlatSymbolRefAttr | flat symbol reference attribute |
workgroup_size | ::mlir::DenseI64ArrayAttr | i64 dense array attribute |
subgroup_size | ::mlir::IntegerAttr | 64-bit signless integer attribute |
iree_codegen.extract_strided_metadata (Codegen::ExtractStridedMetadataOp)link
Extracts a buffer base with offset and strides.
Syntax:
operation ::= `iree_codegen.extract_strided_metadata` $source `:` type($source) `->` type(results) attr-dict
This op is implemented similarly to the upstream MemRef::ExtractStridedMetadataOp with the following differences.
-
It does not fold away static offset/stride information. Hence unlike the upstream Op the link between the memref and consumers of the metadata is not broken when later passes change this information. A common example in IREE of this is buffer binding optimizations.
-
Helper functions getConstifiedMixed{Offset|Strides|Sizes} are not implemented as the expectation is you should lower to the upstream op before using those functions if you need them.
Copy of MemRef::ExtractStridedMetadataOp description for reference below. Extracts a base buffer, offset and strides. This op allows additional layers of transformations and foldings to be added as lowering progresses from higher-level dialect to lower-level dialects such as the LLVM dialect.
The op requires a strided memref source operand. If the source operand is not a strided memref, then verification fails.
This operation is also useful for completeness to the existing memref.dim op.
While accessing strides, offsets and the base pointer independently is not
available, this is useful for composing with its natural complement op:
memref.reinterpret_cast.
Intended Use Cases:
The main use case is to expose the logic for manipulate memref metadata at a higher level than the LLVM dialect. This makes lowering more progressive and brings the following benefits: - not all users of MLIR want to lower to LLVM and the information to e.g. lower to library calls---like libxsmm---or to SPIR-V was not available. - foldings and canonicalizations can happen at a higher level in MLIR: before this op existed, lowering to LLVM would create large amounts of LLVMIR. Even when LLVM does a good job at folding the low-level IR from a performance perspective, it is unnecessarily opaque and inefficient to send unkempt IR to LLVM.
Traits: AlwaysSpeculatableImplTrait, InferTypeOpAdaptor, SameVariadicResultSize
Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface), OpAsmOpInterface, ViewLikeOpInterface
Effects: MemoryEffects::Effect{}
Operands:link
| Operand | Description |
|---|---|
source |
strided memref of any type values |
Results:link
| Result | Description |
|---|---|
base_buffer |
strided memref of any type values of rank 0 |
offset |
index |
sizes |
variadic of index |
strides |
variadic of index |
iree_codegen.fusion_barrier (Codegen::FusionBarrierOp)link
Prevents fusion through a tensor value
Syntax:
operation ::= `iree_codegen.fusion_barrier` attr-dict $source `:` type($result)
Traits: AlwaysSpeculatableImplTrait, SameOperandsAndResultType
Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands:link
| Operand | Description |
|---|---|
source |
ranked tensor of any type values |
Results:link
| Result | Description |
|---|---|
result |
ranked tensor of any type values |
iree_codegen.index_hint (Codegen::IndexHintOp)link
Compiler hint providing semantic information about an index
Syntax:
operation ::= `iree_codegen.index_hint` $input `(` $hint `)` attr-dict `:` type($input)
Pure pass-through operation that annotates an index value with semantic information about how it varies across parallel workers (e.g., GPU lanes).
The hint attribute describes the index behavior. Common hints include:
- #iree_gpu.lane_constant<N>: Index is uniform within groups of N lanes
- #iree_gpu.lane_increment<N>: Index increments by 1 within groups of N lanes
This operation is always safe to remove (replace with input). It exists purely to guide optimization passes.
Example:
%row = iree_codegen.index_hint %idx(#iree_gpu.lane_constant<16>) : index
%col = iree_codegen.index_hint %idx(#iree_gpu.lane_increment<16>) : index
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:link
| Attribute | MLIR Type | Description |
|---|---|---|
hint | ::mlir::Attribute | any attribute |
Operands:link
| Operand | Description |
|---|---|
input |
index |
Results:link
| Result | Description |
|---|---|
result |
index |
iree_codegen.inner_tiled (Codegen::InnerTiledOp)link
Generic operation on tiled operands based on an intrinsic.
Syntax:
operation ::= `iree_codegen.inner_tiled` `ins` `(` $inputs `)` `outs` `(` $outputs `)` attr-dict
`:` type($inputs) `into` type($outputs)
Each operand's shape is viewed as outer_dims x inner_tile:
- The outer dims are those described by the operand's indexing map; they
are iterated by an outer loop nest specified by iterator_types and
indexing_maps, with the same conventions as vector.contract.
- The remaining trailing dims form the operand's inner tile. At each
outer iteration point, the output inner tiles are updated from the
input inner tiles by the intrinsic named by kind.
Operands may be tensors, memrefs or vectors. Inner-tile dims are generally static, the only exception being scalable vectors dimensions which correspond to dynamic tensor dimensions.
Attributeslink
kind(InnerTileDescAttrInterface): names the intrinsic. It fixes the number of inputs/outputs and, together withsemantics, the canonical inner-tile vector type of each operand.semantics(InnerTiledSemanticsAttrInterface): controls how strictly operand inner tiles are checked againstkind. Differentsemanticsattributes from different dialects may adopt different conventions, but there are generally two main modes:opaque = true: only the element type and total element count of each inner tile must match the canonical vector type. Shape and rank are otherwise free. Scalable vectors and dynamic tensor dims are not allowed in this mode.- Default /
opaque = false: after dropping non-scalable unit dims, the operand inner-tile shape must match the (permuted) canonical shape one-to-one. Scalable vector dimensions are supported and must correspond to dynamic tensor dimensions. Every other tensor dimension must be static. indexing_maps: one projected-permutationAffineMapper operand (inputs first, then outputs), all sharing the same number of dims (equal toiterator_types.size()) and no symbols.iterator_types: one ofparallel/reductionfor each outer iterator dim.permutations(optional): If present, there is one entry per operand, each a permutation vectorpof length equal to that operand's inner rank, giving the relationship between the canonical inner-tile shapeC(fromkind/semantics) and the operand's actual inner-tile shapeS:S[i] == C[p[i]].
Exampleslink
Subgroup-level (undistributed) MMA on tensors using the AMD MFMA
intrinsic MFMA_F32_32x32x8_F16, whose canonical per-operand inner
tiles are LHS = 32x8 (M, K), RHS = 8x32 (K, N) and ACC = 32x32
(M, N). In practice the RHS is typically stored with its inner dims
swapped (the matmul is really an MMT at the tile level); this is
expressed by the [1, 0] entry in permutations on the RHS operand,
which makes its inner-tile shape canonical[1] x canonical[0] =
N x K = 32 x 8.
%0 = iree_codegen.inner_tiled ins(%lhs, %rhs) outs(%acc) {
indexing_maps = [
affine_map<(m, n, k) -> (m, k)>,
affine_map<(m, n, k) -> (n, k)>,
affine_map<(m, n, k) -> (m, n)>
],
iterator_types = [
#linalg.iterator_type<parallel>,
#linalg.iterator_type<parallel>,
#linalg.iterator_type<reduction>
],
kind = #iree_gpu.mma_layout<MFMA_F32_32x32x8_F16>,
semantics = #iree_gpu.mma_semantics<distributed = false, opaque = false>,
permutations = [array<i64: 0, 1>, array<i64: 1, 0>, array<i64: 0, 1>]
} : tensor<?x?x32x8xf16>, tensor<?x?x32x8xf16>
into tensor<?x?x32x32xf32>
After distribution across the 64 lanes of a wave, the per-operand
inner tiles become the canonical per-lane shares of the intrinsic:
256 / 64 = 4 f16 elements per lane for LHS and RHS, 1024 / 64 = 16
f32 elements per lane for ACC.
%0 = iree_codegen.inner_tiled ins(%lhs, %rhs) outs(%acc) {
indexing_maps = [
affine_map<(m, n, k) -> (m, k)>,
affine_map<(m, n, k) -> (n, k)>,
affine_map<(m, n, k) -> (m, n)>
],
iterator_types = [
#linalg.iterator_type<parallel>,
#linalg.iterator_type<parallel>,
#linalg.iterator_type<reduction>
],
kind = #iree_gpu.mma_layout<MFMA_F32_32x32x8_F16>,
semantics = #iree_gpu.mma_semantics<distributed = true, opaque = false>
} : tensor<?x?x4xf16>, tensor<?x?x4xf16>
into tensor<?x?x16xf32>
Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments, InferTypeOpAdaptor
Interfaces: ConditionallySpeculatable, DestinationStyleOpInterface, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface), TilingInterface, ValueBoundsOpInterface, VectorUnrollOpInterface
Effects: MemoryEffects::Effect{}
Attributes:link
| Attribute | MLIR Type | Description |
|---|---|---|
indexing_maps | ::mlir::ArrayAttr | indexing affine maps |
iterator_types | ::mlir::ArrayAttr | Iterator type should be an enum. |
kind | IREE::Codegen::InnerTileDescAttrInterface | buffer-like constant attribute values |
semantics | IREE::Codegen::InnerTiledSemanticsAttrInterface | Attribute implementing InnerTiledSemanticsAttrInterface |
permutations | ::mlir::ArrayAttr | permutations |
Operands:link
| Operand | Description |
|---|---|
inputs |
variadic of ranked tensor or vector of any type values |
outputs |
variadic of ranked tensor or vector of any type values |
Results:link
| Result | Description |
|---|---|
results |
variadic of ranked tensor or vector of any type values |
iree_codegen.load_from_buffer (Codegen::LoadFromBufferOp)link
Loads a tensor from a memref.
Syntax:
operation ::= `iree_codegen.load_from_buffer` $buffer attr-dict `:` type($buffer) `->` type($tensor)
Loads a tensor from a memref with a compatible shape and the same element type.
Interfaces: MemoryEffectOpInterface, ReifyRankedShapedTypeOpInterface
Operands:link
| Operand | Description |
|---|---|
buffer |
strided memref of any type values |
Results:link
| Result | Description |
|---|---|
tensor |
ranked tensor of any type values |
iree_codegen.null_pointer (Codegen::NullPointerOp)link
Returns a null_pointer value.
Syntax:
operation ::= `iree_codegen.null_pointer` attr-dict
This is meant to be used only as arguments to microkernels.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results:link
| Result | Description |
|---|---|
result |
Pseudo null-pointer type. Lowers to a null pointer. |
iree_codegen.query_tile_sizes (Codegen::QueryTileSizesOp)link
Yields tile sizes for the specified tensor type.
Syntax:
operation ::= `iree_codegen.query_tile_sizes` attr-dict $tensor_type `->` type($results)
For targets where tile sizes can't be resolved at compile time, this operation allows querying the sizes at runtime. Today this only applies to VMVX.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:link
| Attribute | MLIR Type | Description |
|---|---|---|
tensor_type | ::mlir::TypeAttr | Tensor type attribute |
Results:link
| Result | Description |
|---|---|
results |
variadic of index |
iree_codegen.smt.assert (Codegen::AssertOp)link
Named SMT assertion for constraint verification.
Syntax:
operation ::= `iree_codegen.smt.assert` $condition `,` $msg (`,` $printArgs^)?
`:` type($condition) (`,` type($printArgs)^)? attr-dict
Asserts that a boolean SMT expression holds, with a human-readable
format string describing what is being checked. The format string may
contain {} placeholders that correspond positionally to the args
operands. An evaluator or verifier can substitute concrete values into
the placeholders for diagnostics.
Args are restricted to !smt.int because all constraint knobs and
dimension values are integers.
Used inside iree_codegen.smt.constraints regions.
Examples:
// Static message (no args):
iree_codegen.smt.assert %cond, "dim_0 == 128" : !smt.bool
// Format string with args:
iree_codegen.smt.assert %cond, "wg_x ({}) < wg_y ({})", %x, %y : !smt.bool, !smt.int, !smt.int
Traits: HasParent<ConstraintsOp>
Attributes:link
| Attribute | MLIR Type | Description |
|---|---|---|
msg | ::mlir::StringAttr | string attribute |
Operands:link
| Operand | Description |
|---|---|
condition |
|
printArgs |
variadic of |
iree_codegen.smt.constraints (Codegen::ConstraintsOp)link
SMT constraints for a codegen configuration of root ops.
Syntax:
operation ::= `iree_codegen.smt.constraints` `target` `=` $target `,` `pipeline` `=` custom<PipelineAttr>($pipeline) `,`
custom<KnobsDictionary>($knobs)
`dims` `(` $problem_dims `)` attr-dict-with-keyword
$body
Declares SMT constraints over problem dimensions and configuration knobs for a codegen pipeline, targeting a set of root ops.
target: A #iree_codegen.root_op<set = N> attribute identifying
which root op set these constraints apply to. All ops marked with
the same #iree_codegen.root_op<set = N> attribute share the same
lowering config. This decouples constraints from SSA values, so ops
with zero or multiple results are supported.
pipeline: The codegen pipeline to use. This is a fixed choice,
not decided by the solver.
knobs: DictionaryAttr mirroring GPULoweringConfigAttr. Leaves
that are #iree_codegen.smt.int_knob<"name"> attrs name tunable SMT
constants (materialized by iree_codegen.smt.knob ops in the body).
Integer/attr leaves are fixed.
problem_dims: index-typed problem dimensions; corresponding block
arguments are !smt.int.
Does not have any execution semantics and is meant to be used by the tuner or verification passes, and erased before lowering.
Example:
// The matmul is marked: {root_op = #iree_codegen.root_op<set = 0>}
iree_codegen.smt.constraints target = <set = 0>, pipeline = #iree_gpu.pipeline<VectorDistribute>,
knobs = {workgroup = [#iree_codegen.smt.int_knob<"wg_m">, #iree_codegen.smt.int_knob<"wg_n">]}
dims(%M, %N, %K) {
^bb0(%m: !smt.int, %n: !smt.int, %k: !smt.int):
%wg_m = iree_codegen.smt.knob "wg_m" : !smt.int
%wg_n = iree_codegen.smt.knob "wg_n" : !smt.int
}
Traits: IsolatedFromAbove, NoTerminator, SingleBlock
Attributes:link
| Attribute | MLIR Type | Description |
|---|---|---|
target | ::mlir::iree_compiler::IREE::Codegen::RootOpAttr | Marks an operation as a tuner root op in a given set |
pipeline | ::mlir::Attribute | identifier for pass pipeline use to lower dispatch region or PipelineAttrInterface instance |
knobs | ::mlir::DictionaryAttr | dictionary of named attribute values |
Operands:link
| Operand | Description |
|---|---|
problem_dims |
variadic of index |
iree_codegen.smt.knob (Codegen::KnobOp)link
Declare an SMT constant for a tunable configuration knob.
Syntax:
operation ::= `iree_codegen.smt.knob` $name attr-dict `:` type($result)
Materializes a named SMT constant (!smt.int) for use in constraint
expressions. The name must match an #iree_codegen.smt.int_knob<"name">
or #iree_codegen.smt.one_of_knob<"name", [...]> leaf in the
enclosing iree_codegen.smt.constraints op's knobs dictionary.
In SMT terminology this is a constant (0-ary function), not a variable. The tuner assigns concrete integer values to these constants.
Example:
%wg_m = iree_codegen.smt.knob "wg_m" : !smt.int
Traits: AlwaysSpeculatableImplTrait, HasParent<ConstraintsOp>
Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:link
| Attribute | MLIR Type | Description |
|---|---|---|
name | ::mlir::StringAttr | string attribute |
Results:link
| Result | Description |
|---|---|
result |
iree_codegen.smt.lookup (Codegen::LookupOp)link
Integer table lookup for SMT constraints.
Syntax:
operation ::= `iree_codegen.smt.lookup` $index $keys `->` $values attr-dict `:` type($result)
Maps an SMT integer index to an integer value via a sparse key-value table. Used to derive values from enumerated knobs (e.g., MMA shape dimensions from an mma_idx knob).
TODO: During constraint verification, the lookup will be evaluated
directly. During SMT-LIB export, it will be lowered to a chain of
smt.ite ops. Neither pass is implemented yet.
Example:
%mma_m = iree_codegen.smt.lookup %idx [3, 7, 12] -> [16, 32, 64] : !smt.int
Traits: AlwaysSpeculatableImplTrait, HasParent<ConstraintsOp>
Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:link
| Attribute | MLIR Type | Description |
|---|---|---|
keys | ::mlir::DenseI64ArrayAttr | i64 dense array attribute |
values | ::mlir::DenseI64ArrayAttr | i64 dense array attribute |
Operands:link
| Operand | Description |
|---|---|
index |
Results:link
| Result | Description |
|---|---|
result |
iree_codegen.store_to_buffer (Codegen::StoreToBufferOp)link
Stores a tensor into a memref.
Syntax:
operation ::= `iree_codegen.store_to_buffer` $tensor `,` $buffer
attr-dict `:` type($tensor) `into` type($buffer)
Stores a tensor into a memref with a compatible shape and the same element type.
Interfaces: MemoryEffectOpInterface
Operands:link
| Operand | Description |
|---|---|
tensor |
ranked tensor of any type values |
buffer |
strided memref of any type values |
iree_codegen.swizzle_hint (Codegen::SwizzleHintOp)link
Hint to swizzle accesses according to an access pattern.
Syntax:
operation ::= `iree_codegen.swizzle_hint` $operand `[` $swizzle attr-dict `]` `:` type($result)
Optimization hint to swizzle all accesses to the memref or tensor that this takes a view of. This only affects reads/writes that immediately consume this operation and is best effort. If the desired swizzling is not apparently possible, this op will no-op. As a result, it should not be relied on for correctness.
Any subviews on this operation will cause the swizzle application to fail. The expectation is for all view like operations to fold into the accessing ops (loads/stores) before this op takes effect.
Note that this only rewrites direct users. If there are any aliased loads or stores of the data from/to the |src| memref of a hintOp, those accesses will not be swizzled. This allows reusing an allocation with different swizzled access patterns as long as there is no data dependency between memory with different layouts. For example:
%0 = alloc()
%1 = iree_codegen.swizzle_hint %0, #layout_0
%2 = iree_codegen.swizzle_hint %0, #layout_1
{
vector.store %1
vector.load %1
^
|
unrelated
|
v
vector.store %2
vector.load %2
}
If there is a data dependency between the accesses of %1 and %2, for example a value stored to %1 is loaded from %2, this is undefined behavior. Aliasing is otherwise perfectly legal.
Traits: AlwaysSpeculatableImplTrait, SameOperandsAndResultType
Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:link
| Attribute | MLIR Type | Description |
|---|---|---|
swizzle | IREE::Codegen::SwizzleAttrInterface | swizzling descriptor attributes |
Operands:link
| Operand | Description |
|---|---|
operand |
ranked tensor or memref of any type |
Results:link
| Result | Description |
|---|---|
result |
ranked tensor or memref of any type |
iree_codegen.workgroup_count_hint (Codegen::WorkgroupCountHintOp)link
Hints at the workgroup count to set
Captures a set of values to use as the workgroup count. The backward slice starting from this op's operands is cloned into the workgroup count region of all transitive callers.
The sizes are specified in logical order (innermost to outermost), matching the workgroup count region's (x, y, z) convention. If fewer than 3 sizes are provided, the remaining dimensions default to 1.
If multiple hints inform the same entry point, the elementwise maximum across all hints is used as the count along each dimension. For example:
hal.executable.export @entry_point
module {
func.func @entry_point() {
iree_codegen.workgroup_count_hint sizes(%a, %b, %c)
iree_codegen.workgroup_count_hint sizes(%x, %y, %z)
}
}
resolves to:
hal.executable.export @entry_point {
%wx = arith.maxsi %a, %x
%wy = arith.maxsi %b, %y
%wz = arith.maxsi %c, %z
hal.return %wx, %wy, %wz
}
Common Usage: Linearized Workgroup Countslink
The most common use case for this operation involves computing a linearized (1-D) workgroup count and specifying it as a single size.
%num_workgroups = arith.ceildivui %total_iterations, %tile_size : index
iree_codegen.workgroup_count_hint sizes(%num_workgroups)
This results in (%num_workgroups, 1, 1) as the final workgroup count.
Attributes:link
| Attribute | MLIR Type | Description |
|---|---|---|
static_sizes | ::mlir::DenseI64ArrayAttr | i64 dense array attribute |
Operands:link
| Operand | Description |
|---|---|
sizes |
variadic of index |
iree_codegen.yield (Codegen::YieldOp)link
IREECodegen yield op.
Syntax:
operation ::= `iree_codegen.yield` attr-dict ($operands^ `:` type($operands))?
iree_codegen.yield is a special terminator operation for blocks inside
regions in iree_codegen ops.
Traits: AlwaysSpeculatableImplTrait, ReturnLike, Terminator
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface), RegionBranchTerminatorOpInterface
Effects: MemoryEffects::Effect{}
Operands:link
| Operand | Description |
|---|---|
operands |
variadic of any type |
Attributeslink
DenormalFpMathAttrlink
Denormal mode for fp math
Syntax:
#iree_codegen.denormal_fp_math<
::mlir::iree_compiler::IREE::Codegen::DenormalFpMath # value
>
Enum cases:
* none (None)
* preserve-sign (PreserveSign)
* positive-zero (PositiveZero)
Parameters:link
| Parameter | C++ type | Description |
|---|---|---|
| value | ::mlir::iree_compiler::IREE::Codegen::DenormalFpMath |
an enum of type DenormalFpMath |
DispatchLoweringPassPipelineAttrlink
Identifier for pass pipeline use to lower dispatch region
Syntax:
#iree_codegen.<
::mlir::iree_compiler::IREE::Codegen::DispatchLoweringPassPipeline # value
>
Parameters:link
| Parameter | C++ type | Description |
|---|---|---|
| value | ::mlir::iree_compiler::IREE::Codegen::DispatchLoweringPassPipeline |
an enum of type DispatchLoweringPassPipeline |
CompilationInfoAttrlink
Drive lowering of an operation from input dialect.
Syntax:
#iree_codegen.compilation_info<
LoweringConfigAttrInterface, # loweringConfig
TranslationInfoAttr # translationInfo
>
Specifies the information that allows controlling the compilation
of operations like linalg.matmul/linalg.*conv within
IREE. This information is used to override the defaults used by
the IREE compiler. If set on the input to the compiler, there is no
guarantee that the config survives until codegen. Named operations like
linalg.matmul/linalg.*conv* are more likely to retain their lowering
configurations.
TODO: It is expected that the TranslationInfoAttr and the
LoweringConfigAttr are specified. Currently there is no
verification that the values of the LoweringConfigAttr fully
specifies the behaviour of the compilation path chosen with
TranslationInfoAttr. This could be added in the future.
Parameters:link
| Parameter | C++ type | Description |
|---|---|---|
| loweringConfig | LoweringConfigAttrInterface |
|
| translationInfo | TranslationInfoAttr |
ExportConfigAttrlink
User defined workgroup size specification.
Syntax:
#iree_codegen.export_config<
::llvm::ArrayRef<int64_t> # workgroup_size
>
Allows setting workgroup size for pre-formed dispatches.
Parameters:link
| Parameter | C++ type | Description |
|---|---|---|
| workgroup_size | ::llvm::ArrayRef<int64_t> |
Workgroup Size to use |
IntKnobAttrlink
Integer-valued tunable knob placeholder.
Syntax:
#iree_codegen.smt.int_knob<
StringAttr # name
>
Represents a named placeholder for an integer tunable parameter in a
constraints knobs dictionary. During constraint generation, these appear
in tiling arrays (workgroup, reduction, thread), workgroup_size, and
subgroup_size positions. The name matches the corresponding
iree_codegen.smt.knob op name.
Parameters:link
| Parameter | C++ type | Description |
|---|---|---|
| name | StringAttr |
LoweringConfigAttrlink
Drive lowering of an operation within dispatch region.
Syntax:
#iree_codegen.lowering_config<
LoweringConfigTilingLevelsAttr # tilingLevels
>
Default implementation of a lowering configuration attribute. It includes only tiling and optionally vectorization information. The interpretation of the tiles sizes are backend dependent.
TODO: Currently there is no verification that the configuration specifies everything needed for a pass-pipeline. The values to set for these parameters is dependent on the pass-pipeline implementation. In future, each pass pipeline could verify that the lowering configuration has all the necessary attributes for the pipeline.
Parameters:link
| Parameter | C++ type | Description |
|---|---|---|
| tilingLevels | LoweringConfigTilingLevelsAttr |
The lowering config at different levels |
LoweringConfigTilingLevelAttrlink
Parameters:link
| Parameter | C++ type | Description |
|---|---|---|
| sizes | ::llvm::ArrayRef<int64_t> |
The tile sizes to use for this level of tiling |
| interchange | ::llvm::ArrayRef<int64_t> |
The tile interchange to use for this level of tiling |
| scalableFlags | ::llvm::ArrayRef<bool> |
The scalable tile flags for this level of tiling |
LoweringConfigTilingLevelsAttrlink
Syntax:
#iree_codegen.lowering_config_levels<
::llvm::ArrayRef<LoweringConfigTilingLevelAttr> # value
>
Parameters:link
| Parameter | C++ type | Description |
|---|---|---|
| value | ::llvm::ArrayRef<LoweringConfigTilingLevelAttr> |
OneOfKnobAttrlink
Select-from-enumerated-set knob placeholder.
Syntax:
#iree_codegen.smt.one_of_knob<
StringAttr, # name
ArrayAttr # options
>
Represents a named placeholder for a discrete choice from a fixed set of attributes. The SMT variable is an integer index into the options array. During extraction, the actual attribute is matched against the options to find its index. During materialization, the index selects directly from the array -- no reverse lookup needed.
Parameters:link
| Parameter | C++ type | Description |
|---|---|---|
| name | StringAttr |
|
| options | ArrayAttr |
PassPipelineAttrlink
An attribute carrying a textual pass pipeline string.
Syntax:
#iree_codegen.pass_pipeline<
::llvm::StringRef # pipeline
>
Specifies a pass pipeline using MLIR's textual pass pipeline syntax.
The pipeline string is parsed and populated into an OpPassManager
when buildPipeline is called.
Parameters:link
| Parameter | C++ type | Description |
|---|---|---|
| pipeline | ::llvm::StringRef |
The textual pass pipeline specification |
RootOpAttrlink
Marks an operation as a tuner root op in a given set
Syntax:
#iree_codegen.root_op<
int64_t # set
>
Attribute identifying a root operation. The set parameter groups root ops
into numbered sets: all root ops in the given set share the same
lowering_config. Codegen does not rely on the root_op attribute; it is
only used for constraint generation when tuning.
Parameters:link
| Parameter | C++ type | Description |
|---|---|---|
| set | int64_t |
RotateRowsAttrlink
An attribute that describes a swizzling pattern for rotating rows.
Syntax:
#iree_codegen.rotate_rows<
int64_t, # row_width
int64_t # access_width
>
This attribute rotates accesses of |access_width| within rows of size
|row_width|. For any given access into logical memref of shape
memref<...xNx|access_width|x!eltype> where N = row_width / access_width
at position (i, j, 0) is rotated to (i, (i + j) % N, 0). For example,
row_width = 16, access_width = 4
0000 1111 2222 3333 /// 0 1 2 3
4444 5555 6666 7777 /// 0 1 2 3
8888 9999 AAAA BBBB /// 0 1 2 3
CCCC DDDD EEEE FFFF /// 0 1 2 3
is swizzled to
0000 1111 2222 3333 /// 0 1 2 3
7777 4444 5555 6666 /// 3 0 1 2
AAAA BBBB 8888 9999 /// 2 3 0 1
DDDD EEEE FFFF CCCC /// 1 2 3 0
The pattern repeats for subsequent rows.
Parameters:link
| Parameter | C++ type | Description |
|---|---|---|
| row_width | int64_t |
|
| access_width | int64_t |
SimpleTargetAttrlink
Default implementation of TargetInfoAttrInterface
Syntax:
#iree_codegen.simple_target<
::llvm::ArrayRef<int64_t> # max_workgroup_count
>
Implements the basic information needed to satisfy the
TargetInfoAttrInterface
Parameters:link
| Parameter | C++ type | Description |
|---|---|---|
| max_workgroup_count | ::llvm::ArrayRef<int64_t> |
Maximum allowed workgroup count in [x, y, z] |
SymbolicUKernelProviderAttrlink
An attribute that provides ukernel implementations by looking up the nearest symbol table from the annotation site.
Syntax: #iree_codegen.symbolic_ukernel_provider
TranslationInfoAttrlink
Drive dispatch entry point lowering.
Syntax:
#iree_codegen.translation_info<
Attribute, # passPipeline
SymbolRefAttr, # codegenSpec
::llvm::ArrayRef<int64_t>, # workgroupSize
int64_t, # subgroupSize
DictionaryAttr # configuration
>
Specifies the information that is used to drive the translation of
an entry point function using Linalg based structured-op
lowering. During executable translation this is attached to the
hal.executable.export operation.
If this operation is already set on the root operation (as part of
iree_codegen.compilation_info) that drives the compilation of a
dispatch region (like linalg.matmul/linalg.*conv*), this
attribute gets propagated to the entry point function.
The passPipeline field can be either:
- A DispatchLoweringPassPipelineAttr (enum keyword like CPUDefault).
- Any attribute implementing PipelineAttrInterface (e.g.,
#iree_codegen.pass_pipeline<"...">).
Parameters:link
| Parameter | C++ type | Description |
|---|---|---|
| passPipeline | Attribute |
Pass pipeline specification. |
| codegenSpec | SymbolRefAttr |
The symbol pointing to the transform dialect codegen spec to be used |
| workgroupSize | ::llvm::ArrayRef<int64_t> |
The workgroup size to use |
| subgroupSize | int64_t |
The subgroup size to use |
| configuration | DictionaryAttr |
Pipeline specific configuration |
UKernelDescriptorAttrlink
An attribute that specifies the ukernel implementation based on the |ukernel_name| and the integration point based on the |kind|. Example integration points could for example be at the tensor, memref or bitcode level.
Syntax:
#iree_codegen.ukernel_descriptor<
::llvm::StringRef, # ukernel_name
::mlir::iree_compiler::IREE::Codegen::UKernelArgumentKind # kind
>
Parameters:link
| Parameter | C++ type | Description |
|---|---|---|
| ukernel_name | ::llvm::StringRef |
|
| kind | ::mlir::iree_compiler::IREE::Codegen::UKernelArgumentKind |
an enum of type UKernelArgumentKind |
XORShuffleAttrlink
An attribute that describes an XOR-based swizzling pattern.
Syntax:
#iree_codegen.xor_shuffle<
int64_t, # row_width
int64_t, # access_width
int64_t, # row_stride
int64_t # per_phase
>
Shuffles accesses of |access_width| within rows of size
|row_width|. For any given access into logical memref of shape
memref<...xNx|access_width|x!eltype> where N = row_width / access_width
at position (i, j, 0) is shuffled to (i, ((i/per_phase) %N) XOR j , 0). For example,
row_width = 16, access_width = 4, per_phase = 1
0000 1111 2222 3333 /// 0 1 2 3
4444 5555 6666 7777 /// 0 1 2 3
8888 9999 AAAA BBBB /// 0 1 2 3
CCCC DDDD EEEE FFFF /// 0 1 2 3
is swizzled to
0000 1111 2222 3333 /// 0 1 2 3
5555 4444 7777 6666 /// 1 0 3 2
AAAA BBBB 8888 9999 /// 2 3 0 1
FFFF EEEE DDDD CCCC /// 3 2 1 0
row_width = 16, access_width = 4, per_phase = 2
0000 1111 2222 3333 /// 0 1 2 3
4444 5555 6666 7777 /// 0 1 2 3
8888 9999 AAAA BBBB /// 0 1 2 3
CCCC DDDD EEEE FFFF /// 0 1 2 3
is swizzled to
0000 1111 2222 3333 /// 0 1 2 3
4444 5555 6666 7777 /// 0 1 2 3
9999 8888 BBBB AAAA /// 1 0 3 2
DDDD CCCC FFFF EEEE /// 1 0 3 2
The pattern repeats for subsequent rows.
Parameters:link
| Parameter | C++ type | Description |
|---|---|---|
| row_width | int64_t |
|
| access_width | int64_t |
|
| row_stride | int64_t |
row stride. Default to row_width |
| per_phase | int64_t |
Default to 1 |
LocalMappingAttrlink
Syntax:
#iree_codegen.local_mapping<
int64_t # dim
>
Attribute for mapping scf.forall loops to sequential local execution.
Example:
scf.forall (%i, %j) in (4, 8) {
// body ... %i, %j
} {mapping = [#iree_codegen.local_mapping<1>,
#iree_codegen.local_mapping<0>]}
This forall approximately corresponds to the following loop nest:
scf.for %i = 0 to 4 {
scf.for %j = 0 to 8 {
// body ... %i, %j
}
}
Parameters:link
| Parameter | C++ type | Description |
|---|---|---|
| dim | int64_t |
WorkgroupMappingAttrlink
Syntax:
#iree_codegen.workgroup_mapping<
::mlir::iree_compiler::IREE::Codegen::WorkgroupId, # id
int64_t # delinearizedDim
>
Attribute that eventually will be used to map distributed loop iterations
to hal.workgroup.ids.
The x,y and z values for id map to hal.workgroup.id[0],
hal.workgroup.id[1] and hal.workgroup.id[2] respectively.
In addition it is possible to specify if the z dimension is to be
delinearized on mapping. For example if the list of mapping attributes is
[workgroup_mapping<z:1>, workgroup_mapping<z:0>], then the z dimension
is delinearized to map to workgroup_mapping<z:1> and
workgroup_mapping<z:0>. In other words if the number of logical parallel
workers along the z:0 dimension is W, then
workgroup_mapping<z:0> = hal.workgroup.id[1] mod W,
worgrkoup_mapping<z:1> = hal.workgroup.id[1] div W
Note: It is expected that this attribute is always used in a list of mapping
attributes (with a single element being a list of size 1). It is illegal for
a list to have workgroup_mapping<z:a> without workgroup_mapping<z:b>
if a > b. In the same way it is illegal to for the list to
- have workgroup_mapping<y> but not workgroup_mapping<x>
- have workgroup_mapping<z:*> but not have workgroup_mapping<x>
and workgroup_mapping<y>
Parameters:link
| Parameter | C++ type | Description |
|---|---|---|
| id | ::mlir::iree_compiler::IREE::Codegen::WorkgroupId |
an enum of type WorkgroupId |
| delinearizedDim | int64_t |
WorkgroupScopeAttrlink
Attribute representing parallel execution across workgroups.
Parallel scope that maps to workgroup execution.
If |linearize| is set to true, workgroup ids are linearized into a single id which is then typically delinearized based on the problem space.
Parameters:link
| Parameter | C++ type | Description |
|---|---|---|
| linearize | bool |
Typeslink
NullPointerTypelink
Pseudo null-pointer type. Lowers to a null pointer.
Syntax: !iree_codegen.null_pointer
This is meant to be used only as arguments to microkernels.
Enumslink
BinaryFnlink
Allowed 32-bit signless integer cases: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Cases:link
| Symbol | Value | String |
|---|---|---|
| add | 0 |
add |
| sub | 1 |
sub |
| mul | 2 |
mul |
| div | 3 |
div |
| div_unsigned | 4 |
div_unsigned |
| max_signed | 5 |
max_signed |
| min_signed | 6 |
min_signed |
| max_unsigned | 7 |
max_unsigned |
| min_unsigned | 8 |
min_unsigned |
| powf | 9 |
powf |
DenormalFpMathlink
Denormal mode for fp math
Cases:link
| Symbol | Value | String |
|---|---|---|
| None | 0 |
none |
| PreserveSign | 1 |
preserve-sign |
| PositiveZero | 2 |
positive-zero |
DispatchLoweringPassPipelinelink
Identifier for pass pipeline use to lower dispatch region
Cases:link
| Symbol | Value | String |
|---|---|---|
| VMVXDefault | 300 |
VMVXDefault |
| TransformDialectCodegen | 1000 |
TransformDialectCodegen |
| Custom | 1001 |
Custom |
| None | 65535 |
None |
ElementwiseArityGrouplink
Allowed 32-bit signless integer cases: 1, 2, 3
Cases:link
| Symbol | Value | String |
|---|---|---|
| Unary | 1 |
Unary |
| Binary | 2 |
Binary |
| Ternary | 3 |
Ternary |
ElementwiseCaseLimitslink
Allowed 32-bit signless integer cases:
Cases:link
| Symbol | Value | String |
|---|---|---|
| LastUnary | 13 |
LastUnary |
| LastBinary | 23 |
LastBinary |
| LastTernary | 24 |
LastTernary |
ElementwiseKindlink
Allowed 32-bit signless integer cases: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23
Cases:link
| Symbol | Value | String |
|---|---|---|
| exp | 0 |
exp |
| log | 1 |
log |
| abs | 2 |
abs |
| ceil | 3 |
ceil |
| floor | 4 |
floor |
| negf | 5 |
negf |
| reciprocal | 6 |
reciprocal |
| round | 7 |
round |
| sqrt | 8 |
sqrt |
| rsqrt | 9 |
rsqrt |
| square | 10 |
square |
| tanh | 11 |
tanh |
| erf | 12 |
erf |
| add | 13 |
add |
| sub | 14 |
sub |
| mul | 15 |
mul |
| div | 16 |
div |
| div_unsigned | 17 |
div_unsigned |
| max_signed | 18 |
max_signed |
| min_signed | 19 |
min_signed |
| max_unsigned | 20 |
max_unsigned |
| min_unsigned | 21 |
min_unsigned |
| powf | 22 |
powf |
| select | 23 |
select |
IteratorTypelink
Iterator type
Cases:link
| Symbol | Value | String |
|---|---|---|
| parallel | 0 |
parallel |
| reduction | 1 |
reduction |
TernaryFnlink
Allowed 32-bit signless integer cases: 0
Cases:link
| Symbol | Value | String |
|---|---|---|
| select | 0 |
select |
TypeFnlink
Allowed 32-bit signless integer cases: 0, 1
Cases:link
| Symbol | Value | String |
|---|---|---|
| cast_signed | 0 |
cast_signed |
| cast_unsigned | 1 |
cast_unsigned |
UKernelArgumentKindlink
Attribute describing the ukernel integration point
Cases:link
| Symbol | Value | String |
|---|---|---|
| Tensor | 0 |
tensor |
| Memref | 1 |
memref |
| Bitcode | 2 |
bitcode |
UnaryFnlink
Allowed 32-bit signless integer cases: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
Cases:link
| Symbol | Value | String |
|---|---|---|
| exp | 0 |
exp |
| log | 1 |
log |
| abs | 2 |
abs |
| ceil | 3 |
ceil |
| floor | 4 |
floor |
| negf | 5 |
negf |
| reciprocal | 6 |
reciprocal |
| round | 7 |
round |
| sqrt | 8 |
sqrt |
| rsqrt | 9 |
rsqrt |
| square | 10 |
square |
| tanh | 11 |
tanh |
| erf | 12 |
erf |
WinogradConv2DFmrlink
Allowed 32-bit signless integer cases: 0, 1, 2
Cases:link
| Symbol | Value | String |
|---|---|---|
| F_2_3 | 0 |
F_2_3 |
| F_4_3 | 1 |
F_4_3 |
| F_2_5 | 2 |
F_2_5 |
WorkgroupIdlink
Attribute that map to hal.workgrpoup.ids
Cases:link
| Symbol | Value | String |
|---|---|---|
| IdX | 0 |
x |
| IdY | 1 |
y |
| IdZ | 2 |
z |