'iree_linalg_ext' Dialectlink

IREE Linalg Extensions.

A dialect designed for experimenting with non-structured operations that cannot be represented efficiently/directly by the Linalg dialect.

'iree_linalg_ext' Dialect

Operationslink

`iree_linalg_ext.custom_op` (LinalgExt::CustomOp)link

Custom operation for compiling with IREE.

Syntax:

operation ::= `iree_linalg_ext.custom_op` `{` `indexing_maps` `=` $indexing_maps `,`
              `iterator_types` `=` $iterator_types `}`
              attr-dict-with-keyword
              (`ins` `(` $inputs^ `:` type($inputs) `)`)?
              (`outs` `(` $outputs^ `:` type($outputs) `)`)?
              $region (`->` type($results)^)?

This operation is meant to allow computation sequences that are fused at tile level prescriptively. This is to account for cases where such fusion cannot/is not yet discovered appropriately.

The operation implements all the interfaces needed to be able to 1. Compile e2e using IREE 2. Still be able to fuse with other operations that the compiler can figure out automatically.

Similar to how LinalgOps represent a perfectly nested loop computation with - indexing_maps representing how the ins/outs are accessed - region representing the scalar computation performed - iterator_types representing the dependence along each iteration space dimension this operation represent a tiled computation with perfectly nested inter-tile loop nest. - indexing_maps represent what slices slices of the ins/outs are needed for each iteration of the tiled computation. - region represents the tiled computation performed using these slices - iterator_types represents the dependence between tiles along each iteration space.

Some modifications required to handle the tile-level semantics are - Some dimensions of operands might not be accessed by dimensions of the inter-tile iteration space. This means that along these dimensions the slice size matches the dimension size. This access pattern of operands is captured in the respective indexing map using a symbol to represent that the entire dimension needs to be sliced. - The basic block arguments of the region represent the slice of the operand. These are either scalar types (if the corresponding operand is a scalar), or a tensor type with dynamic shapes (if the corresponding operand is a tensor type).

For example, one could represent a prescriptively fused matmul computation as follows

%0:2 = iree_linalg_ext.custom_op {
    indexing_maps = [affine_map<(d0, d1)[s0, s1] -> (d0, s0)>,
                     affine_map<(d0, d1)[s0, s1] -> (s0, s1)>,
                     affine_map<(d0, d1)[s0, s1] -> (s1, d1)>,
                     affine_map<(d0, d1)[s0, s1] -> (d0, s1)>,
                     affine_map<(d0, d1)[s0, s1] -> (d0, d1)],
    iterator_types = ["parallel", "parallel"]}
    ins(%lhs1, %rhs1, %rhs2
        : tensor<1000000x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>)
    outs(%outs1, %outs2 : tensor<1000000x?xf32>, tensor<1000000x?xf32>) {
  ^bb0(%t0 : tensor<?x?xf32>, %t1 : tensor<?x?xf32>, %t2 : tensor<?x?xf32>,
       %t3 : tensor<?x?xf32>, %t4 : tensor<?x?xf32>) :
    %0 = linalg.matmul ins(%t0, %t1 : tensor<?x?xf32>, tensor<?x?xf32>)
        outs(%t3 : tensor<?x?xf32>) -> tensor<?x?xf32>
    %1 = linalg.matmul ins(%0, %t2 : tensor<?x?xf32>, tensor<?x?xf32>)
        outs(%t4 : tensor<?x?xf32>) -> tensor<?x?xf32>
    iree_linalg_ext.yield %0, %1 : tensor<?x?xf32>, tensor<?x?xf32>
} -> tensor<1000000x?xf32>, tensor<x?xf32>

Traits: AttrSizedOperandSegments, SingleBlockImplicitTerminator<::mlir::iree_compiler::IREE::LinalgExt::YieldOp>, SingleBlock

Interfaces: AggregatedOpInterface, DestinationStyleOpInterface, LinalgExtInterface, LinalgFusionOpInterface, MemoryEffectOpInterface, ReifyRankedShapedTypeOpInterface, TilingInterface

Attributes:link

Attribute	MLIR Type	Description
`indexing_maps`	::mlir::ArrayAttr	AffineMap array attribute
`iterator_types`	::mlir::ArrayAttr	LinalgExt iterator type

Operands:link

Operand	Description
`inputs`	variadic of ranked tensor of signless integer or index or floating-point values or signless integer or index or floating-point
`outputs`	variadic of ranked tensor of any type values

Results:link

Result	Description
`results`	variadic of ranked tensor of any type values

Data tiling opslink

Operations for working with data layouts, padding, encodings, and other properties useful for tiling computations across iteration space dimensions.

`iree_linalg_ext.pack` (LinalgExt::PackOp)link

LinalgExt pack operation for both tensors and buffers.

Syntax:

operation ::= `iree_linalg_ext.pack` attr-dict
              $inputs
              (`padding_value` `(` $padding_value^ `:` type($padding_value) `)`)?
              (`outer_dims_perm` `=` $outer_dims_perm^)?
              `inner_dims_pos` `=` $inner_dims_pos
              `inner_tiles` `=`
              custom<DynamicIndexList>($inner_tiles, $static_inner_tiles)
              `into` $outputs `:` `(` type($inputs) type($outputs) `)`
              (`->` type($results)^)?

The pack operation converts an input into a tiled and packed layout. The dimensions to be tiled are obtained from inner_dims_pos and the size of the tile is obtained from inner_tiles. The dimensions listed in inner_dims_pos do not need to be contiguous in which case the tile will get transposed. We handle only full tiles if padding_value is not set; it is UB if the tile does not perfectly divide the dimension. If padding_value is set, it will pad along high dimensions, i.e., it pads at the bottom and on the right if the input has rank 2, and the result type shape, will be dynamic in any dimension if and only if the input shape is. As optional input, the operation takes outer_dims_perm that allows to permute the tiled loops.

Example KC_to_KCck:

iree_linalg_ext.pack %arg0 inner_dims_pos = [1, 0]
  inner_tiles = [32, 8] into %arg1 : (memref<128x256xf32> memref<16x8x32x8xf32>)

Example NC_to_NCnc:

iree_linalg_ext.pack %arg0 inner_dims_pos = [0, 1]
  inner_tiles = [8, 32] into %arg1 : (memref<128x256xf32> memref<16x8x8x32xf32>)

Example KC_to_CKkc

iree_linalg_ext.pack %arg0 outer_dims_perm = [1, 0] inner_dims_pos = [0, 1]
  inner_tiles = [32, 8] into %arg1 : (memref<128x256xf32> memref<32x4x32x8xf32>)

In all cases, dimension at position 0 in the input memref (128) is tiled with a factor of 8, while dimension at position 1 (256) is tiled with a factor of 32. In the KC_to_KCck example, the point loops are interchanged, while in the KC_to_CKkc example the tiled loops.

Example NC_to_NCnc with padding:

iree_linalg_ext.pack %arg padding_value(%pad : f32) inner_dims_pos = [0, 1]
  inner_tiles = [8, 2] into %arg1 : (memref<13x15xf32> memref<2x8x8x2xf32>)

Traits: AttrSizedOperandSegments, SingleBlockImplicitTerminator<::mlir::iree_compiler::IREE::LinalgExt::YieldOp>, SingleBlock

Interfaces: DestinationStyleOpInterface, LinalgExtInterface, LinalgExtOp, MemoryEffectOpInterface, ReifyRankedShapedTypeOpInterface, TilingInterface

Attributes:link

Attribute	MLIR Type	Description
`outer_dims_perm`	::mlir::DenseI64ArrayAttr	i64 dense array attribute
`inner_dims_pos`	::mlir::DenseI64ArrayAttr	i64 dense array attribute
`static_inner_tiles`	::mlir::DenseI64ArrayAttr	i64 dense array attribute

Operands:link

Operand	Description
`inputs`	variadic of shaped of any type values
`outputs`	variadic of shaped of any type values
`inner_tiles`	variadic of index
`padding_value`	any type

Results:link

Result	Description
`results`	variadic of ranked tensor of any type values

`iree_linalg_ext.unpack` (LinalgExt::UnPackOp)link

LinalgExt unpack operation for both tensors and buffers.

Syntax:

operation ::= `iree_linalg_ext.unpack` attr-dict
              $inputs
              (`outer_dims_perm` `=` $outer_dims_perm^)?
              `inner_dims_pos` `=` $inner_dims_pos
              `inner_tiles` `=`
              custom<DynamicIndexList>($inner_tiles, $static_inner_tiles)
              `into` $outputs `:` `(` type($inputs) type($outputs) `)`
              (`->` type($results)^)?

The unpack operation converts a tiled and packed input to an unpacked output. See pack for more details on inner_tiles and dims_pos; it is UB if the tile does not perfectly divide the dimension. Optionally, the operation also supports permuting the tiled loops.

Example KCck_to_KC:

iree_linalg_ext.unpack %arg0 dims_pos = [1, 0]
  inner_tiles = [32, 8] into %arg1 : (memref<16x8x32x8xf32> memref<128x256xf32>)

Example NCnc_to_NC:

iree_linalg_ext.unpack %arg0 dims_pos = [0, 1]
  inner_tiles = [8, 32] into %arg1 : (memref<16x8x8x32xf32> memref<128x256xf32>)

Example CKkc_to_KC:

iree_linalg_ext.unpack %arg1 outer_dims_perm = [1, 0] inner_dims_pos = [0, 1]
  inner_tiles = [32, 8] into %arg0 : (memref<32x4x32x8xf32> memref<128x256xf32>)

Traits: AttrSizedOperandSegments, SingleBlockImplicitTerminator<::mlir::iree_compiler::IREE::LinalgExt::YieldOp>, SingleBlock

Interfaces: DestinationStyleOpInterface, LinalgExtInterface, LinalgExtOp, MemoryEffectOpInterface, ReifyRankedShapedTypeOpInterface, TilingInterface

Attributes:link

Attribute	MLIR Type	Description
`outer_dims_perm`	::mlir::DenseI64ArrayAttr	i64 dense array attribute
`inner_dims_pos`	::mlir::DenseI64ArrayAttr	i64 dense array attribute
`static_inner_tiles`	::mlir::DenseI64ArrayAttr	i64 dense array attribute

Operands:link

Operand	Description
`inputs`	variadic of shaped of any type values
`outputs`	variadic of shaped of any type values
`inner_tiles`	variadic of index

Results:link

Result	Description
`results`	variadic of ranked tensor of any type values

Non-structured opslink

`iree_linalg_ext.arg_compare` (LinalgExt::ArgCompareOp)link

Performs an arg-reduction using a user-defined comparator.

Syntax:

operation ::= `iree_linalg_ext.arg_compare` attr-dict
              `dimension` `(` $dimension `)`
              (`ins` `(` $inputs^ `:` type($inputs) `)`)?
              `outs` `(` $outputs `:` type($outputs) `)`
              (`index_base` `(` $index_base^ `:` type($index_base) `)`)?
              $region (`->` type($results)^)?

The arg_compare op performs a reduction over a given dimension of a tensor, returning both the selected value and its corresponding index. The selection logic is defined by a user-specified comparator region.

The comparator region receives two candidate values and returns a single i1 result indicating whether the first argument should be preferred over the second.

This region defines the sorting rule, e.g., "greater than" for argmax or "less than" for argmin. It allows for generalization beyond simple argmax-style behavior.

Example (argmax over dim 1):

%input = memref<2x10xf32>
%out_val = memref<2xf32>
%out_idx = memref<2xi32>
iree_linalg_ext.arg_compare
  dimension(1)
  ins(%input : memref<2x10xf32>)
  outs(%out_val, %out_idx : memref<2xf32>, memref<2xi32>) {
^bb0(%a: f32, %b: f32):
  %cmp = arith.cmpf ogt, %a, %b : f32
  iree_linalg_ext.yield %cmp : i1
}

Example with index_base = 5 (i.e., indices start counting from 5):

%input = memref<2x10xf32>
%out_val = memref<2xf32>
%out_idx = memref<2xi32>
%base = arith.constant 5 : index
iree_linalg_ext.arg_compare
  dimension(1)
  ins(%input : memref<2x10xf32>)
  outs(%out_val, %out_idx : memref<2xf32>, memref<2xi32>)
  index_base(%base : index) {
^bb0(%a: f32, %b: f32):
  %cmp = arith.cmpf ogt, %a, %b : f32
  iree_linalg_ext.yield %cmp : i1
}

The index_base is optional. When specified, it is added to the selected index in the result, which is useful when reducing over a tiled or sliced subregion.

Traits: AttrSizedOperandSegments, SingleBlockImplicitTerminator<::mlir::iree_compiler::IREE::LinalgExt::YieldOp>, SingleBlock

Interfaces: DestinationStyleOpInterface, LinalgExtInterface, LinalgExtOp, MemoryEffectOpInterface, ReifyRankedShapedTypeOpInterface, TilingInterface

Attributes:link

Attribute	MLIR Type	Description
`dimension`	::mlir::IntegerAttr	64-bit signless integer attribute

Operands:link

Operand	Description
`inputs`	variadic of shaped of any type values
`outputs`	variadic of shaped of any type values
`index_base`	index

Results:link

Result	Description
`results`	variadic of ranked tensor of any type values

`iree_linalg_ext.attention` (LinalgExt::AttentionOp)link

Attention operator.

Syntax:

operation ::= `iree_linalg_ext.attention` attr-dict
              `ins` `(` $query `,` $key `,` $value `,` $scale (`,` $mask^)?  `:` type($query) `,` type($key) `,` type($value) `,` type($scale) (`,` type($mask)^ )?`)`
              `outs` `(` $output `:` type($output) `)`
              $region
              (`->` type($results)^)?

Computes the scaled dot product attention function:

attention(Q, K, V, scale) = softmax(Q @ K.T * scale) @ V

Here Q, K, V are given tensors and scale is a scalar value specifying the scale to use.

If an additional mask argument M is included, the result of the first matmul is modified according to:

Q @ K.T += M

Traits: SingleBlockImplicitTerminator<::mlir::iree_compiler::IREE::LinalgExt::YieldOp>, SingleBlock

Interfaces: AggregatedOpInterface, DestinationStyleOpInterface, IndexingMapOpInterface, LinalgExtInterface, LinalgFusionOpInterface, MemoryEffectOpInterface, ReifyRankedShapedTypeOpInterface, TilingInterface

Attributes:link

Attribute	MLIR Type	Description
`indexing_maps`	::mlir::ArrayAttr	AffineMap array attribute
`decomposition_config`	::mlir::DictionaryAttr	dictionary of named attribute values

Operands:link

Operand	Description
`query`	shaped of any type values
`key`	shaped of any type values
`value`	shaped of any type values
`scale`	floating-point
`mask`	shaped of any type values
`output`	shaped of any type values

Results:link

Result	Description
`results`	variadic of ranked tensor of any type values

`iree_linalg_ext.fft` (LinalgExt::FftOp)link

Fft operator.

Syntax:

operation ::= `iree_linalg_ext.fft` attr-dict (`ins` `(` $inputs^ `:` type($inputs) `)`)?
              `outs` `(` $outputs `:` type($outputs) `)`
              (`:` type($results)^)?

Apply 1D FFT to innermost dim. This is an iterative FFT, not recurrsive. Thus, the bit reversal is assumed applied on the input. The op carries an input -- stage, which indicates the level of reduction loop in the algorithm. It represents the computation body. For more details, see "Data reordering, bit reversal, and in-place algorithms" section in https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm

The size of innermost dim is expected to be a power of 2.

It is optional to carry coefficient tensors/buffers as inputs. In this context, they will be the second and third inputs.

Traits: AttrSizedOperandSegments, SingleBlockImplicitTerminator<::mlir::iree_compiler::IREE::LinalgExt::YieldOp>, SingleBlock

Interfaces: DestinationStyleOpInterface, LinalgExtInterface, MemoryEffectOpInterface, ReifyRankedShapedTypeOpInterface, TilingInterface

Operands:link

Operand	Description
`inputs`	variadic of any type
`outputs`	variadic of shaped of any type values

Results:link

Result	Description
`results`	variadic of ranked tensor of any type values

`iree_linalg_ext.gather` (LinalgExt::GatherOp)link

Gathers slices from a source based on a tensor of indices.

Syntax:

operation ::= `iree_linalg_ext.gather` attr-dict `dimension_map` `=` $dimension_map
              (`ins` `(` $inputs^ `:` type($inputs) `)`)?
              `outs` `(` $outputs `:` type($outputs) `)`
              (`->` type($results)^)?

Takes two inputs (source and indices) and outputs value (output). The operation returns the value at the slices specified by indices.

The size of the dimension_map attribute is used to determine how many indices are used to index into source, i.e. index_depth. The dimension_map attribute describes which index value maps to which dimension in the destination.

This operation preforms the opposite operation of iree_linalg_ext.scatter. Instead of scattering updates into original, it gathers the values from source into output using the indices in indices. See the documentation on iree_linalg_ext.scatter for more details regarding the indexing/shape semantics.

Traits: AttrSizedOperandSegments, SingleBlockImplicitTerminator<::mlir::iree_compiler::IREE::LinalgExt::YieldOp>, SingleBlock

Interfaces: DestinationStyleOpInterface, LinalgExtInterface, LinalgFusionOpInterface, MemoryEffectOpInterface, ReifyRankedShapedTypeOpInterface, TilingInterface

Attributes:link

Attribute	MLIR Type	Description
`dimension_map`	::mlir::DenseI64ArrayAttr	i64 dense array attribute

Operands:link

Operand	Description
`inputs`	variadic of ranked tensor or memref of any type values
`outputs`	variadic of ranked tensor or memref of any type values

Results:link

Result	Description
`results`	variadic of ranked tensor of any type values

`iree_linalg_ext.im2col` (LinalgExt::Im2colOp)link

Im2col operation for convolutions.

Syntax:

operation ::= `iree_linalg_ext.im2col` attr-dict
              `strides` `=` $strides
              `dilations` `=` $dilations
              `kernel_size` `=`
              custom<DynamicIndexList>($kernel_size, $static_kernel_size)
              `m_offset` `=`
              custom<DynamicIndexList>($m_offset, $static_m_offset)
              `*` custom<DynamicIndexList>($m_strides, $static_m_strides)
              `k_offset` `=`
              custom<DynamicIndexList>($k_offset, $static_k_offset)
              `*` custom<DynamicIndexList>($k_strides, $static_k_strides)
              `batch_pos` `=` $batch_pos
              `m_pos` `=` $m_pos
              `k_pos` `=` $k_pos
              `input_k_perm` `=` $input_k_perm
              `ins` `(` $input `:` type($input) `)`
              `outs` `(` $output `:` type($output) `)`
              (`->` type($results)^)?

Im2col op for convolutions. The operation performs a transformation on the input to convert it from a convolution input to an equivalent gemm input. The op is defined by its input, output, some conv metadata, and some indexing metadata. The strides, dilations, and kernel_size are taken from the convolution from which this op is generated, and they define how the input operand is indexed when the operation is decomposed. The shape of the output should be tensor<BxMxK>, and the m_pos, k_pos, and batch_pos indicate which input dimensions map to which output dimensions.

The k_offset is an offset within the output K dimension from which the iteration space of the operation begins. This is used for tiling, since the tiled implementation must leave the output K dimension untiled. Similarly, m_offset is the offset within the output M dimension from which the iteration space of the operation begins. The iteration space is the full output shape of the im2col op, so if the im2col op were tiled to loops with a scalar inner tile, it would look like the following:

  %im2col = iree_linalg_ext.im2col
      strides = [1, 1] dilations = [1, 1] kernel_size = [3, 3]
      m_offset = [0] * [1] k_offset = [0] * [1]
      batch_pos = [0] m_pos = [1, 2] k_pos = [3]
      ins(%in : tensor<2x34x34x640xf32>)
      outs(%out : tensor<2x1024x5760xf32>) -> tensor<2x1024x5760xf32>

becomes:

  scf.for %arg0 = %c0 to %c2 step %c1
    scf.for %arg1 = %c0 to %c1024 step %c1
      scf.for %arg2 = %c0 to %c5760 step %c1
        %im2col = iree_linalg_ext.im2col
            strides = [1, 1] dilations = [1, 1] kernel_size = [3, 3]
            m_offset = [%arg1] * [1] k_offset = [%arg2] * [1]
            batch_pos = [0] m_pos = [1, 2] k_pos = [3]
            ins(%in_tile : tensor<1x34x34x640xf32>)
            outs(%out_tile : tensor<1x1x1xf32>) -> tensor<1x1x1xf32>

Then, when the tiled op is decomposed, it becomes a loop over the iteration space of the im2col op, whith an extract_slice from the %in_tile followed by an insert_slice to the %out_tile. The indices for the extract slice are computed using the m_offset and k_offset as: (b, m, k) -> (b, M / 32 + K / (6403), M % 32 + K % (6403) / 640, K % 640) Where (b, m, k) are the indices of the tiled op's iteration space, and M = m + m_offset and K = k + K_offset.

The m_strides and k_strides fields are used as a basis for linearizing the m_offset and k_offset. This is used when there are multiple M or K output dimensions, and therefore multiple m_offset or k_offset values. The strides fields are assembled in the IR as if they are multiplied as an inner product with m_offset and k_offset, indicating that the total linear offset along the dimension is equal to this inner product. These strides fields also determine the strides of the output dimensions along M and K. For example, an op withm_strides = [32, 1],k_strides = [4, 1], and output typetensor(expanded fromtensor), would have strides along the M dim of 32 forM0, meaning asM0increases by 1, the index into the flatMincreases by 32. Along the K dim, strides would be 4 forK0, and 1 forK1, meaning asK0increases by 1, the index into the flatKincreases by 4. The strides in M fromm_stridesare orthogonal to the strides inKfromk_strides`.

The input_k_perm attribute defines the permutation needed to align the reduction dimensions of the input layout with those of the filter layout when computing the K dimension of the im2col output. This is useful when the layout of the filter (e.g., CHW) differs from that of the input (e.g., HWC). For instance, an input_k_perm = [2, 0, 1] indicates the input indices needs to be transposed from HWC to CHW layout before extracting slices during decomposition. The identity permutation (e.g., input_k_perm = [0, 1, 2]) indicates that the input layout is already aligned with the filter layout in terms of reduction dimensions, so no transposition of indices is necessary before slice extraction.

Traits: AttrSizedOperandSegments, SingleBlockImplicitTerminator<::mlir::iree_compiler::IREE::LinalgExt::YieldOp>, SingleBlock

Interfaces: AggregatedOpInterface, DestinationStyleOpInterface, LinalgExtInterface, MemoryEffectOpInterface, ReifyRankedShapedTypeOpInterface, TilingInterface

Attributes:link

Attribute	MLIR Type	Description
`strides`	::mlir::DenseI64ArrayAttr	i64 dense array attribute
`dilations`	::mlir::DenseI64ArrayAttr	i64 dense array attribute
`static_kernel_size`	::mlir::DenseI64ArrayAttr	i64 dense array attribute
`static_m_offset`	::mlir::DenseI64ArrayAttr	i64 dense array attribute
`static_m_strides`	::mlir::DenseI64ArrayAttr	i64 dense array attribute
`static_k_offset`	::mlir::DenseI64ArrayAttr	i64 dense array attribute
`static_k_strides`	::mlir::DenseI64ArrayAttr	i64 dense array attribute
`batch_pos`	::mlir::DenseI64ArrayAttr	i64 dense array attribute
`m_pos`	::mlir::DenseI64ArrayAttr	i64 dense array attribute
`k_pos`	::mlir::DenseI64ArrayAttr	i64 dense array attribute
`input_k_perm`	::mlir::DenseI64ArrayAttr	i64 dense array attribute

Operands:link

Operand	Description
`input`	shaped of any type values
`output`	shaped of any type values
`kernel_size`	variadic of index
`m_offset`	variadic of index
`m_strides`	variadic of index
`k_offset`	variadic of index
`k_strides`	variadic of index

Results:link

Result	Description
`results`	variadic of shaped of any type values

`iree_linalg_ext.map_scatter` (LinalgExt::MapScatterOp)link

Scatter with a mapping from source indices to result indices.

Syntax:

operation ::= `iree_linalg_ext.map_scatter` attr-dict $input `into` $output
              $transformation_region
              `:` type($input) `into` type($output) (`->` type($results)^)?

Takes two operands, input and output, and stores every element of input to a unique location in output. If the operands are tensors, the op will also return a Value for the result. For an element of the input, the index to store into the output is determined by index computation mapping the index in the input value to an index in the output value. This computation is contained in the transformation_region, which contains a single block, with one argument for each dimension in the input value. The block arguments represent the index of a given element along the corresponding dimension (i.e., block arg i represents the index along dimension i of the input value). The block is terminated by an iree_linalg_ext.yield op, which yields one index for each dimension in the output, and an additional i1 Value, which represents a mask on whether or not to write to the output at the yielded set of indices. Iff the mask value is true, the input value will be written.

Traits: SingleBlockImplicitTerminator<::mlir::iree_compiler::IREE::LinalgExt::YieldOp>, SingleBlock

Interfaces: DestinationStyleOpInterface, LinalgExtInterface, MemoryEffectOpInterface, TilingInterface

Operands:link

Operand	Description
`input`	shaped of any type values
`output`	ranked tensor or memref of any type values

Results:link

Result	Description
`results`	variadic of ranked tensor of any type values

`iree_linalg_ext.online_attention` (LinalgExt::OnlineAttentionOp)link

Online Attention operator.

Syntax:

operation ::= `iree_linalg_ext.online_attention` attr-dict
              `ins` `(` $query `,` $key `,` $value `,` $scale (`,` $mask^)?  `:` type($query) `,` type($key) `,` type($value) `,` type($scale) (`,` type($mask)^ )?`)`
              `outs` `(` $output `,` $max `,` $sum `:` type($output) `,` type($max) `,` type($sum) `)`
              $region
              (`->` type($results)^)?

Traditional scaled dot product attention computes:

attention(Q, K, V, scale) = softmax(Q @ K.T * scale) @ V

Online Attention on the other hand, uses an online normalizer instead of softmax:

online_attention(Q, K, V, scale, running_max, running_sum) = online_normalizer(Q @ K.T * scale, running_max, running_sum) @ V

If an additional mask argument M is included, the result of the first matmul is modified according to:

Q @ K.T += M

The advantage of this online_normalizer is that it can be tiled along its reduction dimension, making the online_attention operator: - Tilable along softmax reduction dimension - Associative along softmax reduction dimension - Commutative along softmax associative dimension

Note: The results of online_attention need to be combined after computing it over the entire softmax reduction dimension by: x, _, sum : results x = (1 / sum) * x

Traits: SingleBlockImplicitTerminator<::mlir::iree_compiler::IREE::LinalgExt::YieldOp>, SingleBlock

Interfaces: AggregatedOpInterface, DestinationStyleOpInterface, IndexingMapOpInterface, LinalgExtInterface, MemoryEffectOpInterface, PartialReductionOpInterface, ReifyRankedShapedTypeOpInterface, TilingInterface

Attributes:link

Attribute	MLIR Type	Description
`indexing_maps`	::mlir::ArrayAttr	AffineMap array attribute
`decomposition_config`	::mlir::DictionaryAttr	dictionary of named attribute values

Operands:link

Operand	Description
`query`	shaped of any type values
`key`	shaped of any type values
`value`	shaped of any type values
`scale`	floating-point
`mask`	shaped of any type values
`output`	shaped of any type values
`max`	shaped of any type values
`sum`	shaped of any type values

Results:link

Result	Description
`results`	variadic of ranked tensor of any type values

`iree_linalg_ext.scan` (LinalgExt::ScanOp)link

Scan operator.

Syntax:

operation ::= `iree_linalg_ext.scan` attr-dict
              `dimension` `(` $dimension `)`
              `inclusive` `(` $inclusive `)`
              `ins` `(` $inputs `:` type($inputs) `)`
              `outs` `(` $outputs `:` type($outputs) `)`
              $region (`->` type($results)^)?

Computes the inclusive/exclusive scan along a given dimension.

Traits: AttrSizedOperandSegments, SingleBlockImplicitTerminator<::mlir::iree_compiler::IREE::LinalgExt::YieldOp>, SingleBlock

Interfaces: DestinationStyleOpInterface, LinalgExtInterface, MemoryEffectOpInterface, ReifyRankedShapedTypeOpInterface, TilingInterface

Attributes:link

Attribute	MLIR Type	Description
`dimension`	::mlir::IntegerAttr	64-bit signless integer attribute
`inclusive`	::mlir::BoolAttr	bool attribute

Operands:link

Operand	Description
`inputs`	variadic of shaped of any type values
`outputs`	variadic of shaped of any type values

Results:link

Result	Description
`results`	variadic of ranked tensor of any type values

`iree_linalg_ext.scatter` (LinalgExt::ScatterOp)link

Scatters an input in slices based on a tensor of indices.

Syntax:

operation ::= `iree_linalg_ext.scatter` attr-dict `dimension_map` `=` $dimension_map
              `unique_indices` `(` $unique_indices `)`
              (`ins` `(` $inputs^ `:` type($inputs) `)`)?
              `outs` `(` $outputs `:` type($outputs) `)`
              $region (`->` type($results)^)?

Takes two inputs (update and indices) and outputs value (original). The operation updates the value at the slices specified by indices by combining the current value with the value in updates using the computation specified in region. The region specifies a binary operation of signature (T, T) -> T, where T is the element-type of updates (and original). The first argument is from updates, and the second is from original.

The size of the dimension_map attribute is used to determine how many indices are used to index into original, i.e. index_depth. The dimension_map attribute describes which index value maps to which dimension in the destination.

The operand indices is a N-D tensor/memref type that is composed of two logical parts. The first N-1 dimensions represent the batch of updates. The last dim (at index N-1) is the index_depth, which can be omitted if index_depth is 1.

The operand update is a M-D tensor/memref type and similarly consists of two parts. The first N-1 dimensions represent the batch of updates. This must exactly match to the first N-1 dimensions in indices. Dimensions N..M-1 represent the slice scattered into original, update_slice, and must match the last dimensions in original. This represents a contiguous slice to be inserted into original.

The operand original is a DPS init representing the destination that update gets scattered to. Where rank(original) = rank(update_slice) + index_depth

The unique_indices attribute carries the information whether all the indices are unique. If unique_indices is true and two or more updates scatter to the same location in original the final value in original is not guaranteed. If unique_indices is set to false, the first batch_rank iteration loops will be marked as reduction.

The shapes definition follows tensorflow operations. See more information in https://www.tensorflow.org/api_docs/python/tf/tensor_scatter_nd_update

Traits: AttrSizedOperandSegments, SingleBlockImplicitTerminator<::mlir::iree_compiler::IREE::LinalgExt::YieldOp>, SingleBlock

Interfaces: DestinationStyleOpInterface, LinalgExtInterface, LinalgFusionOpInterface, MemoryEffectOpInterface, ReifyRankedShapedTypeOpInterface, TilingInterface

Attributes:link

Attribute	MLIR Type	Description
`dimension_map`	::mlir::DenseI64ArrayAttr	i64 dense array attribute
`unique_indices`	::mlir::BoolAttr	bool attribute

Operands:link

Operand	Description
`inputs`	variadic of ranked tensor or memref of any type values
`outputs`	variadic of ranked tensor or memref of any type values

Results:link

Result	Description
`results`	variadic of ranked tensor of any type values

`iree_linalg_ext.sort` (LinalgExt::SortOp)link

Sorts a tensor a specified dimension.

Syntax:

operation ::= `iree_linalg_ext.sort` attr-dict
              `dimension` `(` $dimension `)`
              (`ins` `(` $inputs^ `:` type($inputs) `)`)?
              `outs` `(` $outputs `:` type($outputs) `)`
              $region (`->` type($results)^)?

Based on XLA operation semantics, sorts the given operands at the given dimension with the given comparator.

See https://www.tensorflow.org/xla/operation_semantics#sort.

Traits: AttrSizedOperandSegments, SingleBlockImplicitTerminator<::mlir::iree_compiler::IREE::LinalgExt::YieldOp>, SingleBlock

Interfaces: DestinationStyleOpInterface, LinalgExtInterface, MemoryEffectOpInterface, ReifyRankedShapedTypeOpInterface, TilingInterface

Attributes:link

Attribute	MLIR Type	Description
`dimension`	::mlir::IntegerAttr	64-bit signless integer attribute

Operands:link

Operand	Description
`inputs`	variadic of any type
`outputs`	variadic of shaped of any type values

Results:link

Result	Description
`results`	variadic of ranked tensor of any type values

`iree_linalg_ext.topk` (LinalgExt::TopkOp)link

Top-K operator.

Syntax:

operation ::= `iree_linalg_ext.topk` attr-dict
              `dimension` `(` $dimension `)`
              `ins` `(` $inputs `:` type($inputs) `)`
              `outs` `(` $outputs `:` type($outputs) `)`
              $region (`->` type($results)^)?

A Top-K operation for N-D tensors. Reduces the target dimension from the input size N down to K elements based on the supplied binary region.

Accepts an N-D tensor input consisting of values and an optioanl N-D tensor for indices of those values (i32 type). If input indices aren't provided, the index mapping is inferred based on the k dim. Both input values/indices tensors and output values/indicies tensors must have the same shape. Top-K is computed along the target dimension (from dimension()). Returns two output tensors of values and the indicies of Top-K results. The output dimensions must match the input save for the dimension that is reduced to K results.

Region accepts lhs=[next N input] and rhs=[exiting K output] and yeilds an i1. If true, the two values are swapped: - For Top-K compoarision: > - For Min-K comparision: < Note: when the two values are equal, the first occurence is always selected.

Traits: AttrSizedOperandSegments, SingleBlockImplicitTerminator<::mlir::iree_compiler::IREE::LinalgExt::YieldOp>, SingleBlock

Interfaces: DestinationStyleOpInterface, LinalgExtInterface, LinalgExtOp, MemoryEffectOpInterface, ReifyRankedShapedTypeOpInterface, TilingInterface

Attributes:link

Attribute	MLIR Type	Description
`dimension`	::mlir::IntegerAttr	64-bit signless integer attribute

Operands:link

Operand	Description
`inputs`	variadic of shaped of any type values
`outputs`	variadic of shaped of any type values

Results:link

Result	Description
`results`	variadic of ranked tensor of any type values

Utility opslink

`iree_linalg_ext.index` (LinalgExt::IndexOp)link

LinalgExt index operation.

Syntax:

operation ::= `iree_linalg_ext.index` $dim attr-dict `:` type($result)

This operation is a mirror of linalg.index operation and has the same semantics, except that linalg.index enforces that the parent op is a LinalgOp, and the iree_linalg_ext.index operation enforces that the parent op is a IREE::LinalgExt::CustomOp.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes:link

Attribute	MLIR Type	Description
`dim`	::mlir::IntegerAttr	64-bit signless integer attribute whose minimum value is 0

Results:link

Result	Description
`result`	index

`iree_linalg_ext.yield` (LinalgExt::YieldOp)link

LinalgExt yield op.

Syntax:

operation ::= `iree_linalg_ext.yield` attr-dict ($operands^ `:` type($operands))?

iree_linalg_ext.yield is a special terminator operation for blocks inside regions in iree_linalg_ext ops.

Traits: AlwaysSpeculatableImplTrait, ReturnLike, Terminator

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface), RegionBranchTerminatorOpInterface

Effects: MemoryEffects::Effect{}

Operands:link

Operand	Description
`operands`	variadic of any type

Winograd opslink

`iree_linalg_ext.winograd.filter_transform` (LinalgExt::WinogradFilterTransformOp)link

Winograd Filter Transform operator.

Syntax:

operation ::= `iree_linalg_ext.winograd.filter_transform` attr-dict
              `output_tile_size` `(` $output_tile_size `)`
              `kernel_size` `(` $kernel_size `)`
              `kernel_dimensions` `(` $kernel_dimensions `)`
              `ins` `(` $inputs `:` type($inputs) `)`
              `outs` `(` $outputs `:` type($outputs) `)`
              (`->` type($result)^)?

This operator is part of the first step in converting a convolution to its Winograd equivalent. Given a tile of a convolution filter (F), this operator computes matmul(G, matmul(F, transpose(B))). The filter tile is assumed to be the full m x m convolutional kernel, and the result of the transformation on this tile is a square with each side of size m + r - 1, where the output tile size is r x r. G is a constant 2-d matrix of shape (m + r - 1) x m. The input to the operator is a filter of shape (H, W, C, F) or (F, C, H, W) and the output is an operator of shape (m + r - 1, m + r - 1, C, F). The result of this operator is first collapsed and then fed to a batch matmul op.

Traits: AttrSizedOperandSegments, SingleBlockImplicitTerminator<::mlir::iree_compiler::IREE::LinalgExt::YieldOp>, SingleBlock

Interfaces: DestinationStyleOpInterface, LinalgExtInterface, MemoryEffectOpInterface, ReifyRankedShapedTypeOpInterface, TilingInterface

Attributes:link

Attribute	MLIR Type	Description
`output_tile_size`	::mlir::IntegerAttr	64-bit signless integer attribute
`kernel_size`	::mlir::IntegerAttr	64-bit signless integer attribute
`kernel_dimensions`	::mlir::DenseI64ArrayAttr	i64 dense array attribute

Operands:link

Operand	Description
`inputs`	variadic of shaped of any type values
`outputs`	variadic of shaped of any type values

Results:link

Result	Description
`result`	variadic of ranked tensor of any type values

`iree_linalg_ext.winograd.input_transform` (LinalgExt::WinogradInputTransformOp)link

Winograd Input Transform operator.

Syntax:

operation ::= `iree_linalg_ext.winograd.input_transform` attr-dict
              `output_tile_size` `(` $output_tile_size `)`
              `kernel_size` `(` $kernel_size `)`
              `image_dimensions` `(` $image_dimensions `)`
              `ins` `(` $inputs `:` type($inputs) `)`
              `outs` `(` $outputs `:` type($outputs) `)`
              (`->` type($result)^)?

This operator is part of the first step in converting a convolution to its Winograd equivalent. Given a tile of an input image (I), this operator computes matmul(transpose(B), matmul(I, B)). The input tile is assumed to be square with each side of size m + r - 1, where the convolutional kernel is m x m and the output tile size is r x r. B is a constant 2-d square matrix of the same shape as the input tile I. The input to the operator is an image of shape (N, H, W, C) or (N, C, H, W) and the output is an operator of shape (m + r - 1, m + r - 1, N, H', W', C) where H' = ceil((H - m + 1)/r) and W' = ceil((W - m + 1)/r). The result of this operator is first collapsed and then fed to a batch matmul op.

Traits: AttrSizedOperandSegments, SingleBlockImplicitTerminator<::mlir::iree_compiler::IREE::LinalgExt::YieldOp>, SingleBlock

Interfaces: DestinationStyleOpInterface, LinalgExtInterface, MemoryEffectOpInterface, ReifyRankedShapedTypeOpInterface, TilingInterface

Attributes:link

Attribute	MLIR Type	Description
`output_tile_size`	::mlir::IntegerAttr	64-bit signless integer attribute
`kernel_size`	::mlir::IntegerAttr	64-bit signless integer attribute
`image_dimensions`	::mlir::DenseI64ArrayAttr	i64 dense array attribute

Operands:link

Operand	Description
`inputs`	variadic of shaped of any type values
`outputs`	variadic of shaped of any type values

Results:link

Result	Description
`result`	variadic of ranked tensor of any type values

`iree_linalg_ext.winograd.output_transform` (LinalgExt::WinogradOutputTransformOp)link

Winograd Output Transform operator.

Syntax:

operation ::= `iree_linalg_ext.winograd.output_transform` attr-dict
              `output_tile_size` `(` $output_tile_size `)`
              `kernel_size` `(` $kernel_size `)`
              `image_dimensions` `(` $image_dimensions `)`
              `ins` `(` $inputs `:` type($inputs) `)`
              `outs` `(` $outputs `:` type($outputs) `)`
              (`->` type($result)^)?

This operator is the last transform in converting a convolution to its Winograd equivalent. After convolution in the Winograd domain (which turns into an elementwise product for a single channel and batch matrix multiplication for many channels), this operator converts the output back into the original domain. Given a tile of the output (O) in the Winograd domain, this operator computes matmul(transpose(A), matmul(O, A)). The output tile is square with each side of size m + r - 1, where the convolutional kernel is m x m and the output tile size is r x r. A is a constant 2-d matrix of shape (m + r - 1) x r. The input to the operator is a tensor of shape (m + r - 1, m + r - 1, N, H', W', C) and the output is a tensor of shape (N, H, W, C) or (N, C, H, W) where H = r H' and W = r W'. This operator is followed by a tensor.extract_slice which extracts only the non-padded part of the output.

Traits: AttrSizedOperandSegments, SingleBlockImplicitTerminator<::mlir::iree_compiler::IREE::LinalgExt::YieldOp>, SingleBlock

Interfaces: DestinationStyleOpInterface, LinalgExtInterface, MemoryEffectOpInterface, ReifyRankedShapedTypeOpInterface, TilingInterface

Attributes:link

Attribute	MLIR Type	Description
`output_tile_size`	::mlir::IntegerAttr	64-bit signless integer attribute
`kernel_size`	::mlir::IntegerAttr	64-bit signless integer attribute
`image_dimensions`	::mlir::DenseI64ArrayAttr	i64 dense array attribute

Operands:link

Operand	Description
`inputs`	variadic of shaped of any type values
`outputs`	variadic of shaped of any type values

Results:link

Result	Description
`result`	variadic of ranked tensor of any type values

Attributeslink

IteratorTypeAttrlink

Iterator type

Syntax:

#iree_linalg_ext.iterator_type<
  ::mlir::utils::IteratorType   # value
>

Parameters:link

Parameter	C++ type	Description
value	`::mlir::utils::IteratorType`	an enum of type IteratorType

SplitReductionMappingAttrlink

Syntax:

#iree_linalg_ext.split_reduction_mapping<
  int64_t   # dimension
>

Mapping attribute to indicate distribution for split-reduction.

This attribute indicates that the scf.forall represent parallel partial reductions generated through the use of ReductionTilingStrategy::PartialReductionOuterParallel.

Parameters:link

Parameter	C++ type	Description
dimension	`int64_t`

Enumslink

IteratorTypelink

Iterator type

Cases:link

Symbol	Value	String
parallel	`0`	parallel
reduction	`1`	reduction

'iree_linalg_ext' Dialectlink

Operationslink

iree_linalg_ext.custom_op (LinalgExt::CustomOp)link

Attributes:link

Operands:link

Results:link

Data tiling opslink

iree_linalg_ext.pack (LinalgExt::PackOp)link

Attributes:link

Operands:link

Results:link

iree_linalg_ext.unpack (LinalgExt::UnPackOp)link

Attributes:link

Operands:link

Results:link

Non-structured opslink

iree_linalg_ext.arg_compare (LinalgExt::ArgCompareOp)link

Attributes:link

Operands:link

Results:link

iree_linalg_ext.attention (LinalgExt::AttentionOp)link

Attributes:link

Operands:link

Results:link

iree_linalg_ext.fft (LinalgExt::FftOp)link

Operands:link

Results:link

iree_linalg_ext.gather (LinalgExt::GatherOp)link

Attributes:link

Operands:link

Results:link

iree_linalg_ext.im2col (LinalgExt::Im2colOp)link

Attributes:link

Operands:link

Results:link

iree_linalg_ext.map_scatter (LinalgExt::MapScatterOp)link

Operands:link

Results:link

iree_linalg_ext.online_attention (LinalgExt::OnlineAttentionOp)link

Attributes:link

Operands:link

Results:link

iree_linalg_ext.scan (LinalgExt::ScanOp)link

Attributes:link

Operands:link

Results:link

iree_linalg_ext.scatter (LinalgExt::ScatterOp)link

Attributes:link

Operands:link

Results:link

iree_linalg_ext.sort (LinalgExt::SortOp)link

Attributes:link

Operands:link

Results:link

iree_linalg_ext.topk (LinalgExt::TopkOp)link

Attributes:link

Operands:link

Results:link

Utility opslink

iree_linalg_ext.index (LinalgExt::IndexOp)link

Attributes:link

Results:link

iree_linalg_ext.yield (LinalgExt::YieldOp)link

Operands:link

Winograd opslink

iree_linalg_ext.winograd.filter_transform (LinalgExt::WinogradFilterTransformOp)link

Attributes:link

Operands:link

Results:link

iree_linalg_ext.winograd.input_transform (LinalgExt::WinogradInputTransformOp)link

Attributes:link

Operands:link

Results:link

iree_linalg_ext.winograd.output_transform (LinalgExt::WinogradOutputTransformOp)link

Attributes:link

Operands:link

Results:link

Attributeslink

IteratorTypeAttrlink

Parameters:link

`iree_linalg_ext.custom_op` (LinalgExt::CustomOp)link

`iree_linalg_ext.pack` (LinalgExt::PackOp)link

`iree_linalg_ext.unpack` (LinalgExt::UnPackOp)link

`iree_linalg_ext.arg_compare` (LinalgExt::ArgCompareOp)link

`iree_linalg_ext.attention` (LinalgExt::AttentionOp)link

`iree_linalg_ext.fft` (LinalgExt::FftOp)link

`iree_linalg_ext.gather` (LinalgExt::GatherOp)link

`iree_linalg_ext.im2col` (LinalgExt::Im2colOp)link

`iree_linalg_ext.map_scatter` (LinalgExt::MapScatterOp)link

`iree_linalg_ext.online_attention` (LinalgExt::OnlineAttentionOp)link

`iree_linalg_ext.scan` (LinalgExt::ScanOp)link

`iree_linalg_ext.scatter` (LinalgExt::ScatterOp)link

`iree_linalg_ext.sort` (LinalgExt::SortOp)link

`iree_linalg_ext.topk` (LinalgExt::TopkOp)link

`iree_linalg_ext.index` (LinalgExt::IndexOp)link

`iree_linalg_ext.yield` (LinalgExt::YieldOp)link

`iree_linalg_ext.winograd.filter_transform` (LinalgExt::WinogradFilterTransformOp)link

`iree_linalg_ext.winograd.input_transform` (LinalgExt::WinogradInputTransformOp)link

`iree_linalg_ext.winograd.output_transform` (LinalgExt::WinogradOutputTransformOp)link