Common

-iree-codegen-add-fast-math-flags

Add fast math flags to all the operations supporting them, given a floating-point mode.

-iree-codegen-block-dynamic-dimensions

Expand dynamic dimensions that are known to be multiples of statically known values.
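
As a rough illustration of the idea (plain Python, not IREE code): if a dynamic extent is known to be a multiple of a static block size, a 1-D index space can be expanded into an outer dynamic dimension and an inner static one. The helper name `expand_dim` and the block size are hypothetical.

```python
def expand_dim(flat, block):
    """Expand a 1-D sequence of length d (a multiple of block) to shape [d // block, block]."""
    assert len(flat) % block == 0, "extent must be a known multiple of the block size"
    return [flat[i:i + block] for i in range(0, len(flat), block)]

flat = list(range(8))           # dynamic extent, known to be a multiple of 4
expanded = expand_dim(flat, 4)  # outer dim stays dynamic, inner dim is static (4)

# Element i of the flat view is element (i // 4, i % 4) of the expanded view.
same = all(expanded[i // 4][i % 4] == flat[i] for i in range(len(flat)))
```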

-iree-codegen-bubble-up-ordinal-ops

Bubbles up ordinal ops to allow for workgroup count computation

Pass to bubble up ordinal operations to allow workgroup count computation based on slices to correlate back to workload computation.

-iree-codegen-bufferize-copy-only-dispatches

Bufferize dispatches that copy to/from interfaces to convert to a linalg.copy op

Pass to bufferize dispatches that are copying from one interface to another. This will create a linalg.generic op which is a copy that can then be used by backends to handle appropriately.

-iree-codegen-bufferize-dispatch-tensor-load-store

Bufferize the iree_tensor_ext.dispatch.tensor.load/store ops at dispatch boundaries

Pass to bufferize the edges of dispatch regions, converting iree_tensor_ext.dispatch.tensor.load ops to iree_codegen.load_from_memref, and iree_tensor_ext.dispatch.tensor.store ops to iree_codegen.store_to_memref.

-iree-codegen-canonicalize-scf-for

Adhoc canonicalization of selected loop-carried values/dependencies for scf.for ops

-iree-codegen-cleanup-buffer-alloc-view

Performs cleanups over HAL interface/buffer allocation/view operations

-iree-codegen-concretize-pad-result-shape

Concretizes tensor.pad op's result shape if its source op implements OffsetSizeAndStrideOpInterface.

-iree-codegen-config-tracking-canonicalize

Codegen specific canonicalization pass that tracks lowering configs

Options

-test-convergence : Fails if the patterns fail to converge

-iree-codegen-convert-bf16-to-uint16-buffers

Convert BF16 buffer ops and conversions to simulated behavior with uint16.
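
To see why a 16-bit integer buffer can stand in for BF16 storage: a bfloat16 value has the same layout as the upper 16 bits of the corresponding float32 bit pattern. The sketch below is plain Python, not the pass itself; the helper names are hypothetical, and it truncates when narrowing, whereas a real conversion may round.

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Return the bf16 bit pattern of x as an unsigned 16-bit integer (truncating)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16  # keep sign, 8 exponent bits, and the top 7 mantissa bits

def bf16_bits_to_f32(bits16: int) -> float:
    """Widen a bf16 bit pattern stored in a uint16 back to a float."""
    (x,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return x

# A "bf16 buffer" simulated as a list of uint16 values.
buffer_u16 = [f32_to_bf16_bits(v) for v in (1.0, -2.5, 0.15625)]
restored = [bf16_bits_to_f32(b) for b in buffer_u16]
```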

-iree-codegen-convert-hal-descriptor-type-to-gpu-address-space

Convert #hal.descriptor_type to #gpu.address_space

-iree-codegen-convert-to-destination-passing-style

Transforms the code to make the dispatch use destination-passing style

Converts entry point function within dispatch regions to use destination-passing style, which is better suited for the upstream comprehensive bufferization pass.

Options

-convert-inputs-to-destinations         : Controls whether to adjust consumers to convert one of its inputs to a destination
-use-war-for-cooperative-matrix-codegen : WAR for failure in Cooperative matrix codegen pipelines. See #10648.

-iree-codegen-convolution-to-igemm

Transforms convolution operations into an implicit GEMM format.
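
The general idea of implicit GEMM can be sketched in plain Python, assuming a single-channel 2-D convolution with stride 1 and no padding (the pass itself handles far more general cases, and the function names here are hypothetical). Each output pixel becomes a GEMM row index `m`, each filter tap a reduction index `k`, and the im2col operand is computed on the fly instead of being materialized.

```python
def conv2d_direct(inp, flt):
    """Reference convolution (cross-correlation form) with nested loops."""
    H, W = len(inp), len(inp[0])
    KH, KW = len(flt), len(flt[0])
    OH, OW = H - KH + 1, W - KW + 1
    return [[sum(inp[oh + kh][ow + kw] * flt[kh][kw]
                 for kh in range(KH) for kw in range(KW))
             for ow in range(OW)] for oh in range(OH)]

def conv2d_igemm(inp, flt):
    """Same computation expressed as an (M x K) by (K) contraction."""
    H, W = len(inp), len(inp[0])
    KH, KW = len(flt), len(flt[0])
    OH, OW = H - KH + 1, W - KW + 1
    M, K = OH * OW, KH * KW
    b = [flt[k // KW][k % KW] for k in range(K)]  # filter flattened to a K-vector

    def a(m, k):
        # Element (m, k) of the im2col matrix, derived from indices only.
        oh, ow = divmod(m, OW)
        kh, kw = divmod(k, KW)
        return inp[oh + kh][ow + kw]

    flat = [sum(a(m, k) * b[k] for k in range(K)) for m in range(M)]
    return [flat[r * OW:(r + 1) * OW] for r in range(OH)]

image = [[1, 2, 3, 0], [4, 5, 6, 1], [7, 8, 9, 2]]
kernel = [[1, 0], [0, -1]]
```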

-iree-codegen-decompose-affine-ops

Decompose affine.apply operations into sub affine.apply

Decompose affine.apply operations into sub affine.apply where each sub-expression references values that are defined in the same loop scope. The sub-expressions are then stitched back together following the loop nest order. The goal of this pass is to break down affine.apply expressions such that the resulting sub-expressions can be hoisted out of their respective loops. For example, say we have

%res = affine.apply
         affine_map<()[s0, s1, s2] -> (s0 * 1024 + s1 * 32 + s2)>()
           [%loopVariant, %inv1, %inv2]
Where %inv1 and %inv2 are loop invariant and %loopVariant is not. This will produce the following subexpressions:
// Loop invariant computations first.
%inv1x32 =
  affine.apply affine_map<()[s0] -> (s0 * 32)>()[%inv1]
%inv1x32_plus_inv2 =
  affine.apply affine_map<()[s0, s1] -> (s0 + s1)>()[%inv1x32, %inv2]
// Loop variant computation next.
%loopVariantx1024 =
  affine.apply affine_map<()[s0] -> (s0 * 1024)>()[%loopVariant]
// Compose things back together.
%res =
  affine.apply affine_map<()[s0, s1] -> (s0 + s1)>()
    [%loopVariantx1024, %inv1x32_plus_inv2]
Now the sequence of instructions leading to and including %inv1x32_plus_inv2 can be hoisted out of the loop. This pass requires scf.for structures to still be present; otherwise the breakdown will be meaningless. Note: The decomposition performed by this pass will be undone by canonicalization. Make sure to lower the resulting ops before that.
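
The effect of the decomposition can be checked with a small Python model of the same arithmetic (a sketch only; the function names are hypothetical). The invariant part is computed once outside the loop, and only the loop-variant multiply and the final add stay inside.

```python
def run_fused(trip_count, inv1, inv2):
    """Evaluate s0 * 1024 + s1 * 32 + s2 in full on every iteration."""
    return [i * 1024 + inv1 * 32 + inv2 for i in range(trip_count)]

def run_decomposed(trip_count, inv1, inv2):
    """Same result, with the loop-invariant sub-expressions hoisted."""
    inv1x32 = inv1 * 32                  # loop invariant
    inv1x32_plus_inv2 = inv1x32 + inv2   # loop invariant
    out = []
    for i in range(trip_count):
        loop_variant_x1024 = i * 1024    # loop variant
        out.append(loop_variant_x1024 + inv1x32_plus_inv2)
    return out
```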

-iree-codegen-decompose-boundary-pack-unpack-ops

Wrapper for DecomposePackUnPackOpsPass to decompose ops at function boundaries

Options

-tile-outer-to-one : Always apply tiling to make outer dimension be ones

-iree-codegen-decompose-convolution-to-lower-dim-ops

Decomposes linalg convolution ops to lower dim ops

-iree-codegen-decompose-linalg-generic

Decomposes linalg generic ops into individual ops

It is sometimes advantageous to operate on generic ops which contain at most one non-yield body operation. This is most often the case when needing to materialize individual ops (which some backends require). Note that this is often an extreme pessimization unless it is part of a lowering flow which was designed for it.

Operates on tensor based linalg ops.

-iree-codegen-decompose-memrefs

Decomposes memrefs

-iree-codegen-decompose-pack-unpack-ops

Decompose pack/unpack ops into vectorizable ops

Options

-tile-outer-to-one : Always apply tiling to make outer dimension be ones
-use-only-reshapes : Use decomposition into reshape ops, even when packing unit dimensions.

-iree-codegen-decompose-softmax

Decomposes softmax op into a sequence of linalg ops

Options

-use-fusion : Whether to use the internal pass fusion logic for the exp function. See #15862.
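
For orientation, a softmax is commonly broken into two reductions and two elementwise steps. The Python sketch below shows that numerically stable sequence; it is an assumption about the general shape of the decomposition, not a transcription of the ops this pass emits.

```python
import math

def softmax_decomposed(row):
    row_max = max(row)                           # 1) reduction: maximum
    exps = [math.exp(x - row_max) for x in row]  # 2) elementwise: exp(x - max)
    total = sum(exps)                            # 3) reduction: sum
    return [e / total for e in exps]             # 4) elementwise: divide

probs = softmax_decomposed([1.0, 2.0, 3.0])
```

Subtracting the maximum first keeps `exp` from overflowing on large inputs.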

-iree-codegen-drop-vector-unit-dims

Pass to drop vector unit dims.

-iree-codegen-emulate-narrow-type

Emulate narrow integer operations using wide integer operations

A pass to emulate memref load operations that use narrow integer types with equivalent operations on supported wide integer types.
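
A minimal sketch of what such emulation involves, assuming unsigned 4-bit elements packed two per byte with the low nibble first (both the element type and the nibble order are assumptions for illustration, and the helper names are hypothetical): a narrow load becomes a wide load followed by a shift and a mask.

```python
def pack_i4(values):
    """Pack unsigned 4-bit values two per byte, low nibble first."""
    padded = list(values) + [0] * (len(values) % 2)
    return bytes((padded[i] & 0xF) | ((padded[i + 1] & 0xF) << 4)
                 for i in range(0, len(padded), 2))

def load_i4(storage, index):
    """Emulate a 4-bit load using a byte load, a shift, and a mask."""
    byte = storage[index // 2]   # wide (8-bit) load
    shift = (index % 2) * 4      # select low or high nibble
    return (byte >> shift) & 0xF

values = [1, 15, 7, 0, 9]
storage = pack_i4(values)
loaded = [load_i4(storage, i) for i in range(len(values))]
```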

-iree-codegen-erase-dead-alloc-and-stores

Erase alloc ops if all the uses are just stores

-iree-codegen-erase-hal-descriptor-type-from-memref

Erase #hal.descriptor_type from MemRef memory space

-iree-codegen-expand-strided-metadata

Resolve memref.extract_strided_metadata operations

Options

-allow-subview-expansion : Enables expansion of memref.subview ops
-allow-unresolved        : Allow unresolved strided metadata op (for testing)

-iree-codegen-extract-address-computation

Extract address computations from memory accesses

Extract the address computation from the instructions with memory accesses such that these memory accesses use only a base pointer.

For instance,

memref.load %base[%off0, ...]

will be rewritten into:

%new_base = memref.subview %base[%off0,...][1,...][1,...]
memref.load %new_base[%c0,...]

-iree-codegen-flatten-memref-subspan

Flatten n-D MemRef subspan ops to 1-D ones and fold byte offsets

Flattens n-D MemRef subspan ops to 1-D MemRef and folds the byte offsets on subspan ops to the consumer load/store ops, in preparation for lowering to backends that require linearized access.
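
The index arithmetic behind this can be sketched in Python, assuming a row-major 2-D view whose byte offset is a multiple of the element size (the function names are hypothetical). The flattened form folds the subspan offset directly into the linearized index used by the load.

```python
def load_2d(buffer, byte_offset, shape, elem_bytes, i, j):
    """Load through an n-D view that starts at byte_offset (row-major)."""
    rows, cols = shape
    base = byte_offset // elem_bytes
    view = buffer[base:base + rows * cols]  # the 2-D subspan, stored linearly
    return view[i * cols + j]

def load_flat(buffer, byte_offset, shape, elem_bytes, i, j):
    """Same load on a 1-D view, with the byte offset folded into the index."""
    _, cols = shape
    return buffer[byte_offset // elem_bytes + i * cols + j]

buffer = list(range(100))  # stand-in for a binding of 4-byte elements
```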

-iree-codegen-fold-affinemin-in-distributed-loops

Fold affine.min ops in distributed loops

-iree-codegen-fold-tensor-extract-op

Fold tensor.extract operations prior to lowering to LLVM

After running the upstream TensorConstantBufferize pass, remove tensor_loads introduced for use only in tensor_extract. These can be folded to use a load of the created memref object that holds the constant values.

-iree-codegen-fuse-tensor-pad-with-consumer

Fuse tensor.pad op into its consumer op's tiled loop nest

-iree-codegen-generic-vectorization

Pass to perform vectorization on tensor/linalg ops.

Options

-enable-vector-masking        : Enable vector masking during vectorization.
-use-configured-vector-sizes  : Control whether the op lowering config represents a set of masked vector sizes
-vectorize-copies             : Enable vectorization of linalg.copy operations.
-vectorize-padding            : Rewrite all tensor.pad ops in the function to vector form.
-vectorize-gather-accesses    : Enable vectorization of operations that may generate vector.gather operations.
-vectorize-to-transfer-gather : Enables vectorization of gather-like operations that may generate iree_vector_ext.transfer_gather
-enable-cleanup               : Enable cleanups after vectorization. The patterns touch the structure generated from tiling so it affects later steps like bufferization and vector hoisting.
-generate-contract            : Enable conversion for reduction ops to contraction ops.
-fold-cast-into-contract      : Enable folding casting ops into vector.contract.
-max-vector-size              : Max vector size allowed to avoid creating large vectors.

-iree-codegen-hoist-statically-bound-allocations

Hoist statically bound alloca ops to the entry block of functions

Options

-vscale-min : Minimum possible value of vscale.
-vscale-max : Maximum possible value of vscale (a value of zero means unbounded).

-iree-codegen-hoist-vector-extract-insert-slice

Hoist unrolled vector (extract, insert) pairs out of scf.for op

-iree-codegen-instrument-memory-accesses

Instruments memory reads and writes for address tracking when dispatch instrumentation is enabled.

-iree-codegen-iree-bufferize-constants

Convert from arith.constant on tensors to buffers

-iree-codegen-iree-comprehensive-bufferize

Convert from Linalg ops on tensors to buffers

Options

-test-analysis-only : Only runs inplaceability analysis (for testing purposes only)
-print-conflicts    : Annotates IR with RaW conflicts. Requires test-analysis-only.

-iree-codegen-link-tuning-specs

Link nested transform dialect tuning specs named sequences into a single entry point

Given a module with multiple nested tuning specs, introduce a new named sequence that includes all the other tuning spec entry points. The order of inclusion is the same as the order in which these nested tuning specs appear in the IR.

A tuning spec entry point is a transform.named_sequence op annotated with the iree_codegen.tuning_spec unit attribute. We require it to perform in-place op modification and not consume the handle.

-iree-codegen-lower-executable-using-transform-dialect

Lower executables using the transform dialect recipe provided in the module.

-iree-codegen-lower-ukernel-ops-to-calls

Lower micro-kernel wrapper ops into function calls

-iree-codegen-lowering-config-interpreter

Pass to apply lowering config annotated strategies.

This pass runs the transform dialect interpreter and applies the named sequence transformation specified by lowering configs annotated on operations.

-iree-codegen-materialize-device-encoding

Materialize the encoding for tensor as specified by the backend.

Options

-test-cl-gpu-target : Flag used for lit-testing GPU target only. Not for general usage

-iree-codegen-materialize-encoding-into-nop

Drop the encodings from tensor types with encodings.

-iree-codegen-materialize-encoding-into-padding

Materialize #iree_encoding.pad_encoding_layout attributes.

Handles padding introduced by pad_encoding_layout encoding layouts, which requires iree_tensor_ext.dispatch.tensor.load/.store to be adjusted to account for padding regions. Materializes any other encoding layouts into nop.

-iree-codegen-materialize-host-encoding

Materialize the encoding for tensor as specified by the backend.

-iree-codegen-materialize-tuning-specs

Load tuning spec transform dialect libraries and encode them in the module

Links all available tuning spec transform dialect modules into a single tuning spec. Next, serializes this tuning spec to bytecode and attaches it as a module attribute. We do this so that the full tuning spec is always encoded in the program IR and can be checked with --mlir-print-ir-after-all (or equivalent). The alternative would be to add the tuning spec as a submodule in the compiled program, but this may result in the tuning spec being inadvertently visited by other passes that attempt to walk the outer module. Serialization makes the tuning specs opaque and prevents it from happening.

This attribute is expected to be short-lived and removed by iree-codegen-materialize-user-configs.

-iree-codegen-materialize-user-configs

Sets the lowering configs and translation info from user configs

-iree-codegen-math-transform

Apply math ops transformations: approximations, rewrites to other math ops, operand casts.

-iree-codegen-memrefcopy-to-linalg

Convert memref.copy to linalg op

-iree-codegen-normalize-loop-bounds

Normalize the loop bounds of scf.for and scf.forall

Normalizes the iteration range of scf.for and scf.forall loops to [0, ub) += 1.

Options

-normalize-for    : Enable normalization for `scf.for` loops
-normalize-forall : Enable normalization for `scf.forall` loops
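
The normalization can be modeled in Python for a loop with a positive step (a sketch of the arithmetic only; the function names are hypothetical). The normalized loop runs from 0 to the trip count in steps of 1, and the original induction value is recovered as `lb + k * step`.

```python
def original_loop(lb, ub, step):
    """Induction values of a loop over [lb, ub) with the given positive step."""
    return [i for i in range(lb, ub, step)]

def normalized_loop(lb, ub, step):
    """Same values from a loop over [0, trip_count) with step 1."""
    trip_count = max(0, -(-(ub - lb) // step))  # ceildiv(ub - lb, step), clamped at 0
    return [lb + k * step for k in range(trip_count)]
```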

-iree-codegen-optimize-tensor-insert-extract-slices

Optimize tensor.insert_slice/tensor.extract_slice operations (e.g. hoist and fold)

Options

-fold-identity-slices : Enable folding of identity tensor.*_slice ops.

-iree-codegen-optimize-vector-transfer

Run optimization transformations on vector transfer operations

Options

-flatten            : Flatten the vector type of vector transfers where possible (contiguous row-major data).
-redundant-hoisting : Enables use of redundant vector transfer hoisting.

-iree-codegen-pad-dynamic-alloc

Pass to pad dynamic alloc into static one.

-iree-codegen-propagate-dispatch-size-bounds

Pass to annotate workitem and workgroup IDs with known bounds

-iree-codegen-propagate-reshapes-by-expansion

Propagates reshaping operations by expansion.

Pass to propagate reshapes by expansion through all ops without explicit lowering configurations.

-iree-codegen-reconcile-translation-info

Reconcile information (like workgroup_size, subgroup_size) across TranslationInfo set on each function in the dispatch, merge them, and set them at the appropriate places in the surrounding HAL ops

-iree-codegen-rematerialize-parallel-ops

Pass to rematerialize and merge parallel ops into consumers.

-iree-codegen-remove-single-iteration-loop

Remove distributed loop with single iteration.

-iree-codegen-replace-slow-min-max-ops

Replace slow min/max operations that propagate NaNs and distinguish between +/-0.0 with faster min/max operations that ignore them.
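
The semantic difference can be modeled in Python. This is a sketch of the two behaviors, not of the specific ops the pass rewrites, and the function names are hypothetical: the "slow" form propagates NaN and orders -0.0 below +0.0, while the "fast" form is a single comparison whose result for those inputs depends on operand order.

```python
import math

def slow_max(a, b):
    """NaN-propagating max that treats -0.0 as smaller than +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == b == 0.0:
        # Both operands are zeros; prefer the positively signed one.
        return a if math.copysign(1.0, a) > 0 else b
    return a if a > b else b

def fast_max(a, b):
    """Single-comparison max; NaN and signed zeros get no special handling."""
    return a if a > b else b
```

For ordinary inputs both agree, which is why the replacement is valid when NaNs and the sign of zero do not matter.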

-iree-codegen-resolve-swizzle-hints

Resolves iree_codegen.swizzle_hint ops

-iree-codegen-split-full-partial-transfer

Split a vector.transfer operation into an in-bounds (i.e., no out-of-bounds masking) fastpath and a slowpath.

Options

-split-transfers : Split vector transfers between slow (masked) and fast (unmasked) variants. Possible options are:
    none [default]: keep unsplit vector.transfer and pay the price
    linalg-copy: use linalg.fill + linalg.generic for the slow path
    vector-transfers: use extra small unmasked vector.transfers for the slow path

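The fast-path/slow-path split for a vector transfer can be pictured with a 1-D read in Python. This is a conceptual sketch, assuming a non-negative start index; `transfer_read` and its padding argument are hypothetical names, not the MLIR op.

```python
def transfer_read(data, start, width, padding=0):
    """Read `width` elements from `data` beginning at `start`."""
    end = start + width
    if end <= len(data):
        # Fast path: fully in bounds, so no masking or padding is needed.
        return data[start:end]
    # Slow path: out-of-bounds lanes are filled with the padding value.
    return [data[i] if i < len(data) else padding for i in range(start, end)]

data = [1, 2, 3, 4, 5]
full = transfer_read(data, 1, 4)      # takes the fast path
partial = transfer_read(data, 3, 4)   # takes the slow path
```
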
-iree-codegen-strip-compilation-info

Remove all the lowering configuration and translation info attributes.

-iree-codegen-test-executable-preprocessing

Tests iree-hal-preprocess-executables-with behavior.

-iree-codegen-test-partitionable-loops-interface

Test the PartitionableLoopsInterface

-iree-codegen-tile-and-distribute-to-workgroups

Tile and distribute operations to workgroups

Options

-max-workgroup-parallel-dims : Maximum number of dims to distribute workgroups across.
-distribution-method         : Pick the distribution method. See linalg::DistributionMethod for details

-iree-codegen-tile-and-distribute-to-workgroups-using-forall-op

Tile and distribute operation to workgroups (using scf.forall op)

Options

-transpose-workgroup : Swaps the workgroup mapping attribute x and y. Only swaps when the loop bounds are static.

-iree-codegen-tile-large-tensors

Greedily tiles all linalg ops that are beyond a certain size

Options

-max-vector-size : Maximum static size to tile to (i.e. all remaining ops will be smaller)

-iree-codegen-type-propagation

Propagate the type of tensor to avoid load/stores of illegal bit widths

-iree-codegen-unroll-annotated-loops

Unrolls all scf.for loops marked with unroll_loop

-iree-codegen-vector-transfer-lowering

Pass to lower transfer ops to simpler ops like vector.load, vector.store, vector.broadcast, and a set of scf ops.

Options

-enable-scalable-lowerings : Enables scalable vector specific transfer lowerings

-iree-codegen-vectorize-memref-copy

Vectorizes memref copy operations.

-iree-codegen-vectorize-tensor-pad

Vectorize a very specific form of tensor.pad with control flows

-iree-codegen-verify-workgroup-distribution

Pass to verify proper distribution to workgroups.

Pass to verify that all writes to global memory are explicitly mapped to workgroups. This means that in cases where we use loops (scf.forall) to manage distribution to workgroups, we require that all ops with write side effects are contained within a workgroup distributed loop.

-iree-convert-accgemm-to-gemm

Convert accumulating GEMMs to GEMMs post dispatch creation.

-iree-convert-bf16-arith-to-f32

Convert bf16 arithmetic operations to f32

-iree-convert-unsupported-float-arith

Convert arith operations on unsupported (source) float types to the target type. Populates the source and target types based on the target architecture.

-iree-eliminate-empty-tensors

Eliminate tensor.empty ops to avoid buffer allocations

-iree-loop-invariant-code-motion

Performs LICM on loops guaranteed to have >= 1 trip

This is a mirror of the upstream LICM pass that restricts to loops that are guaranteed to have at least one trip. This currently only supports loops that expose a lower and upper bound as the generic loop-like interface does not expose a way to query for trip count.

Additionally code motion of scf.forall ops with mappings is always unsafe and is explicitly disabled.
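
One plausible reading of the at-least-one-trip restriction is that hoisting changes behavior when the loop body would never have run. The Python sketch below illustrates that hazard with a division; it is an illustration of the general LICM concern, not a statement of this pass's exact legality rules, and the function names are hypothetical.

```python
def loop_original(n, num, den):
    results = []
    for _ in range(n):
        q = num // den  # loop invariant, but only evaluated if the loop runs
        results.append(q)
    return results

def loop_hoisted(n, num, den):
    q = num // den      # hoisted: now evaluated even when n == 0
    return [q for _ in range(n)]
```

With at least one trip the two forms agree. With zero trips, `loop_original(0, 1, 0)` returns an empty list while `loop_hoisted(0, 1, 0)` raises `ZeroDivisionError`.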

-iree-transform-dialect-interpreter

Pass to apply transform dialect operations.

This pass runs the transform dialect interpreter and applies the named sequence transformation specified by the provided name (defaults to TransformDialect::kTransformEntryPointSymbolName (i.e. __transform_main)).

Options

-entry-point       : Entry point of the pass pipeline.
-library-file-name : File path to load a library of transform dialect strategies from.