Common
-iree-codegen-add-fast-math-flags
Add fast math flags to all the operations supporting them, given a floating-point mode.
-iree-codegen-block-dynamic-dimensions
Expand dynamic dimensions that are known to be multiples of statically known values.
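As a rough illustration of what expanding such a dimension means, the following Python sketch models the index arithmetic only. It is not IREE code, and the helper names `expand_dim`, `expand_index`, and `collapse_index` are hypothetical. A dynamic extent known to be a multiple of a static block size is split into an outer dynamic extent and an inner static one.

```python
def expand_dim(dynamic_size, block):
    """Split a size known to be a multiple of `block` into (outer, block)."""
    if dynamic_size % block != 0:
        raise ValueError("size must be a multiple of the static block size")
    return dynamic_size // block, block


def expand_index(i, block):
    """Map a flat index to (outer, inner) coordinates in the expanded view."""
    return divmod(i, block)


def collapse_index(outer, inner, block):
    """Inverse of expand_index: recover the flat index."""
    return outer * block + inner
```

With a block of 16, a dynamic size of 48 becomes a 3 x 16 view, and only the outer extent remains dynamic.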
-iree-codegen-bubble-up-ordinal-ops
Bubbles up ordinal ops to allow for workgroup count computation
Pass to bubble up ordinal operations to allow workgroup count computation based on slices to correlate back to workload computation.
-iree-codegen-bufferize-copy-only-dispatches
Bufferize dispatches that copy to/from interfaces to convert to a linalg.copy op
Pass to bufferize dispatches that are copying from one interface to another. This will create a linalg.generic op which is a copy that can then be used by backends to handle appropriately.
-iree-codegen-bufferize-dispatch-tensor-load-store
Bufferize the iree_tensor_ext.dispatch.tensor.load/store ops at dispatch boundaries
Pass to bufferize the edges of dispatch regions, converting iree_tensor_ext.dispatch.tensor.load ops to iree_codegen.load_from_memref, and iree_tensor_ext.dispatch.tensor.store ops to iree_codegen.store_to_memref.
-iree-codegen-canonicalize-scf-for
Adhoc canonicalization of selected loop-carried values/dependencies for scf.for ops
-iree-codegen-cleanup-buffer-alloc-view
Performs cleanups over HAL interface/buffer allocation/view operations
-iree-codegen-concretize-pad-result-shape
Concretizes tensor.pad op's result shape if its source op implements OffsetSizeAndStrideOpInterface.
-iree-codegen-config-tracking-canonicalize
Codegen specific canonicalization pass that tracks lowering configs
Options
-test-convergence : Fails if the patterns fail to converge
-iree-codegen-convert-bf16-to-uint16-buffers
Convert BF16 buffer ops and conversions to simulated behavior with uint16.
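The relationship between bf16 values and their 16-bit integer storage can be sketched in Python as follows. This is only a model of the bit layout (a bf16 value is the upper 16 bits of the corresponding f32), not the pass itself. The function names are hypothetical, and the truncating conversion is a simplification: real conversions typically round rather than truncate.

```python
import struct


def f32_to_bf16_bits(x):
    """Return the 16-bit pattern of x truncated to bf16 (no rounding)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16


def bf16_bits_to_f32(bits):
    """Reinterpret a 16-bit bf16 pattern as an f32 value."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))
    return value
```

Storing the pattern as a uint16 keeps buffers at 16 bits per element while arithmetic can happen in a wider float type.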
-iree-codegen-convert-hal-descriptor-type-to-gpu-address-space
Convert #hal.descriptor_type to #gpu.address_space
-iree-codegen-convert-to-destination-passing-style
Transforms the code to make the dispatch use destination-passing style
Converts entry point function within dispatch regions to use destination-passing style, which is better suited for the upstream comprehensive bufferization pass.
Options
-convert-inputs-to-destinations : Controls whether to adjust consumers to convert one of its inputs to a destination
-use-war-for-cooperative-matrix-codegen : WAR for failure in Cooperative matrix codegen pipelines. See #10648.
-iree-codegen-convolution-to-igemm
Transforms convolution operations into an implicit GEMM format.
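The general idea of an implicit GEMM can be sketched in Python under simplifying assumptions (single channel, stride 1, no padding, one filter). This is an illustration of the technique, not the representation the pass produces, and both function names are hypothetical. The convolution is expressed as a matrix product whose left operand is never materialized: its entries are read from the input on the fly.

```python
def conv2d_direct(inp, flt):
    """Reference 2-D convolution (cross-correlation form)."""
    h, w = len(inp), len(inp[0])
    kh, kw = len(flt), len(flt[0])
    oh, ow = h - kh + 1, w - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            for i in range(kh):
                for j in range(kw):
                    out[y][x] += inp[y + i][x + j] * flt[i][j]
    return out


def conv2d_igemm(inp, flt):
    """Same result as a GEMM with M = oh*ow and K = kh*kw."""
    h, w = len(inp), len(inp[0])
    kh, kw = len(flt), len(flt[0])
    oh, ow = h - kh + 1, w - kw + 1
    m_size, k_size = oh * ow, kh * kw
    flat_filter = [flt[i][j] for i in range(kh) for j in range(kw)]
    out_flat = [0] * m_size
    for m in range(m_size):
        y, x = divmod(m, ow)          # GEMM row -> output pixel
        for k in range(k_size):
            i, j = divmod(k, kw)      # GEMM reduction index -> filter tap
            # Entry A[m][k] is computed implicitly from the input.
            out_flat[m] += inp[y + i][x + j] * flat_filter[k]
    return [out_flat[r * ow:(r + 1) * ow] for r in range(oh)]
```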
-iree-codegen-decompose-affine-ops
Decompose affine.apply operations into sub affine.apply
Decompose affine.apply operations into sub affine.apply where each subexpression references values that are defined in the same loop scope. The subexpressions are then stitched back together following the loop nest order.
The goal of this pass is to break down affine.apply expressions such that the resulting subexpressions can be hoisted out of their respective loops.
E.g., Let's say we have
%res = affine.apply
affine_map<()[s0, s1, s2] -> (s0 * 1024 + s1 * 32 + s2)>()
[%loopVariant, %inv1, %inv2]
%inv1 and %inv2 are loop invariant and %loopVariant is not.
This will produce the following subexpressions:
// Loop invariant computations first.
%inv1x32 =
affine.apply affine_map<()[s0] -> (s0 * 32)>()[%inv1]
%inv1x32_plus_inv2 =
affine.apply affine_map<()[s0, s1] -> (s0 + s1)>()[%inv1x32, %inv2]
// Loop variant computation next.
%loopVariantx1024 =
affine.apply affine_map<()[s0] -> (s0 * 1024)>()[%loopVariant]
// Compose things back together.
%res =
affine.apply affine_map<()[s0, s1] -> (s0 + s1)>()
[%loopVariantx1024, %inv1x32_plus_inv2]
%inv1x32_plus_inv2 can be hoisted out of the loop.
This pass requires scf.for structures to still be around, otherwise the breakdown will be meaningless.
Note: The decomposition performed by this pass will be undone by
canonicalization. Make sure to lower the resulting ops before that.
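The arithmetic behind the example can be checked with a small Python model. The functions below are hypothetical stand-ins for the IR, not part of IREE: `original` evaluates the full expression on every iteration, while `run_loop_decomposed` computes the loop-invariant part once, outside the loop.

```python
def original(loop_variant, inv1, inv2):
    """The undecomposed expression s0 * 1024 + s1 * 32 + s2."""
    return loop_variant * 1024 + inv1 * 32 + inv2


def run_loop_decomposed(trip_count, inv1, inv2):
    """Evaluate the expression per iteration with the invariant part hoisted."""
    # Loop-invariant subexpression, computed once before the loop.
    inv1x32_plus_inv2 = inv1 * 32 + inv2
    results = []
    for loop_variant in range(trip_count):
        # Only the loop-variant term is recomputed per iteration.
        results.append(loop_variant * 1024 + inv1x32_plus_inv2)
    return results
```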
-iree-codegen-decompose-boundary-pack-unpack-ops
Wrapper for DecomposePackUnPackOpsPass to decompose ops at function boundaries
Options
-tile-outer-to-one : Always apply tiling to make outer dimension be ones
-iree-codegen-decompose-convolution-to-lower-dim-ops
Decomposes linalg convolution ops to lower dim ops
-iree-codegen-decompose-linalg-generic
Decomposes linalg generic ops into individual ops
It is sometimes advantageous to operate on generic ops which contain at most one non-yield body operation. This is most often the case when needing to materialize individual ops (which some backends require). Note that this is often an extreme pessimization unless it is part of a lowering flow which was designed for it.
Operates on tensor based linalg ops.
-iree-codegen-decompose-memrefs
Decomposes memrefs
-iree-codegen-decompose-pack-unpack-ops
Decompose pack/unpack ops into vectorizable ops
Options
-tile-outer-to-one : Always apply tiling to make outer dimension be ones
-use-only-reshapes : Use decomposition into reshape ops, even when packing unit dimensions.
-iree-codegen-decompose-softmax
Decomposes softmax op into a sequence of linalg ops
Options
-use-fusion : Whether to use the internal pass fusion logic for the exp function. See #15862.
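The exact sequence of linalg ops the pass emits is not stated here. As a hedged sketch, the standard numerically stable softmax breaks into the following steps, shown in Python on a single row (the function name is hypothetical):

```python
import math


def softmax_decomposed(row):
    """Softmax over one row, written as separate reduction/elementwise steps."""
    row_max = max(row)                          # reduction: max
    shifted = [x - row_max for x in row]        # elementwise: subtract max
    exps = [math.exp(x) for x in shifted]       # elementwise: exp
    total = sum(exps)                           # reduction: sum
    return [e / total for e in exps]            # elementwise: divide
```

Subtracting the row maximum before exponentiation keeps `exp` from overflowing on large inputs.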
-iree-codegen-drop-vector-unit-dims
Pass to drop vector unit dims.
-iree-codegen-emulate-narrow-type
Emulate narrow integer operations using wide integer operations
A pass to emulate memref load operations that use narrow integer types with equivalent operations on supported wide integer types.
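A minimal Python sketch of the idea, assuming 4-bit unsigned elements packed two per byte with the even-indexed element in the low nibble (the packing order and the function name `load_i4` are assumptions for illustration):

```python
def load_i4(packed, index):
    """Load an unsigned 4-bit element from byte-packed storage."""
    byte = packed[index // 2]       # wide load of the containing byte
    shift = (index % 2) * 4         # bit position of the element in the byte
    return (byte >> shift) & 0xF    # shift and truncate to 4 bits
```

The narrow load becomes a load of the wider supported type followed by a shift and a mask.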
-iree-codegen-erase-dead-alloc-and-stores
Erase alloc ops if all the uses are just stores
-iree-codegen-erase-hal-descriptor-type-from-memref
Erase #hal.descriptor_type from MemRef memory space
-iree-codegen-expand-strided-metadata
Resolve memref.extract_strided_metadata operations
Options
-allow-subview-expansion : Enables expansion of memref.subview ops
-allow-unresolved : Allow unresolved strided metadata op (for testing)
-iree-codegen-extract-address-computation
Extract address computations from memory accesses
Extract the address computation from the instructions with memory accesses such that these memory accesses use only a base pointer.
For instance,
memref.load %base[%off0, ...]
will be rewritten into:
%new_base = memref.subview %base[%off0,...][1,...][1,...]
memref.load %new_base[%c0,...]
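The equivalence of the two forms can be modelled in Python with a toy strided buffer. The `MemRef` class and the `load` and `subview` helpers are hypothetical simplifications (unit sizes and strides for the subview, as in the rewrite above):

```python
from dataclasses import dataclass


@dataclass
class MemRef:
    data: list       # flat backing storage
    offset: int      # base offset into `data`
    strides: tuple   # element stride per dimension


def load(ref, indices):
    """Load one element: all address arithmetic happens at the access."""
    linear = ref.offset + sum(i * s for i, s in zip(indices, ref.strides))
    return ref.data[linear]


def subview(ref, offsets):
    """Fold the offsets into the base, keeping the same strides."""
    new_offset = ref.offset + sum(o * s for o, s in zip(offsets, ref.strides))
    return MemRef(ref.data, new_offset, ref.strides)
```

After the rewrite the address computation lives in the subview, and the load itself uses only zero indices.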
-iree-codegen-flatten-memref-subspan
Flatten n-D MemRef subspan ops to 1-D ones and fold byte offsets
Flattens n-D MemRef subspan ops to 1-D MemRef and folds the byte offsets on subspan ops to the consumer load/store ops, in preparation for lowering to backends that require linearized access.
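The index arithmetic involved can be sketched in Python, assuming a contiguous row-major layout and a byte offset that is a multiple of the element size. Both helper names are hypothetical:

```python
def row_major_strides(shape):
    """Element strides of a contiguous row-major n-D buffer."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return strides


def flatten_access(shape, indices, byte_offset, element_bytes):
    """Return the 1-D element index for an n-D access plus a byte offset."""
    if byte_offset % element_bytes != 0:
        raise ValueError("byte offset must be element-aligned")
    strides = row_major_strides(shape)
    linear = sum(i * s for i, s in zip(indices, strides))
    # The subspan's byte offset is folded into the load/store index.
    return byte_offset // element_bytes + linear
```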
-iree-codegen-fold-affinemin-in-distributed-loops
Fold affine.min ops in distributed loops
-iree-codegen-fold-tensor-extract-op
Fold tensor.extract operations prior to lowering to LLVM
After running the upstream TensorConstantBufferize pass, remove tensor_loads introduced for use only in tensor_extract. These can be folded to use a load of the created memref object that holds the constant values.
-iree-codegen-fuse-tensor-pad-with-consumer
Fuse tensor.pad op into its consumer op's tiled loop nest
-iree-codegen-generic-vectorization
Pass to perform vectorization on tensor/linalg ops.
Options
-enable-vector-masking : Enable vector masking during vectorization.
-use-configured-vector-sizes : Control whether the op lowering config represents a set of masked vector sizes
-vectorize-copies : Enable vectorization of linalg.copy operations.
-vectorize-padding : Rewrite all tensor.pad ops in the function to vector form.
-vectorize-gather-accesses : Enable vectorization of operations that may generate vector.gather operations.
-vectorize-to-transfer-gather : Enables vectorization of gather-like operations that may generate iree_vector_ext.transfer_gather
-enable-cleanup : Enable cleanups after vectorization. The patterns touch the structure generated from tiling, so it affects later steps like bufferization and vector hoisting.
-generate-contract : Enable conversion for reduction ops to contraction ops.
-fold-cast-into-contract : Enable folding casting ops into vector.contract.
-max-vector-size : Max vector size allowed to avoid creating large vectors.
-iree-codegen-hoist-statically-bound-allocations
Hoist statically bound alloca ops to the entry block of functions
Options
-vscale-min : Minimum possible value of vscale.
-vscale-max : Maximum possible value of vscale (a value of zero means unbounded).
-iree-codegen-hoist-vector-extract-insert-slice
Hoist unrolled vector (extract, insert) pairs out of scf.for op
-iree-codegen-instrument-memory-accesses
Instruments memory reads and writes for address tracking when dispatch instrumentation is enabled.
-iree-codegen-iree-bufferize-constants
Convert from arith.constant on tensors to buffers
-iree-codegen-iree-comprehensive-bufferize
Convert from Linalg ops on tensors to buffers
Options
-test-analysis-only : Only runs inplaceability analysis (for testing purposes only)
-print-conflicts : Annotates IR with RaW conflicts. Requires test-analysis-only.
-iree-codegen-link-tuning-specs
Link nested transform dialect tuning specs named sequences into a single entry point
Given a module with multiple nested tuning specs, introduce a new named sequence that includes all the other tuning spec entry points. The order of inclusion is the same as the order in which these nested tuning specs appear in the IR.
A tuning spec entry point is a transform.named_sequence op annotated with the iree_codegen.tuning_spec unit attribute. We require it to perform in-place op modification and not consume the handle.
-iree-codegen-lower-executable-using-transform-dialect
Lower executables using the transform dialect recipe provided in the module.
-iree-codegen-lower-ukernel-ops-to-calls
Lower micro-kernel wrapper ops into function calls
-iree-codegen-lowering-config-interpreter
Pass to apply lowering config annotated strategies.
This pass runs the transform dialect interpreter and applies the named sequence transformation specified by lowering configs annotated on operations.
-iree-codegen-materialize-device-encoding
Materialize the encoding for tensor as specified by the backend.
Options
-test-cl-gpu-target : Flag used for lit-testing GPU target only. Not for general usage
-iree-codegen-materialize-encoding-into-nop
Drop the encodings from tensor types with encodings.
-iree-codegen-materialize-encoding-into-padding
Materialize #iree_encoding.pad_encoding_layout attributes.
Handles padding introduced by pad_encoding_layout encoding layouts, which requires iree_tensor_ext.dispatch.tensor.load/.store to be adjusted to account for padding regions.
Materializes any other encoding layouts into nop.
-iree-codegen-materialize-host-encoding
Materialize the encoding for tensor as specified by the backend.
-iree-codegen-materialize-tuning-specs
Load tuning spec transform dialect libraries and encode them in the module
Links all available tuning spec transform dialect modules into a single
tuning spec. Next, serializes this tuning spec to bytecode and attaches it
as a module attribute. We do this so that the full tuning spec is always
encoded in the program IR and can be checked with --mlir-print-ir-after-all
(or equivalent). The alternative would be to add the tuning spec as a
submodule in the compiled program, but this may result in the tuning spec
being inadvertently visited by other passes that attempt to walk the outer
module. Serialization makes the tuning specs opaque and prevents this from
happening.
This attribute is expected to be short-lived and removed by iree-codegen-materialize-user-configs.
-iree-codegen-materialize-user-configs
Sets the lowering configs and translation info from user configs
-iree-codegen-math-transform
Apply math ops transformations: approximations, rewrites to other math ops, operand casts.
-iree-codegen-memrefcopy-to-linalg
Convert memref.copy to linalg op
-iree-codegen-normalize-loop-bounds
Normalize the loop bounds of scf.for and scf.forall
Normalizes the iteration range of scf.for and scf.forall loops so that they start at 0 and advance with a step of 1, i.e. [0, ub) += 1.
Options
-normalize-for : Enable normalization for `scf.for` loops
-normalize-forall : Enable normalization for `scf.forall` loops
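The effect of normalization on a single loop can be modelled in Python, assuming a positive step. The function names are hypothetical: the normalized loop counts from 0 with step 1, and the original induction value is recovered inside the body as `lb + i * step`.

```python
def original_iterations(lb, ub, step):
    """Induction values of a loop running over [lb, ub) with `step`."""
    return list(range(lb, ub, step))


def normalized_iterations(lb, ub, step):
    """Same values, produced by a loop over [0, trip_count) with step 1."""
    # Ceiling division gives the trip count; clamp at zero for empty loops.
    trip_count = max(0, -(-(ub - lb) // step))
    return [lb + i * step for i in range(trip_count)]
```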
-iree-codegen-optimize-tensor-insert-extract-slices
Optimize tensor.insert_slice/tensor.extract_slice operations (e.g. hoist and fold)
Options
-fold-identity-slices : Enable folding of identity tensor.*_slice ops.
-iree-codegen-optimize-vector-transfer
Run optimization transformations on vector transfer operations
Options
-flatten : Flatten the vector type of vector transfers where possible (contiguous row-major data).
-redundant-hoisting : Enables use of redundant vector transfer hoisting.
-iree-codegen-pad-dynamic-alloc
Pass to pad dynamic alloc into static one.
-iree-codegen-propagate-dispatch-size-bounds
Pass to annotate workitem and workgroup IDs with known bounds
-iree-codegen-propagate-reshapes-by-expansion
Propagates reshaping operations by expansion.
Pass to propagate reshapes by expansion through all ops without explicit lowering configurations.
-iree-codegen-reconcile-translation-info
Reconcile information (like workgroup_size, subgroup_size) across the TranslationInfo set on each function in the dispatch, merge it, and set it at the appropriate places in the surrounding HAL ops
-iree-codegen-rematerialize-parallel-ops
Pass to rematerialize and merge parallel ops into consumers.
-iree-codegen-remove-single-iteration-loop
Remove distributed loop with single iteration.
-iree-codegen-replace-slow-min-max-ops
Replace slow min/max operations that propagate NaNs and distinguish between +/-0.0 with faster min/max operations that ignore them.
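The semantic difference between the two kinds of operation can be illustrated in Python. Which ops the pass swaps is not stated here, so the two functions below are hypothetical models of "slow" and "fast" maximum rather than a description of the pass:

```python
import math


def maximum_propagating(a, b):
    """NaN-propagating maximum that treats -0.0 as smaller than +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == b == 0.0:
        # Both are zeros: prefer the one with a positive sign bit.
        return a if math.copysign(1.0, a) > 0 else b
    return a if a > b else b


def maximum_fast(a, b):
    """Maximum that ignores a NaN operand and does not order signed zeros."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a > b else b
```

The two agree on ordinary inputs and differ only when a NaN or a pair of differently signed zeros is involved.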
-iree-codegen-resolve-swizzle-hints
Resolves iree_codegen.swizzle_hint ops
-iree-codegen-split-full-partial-transfer
Split a vector.transfer operation into an in-bounds (i.e., no out-of-bounds masking) fastpath and a slowpath.
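The fast-path/slow-path split can be sketched in Python on a 1-D read. This is a simplified model with a hypothetical function name, assuming a non-negative start index and a padding value for out-of-bounds lanes:

```python
def transfer_read(buffer, start, vector_size, padding=0):
    """Read `vector_size` elements, padding lanes that fall out of bounds."""
    if start + vector_size <= len(buffer):
        # Fast path: the whole vector is in bounds, no masking needed.
        return buffer[start:start + vector_size]
    # Slow path: copy only the valid lanes and pad the rest.
    result = [padding] * vector_size
    valid = max(0, len(buffer) - start)
    result[:valid] = buffer[start:start + valid]
    return result
```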
Options
-split-transfers : Split vector transfers between slow (masked) and fast (unmasked) variants. Possible options are:
none [default]: keep unsplit vector.transfer and pay the price
linalg-copy: use linalg.fill + linalg.generic for the slow path
vector-transfers: use extra small unmasked vector.transfers for the slow path
-iree-codegen-strip-compilation-info
Remove all the lowering configuration and translation info attributes.
-iree-codegen-test-executable-preprocessing
Tests iree-hal-preprocess-executables-with behavior.
-iree-codegen-test-partitionable-loops-interface
Test the PartitionableLoopsInterface
-iree-codegen-tile-and-distribute-to-workgroups
Tile and distribute operations to workgroups
Options
-max-workgroup-parallel-dims : Maximum number of dims to distribute workgroups across.
-distribution-method : Pick the distribution method. See linalg::DistributionMethod for details
-iree-codegen-tile-and-distribute-to-workgroups-using-forall-op
Tile and distribute operation to workgroups (using scf.forall op)
Options
-transpose-workgroup : Swaps the workgroup mapping attribute x and y. Only swaps when the loop bounds are static.
-iree-codegen-tile-large-tensors
Greedily tiles all linalg ops that are beyond a certain size
Options
-max-vector-size : Maximum static size to tile to (i.e. all remaining ops will be smaller)
-iree-codegen-type-propagation
Propagate the type of tensor to avoid load/stores of illegal bit widths
-iree-codegen-unroll-annotated-loops
Unrolls all scf.for loops marked with unroll_loop
-iree-codegen-vector-transfer-lowering
Pass to lower transfer ops to simpler ops like vector.load, vector.store, vector.broadcast, and a set of scf ops.
Options
-enable-scalable-lowerings : Enables scalable vector specific transfer lowerings
-iree-codegen-vectorize-memref-copy
Vectorizes memref copy operations.
-iree-codegen-vectorize-tensor-pad
Vectorize a very specific form of tensor.pad with control flows
-iree-codegen-verify-workgroup-distribution
Pass to verify proper distribution to workgroups.
Pass to verify that all writes to global memory are explicitly mapped to workgroups. This means that in cases where we use loops (scf.forall) to manage distribution to workgroups, we require that all ops with write side effects are contained within a workgroup distributed loop.
-iree-convert-accgemm-to-gemm
Convert accumulating GEMMs to GEMMs post dispatch creation.
-iree-convert-bf16-arith-to-f32
Convert bf16 arithmetic operations to f32
-iree-convert-unsupported-float-arith
Convert arith operations on unsupported (source) float types to the target type. Populates the source and target types based on the target architecture.
-iree-eliminate-empty-tensors
Eliminate tensor.empty ops to avoid buffer allocations
-iree-loop-invariant-code-motion
Performs LICM on loops guaranteed to have >= 1 trip
This is a mirror of the upstream LICM pass that restricts to loops that are guaranteed to have at least one trip. This currently only supports loops that expose a lower and upper bound as the generic loop-like interface does not expose a way to query for trip count.
Additionally, code motion of scf.forall ops with mappings is always unsafe and is explicitly disabled.
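One reason a trip-count guarantee can matter for hoisting is shown by the Python sketch below. It is an illustration under assumptions, not a description of which ops the pass moves, and both function names are hypothetical. A loop-invariant operation that can fail is harmless inside a loop that never runs, but fails once it is hoisted in front of it.

```python
def loop_original(n, a, b):
    """Invariant division stays inside the loop; never runs when n == 0."""
    total = 0
    for _ in range(n):
        total += a // b
    return total


def loop_hoisted(n, a, b):
    """Invariant division hoisted before the loop; runs even when n == 0."""
    quotient = a // b
    total = 0
    for _ in range(n):
        total += quotient
    return total
```

With `n >= 1` the two functions behave identically. With `n == 0` and `b == 0`, only the hoisted version raises.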
-iree-transform-dialect-interpreter
Pass to apply transform dialect operations.
This pass runs the transform dialect interpreter and applies the named sequence transformation specified by the provided name (defaults to TransformDialect::kTransformEntryPointSymbolName (i.e. __transform_main)).
Options
-entry-point : Entry point of the pass pipeline.
-library-file-name : File path to load a library of transform dialect strategies from.