LLVMGPU

`-iree-amdgpu-emulate-narrow-type`

Emulate narrow integer operations, including amdgpu operations

`-iree-convert-to-nvvm`

Perform final conversion from builtin/GPU/HAL/standard dialect to LLVM and NVVM dialects

`-iree-convert-to-rocdl`

Perform final conversion from builtin/GPU/HAL/standard dialect to LLVM and ROCDL dialects

`-iree-llvmgpu-1d-vector-canonicalizations`

Canonicalization patterns for 1-D vectors after legalization.

`-iree-llvmgpu-assign-constant-ordinals`

Assigns executable constant ordinals across all LLVMGPU variants.

`-iree-llvmgpu-cast-address-space-function`

Cast address space to generic in CallOp and FuncOp

`-iree-llvmgpu-configure-tensor-layouts`

Pass to set layouts on tensors for later vector distribution

`-iree-llvmgpu-group-global-loads`

Group adjacent global loads to improve GPU instruction scheduling

Moves vector.load and memref.load operations from global-memory memrefs next to each other when they are separated only by operations that do not depend on the preceding load's result. This enables the GPU backend to issue multiple global loads before waiting, instead of serializing each load behind its own waitcount.
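The reordering rule described above can be illustrated with a small toy model. This is a hedged sketch in Python, not the pass's implementation: the `Op` record and the `group_loads` function are hypothetical names invented for illustration, and the model ignores memory side effects (such as intervening stores) that a real compiler pass would have to respect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical stand-in for an IR operation (not an IREE/MLIR type)."""
    result: str            # name of the value this op defines
    kind: str              # "load" models a global load; anything else is "other"
    operands: tuple = ()   # names of the values this op reads


def group_loads(ops):
    """Toy model: move each load up next to the previous load when legal.

    A load is moved only if (a) no op between it and the previous load reads
    the previous load's result, and (b) the load's own operands are not
    defined by those in-between ops. Memory side effects are NOT modeled.
    """
    out = []
    last_load = None  # index in `out` of the most recently placed load
    for op in ops:
        if op.kind == "load" and last_load is not None:
            between = out[last_load + 1:]
            prev_result = out[last_load].result
            independent = all(prev_result not in o.operands for o in between)
            movable = not (set(op.operands) & {o.result for o in between})
            if independent and movable:
                out.insert(last_load + 1, op)
                last_load += 1
                continue
        out.append(op)
        if op.kind == "load":
            last_load = len(out) - 1
    return out


ops = [
    Op("a", "load", ("p",)),
    Op("i", "add", ("c0", "c1")),   # unrelated to load "a"
    Op("b", "load", ("q",)),        # can move up next to "a"
    Op("s", "mul", ("a", "b")),     # uses load "b"
    Op("c", "load", ("r",)),        # stays put: "s" depends on "b"
]
grouped = group_loads(ops)
print([o.result for o in grouped])  # ['a', 'b', 'i', 's', 'c']
```

In this model the loads `a` and `b` end up adjacent, so both could be issued before anything waits on their results, while `c` stays in place because an intervening operation consumes the preceding load.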

`-iree-llvmgpu-legalize-nd-vectors`

Legalize n-D vectors to 1-D vectors using type conversion.

`-iree-llvmgpu-link-executables`

Links LLVMGPU HAL executables within the top-level program module.

Options

-target : Target backend name whose executables will be linked by this pass.

`-iree-llvmgpu-lower-executable-target`

Perform lowering of executable target using one of the IREE::HAL::DispatchLoweringPassPipeline pipelines

Options

-for-rocdl : Enable features only supported on ROCDL such as delaying lowering of subgroup reduce.

`-iree-llvmgpu-pack-shared-memory-alloc`

Pack shared memory allocations to reduce memory usage.

`-iree-llvmgpu-prefetch-shared-memory`

Rotate scf.for loops to prefetch shared memory with distance 1. This pass is only applicable to ROCDL targets because its effectiveness on non-AMD GPUs has not been tested or evaluated.

Options

-num-stages : Number of pipeline stages (1, 2, or 3+)
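Loop rotation with a prefetch distance of 1 can be pictured with a scalar toy model. The sketch below is an illustration in Python under stated assumptions, not what the pass emits: `original_loop` and `rotated_loop` are hypothetical names, a list copy stands in for the global-to-shared-memory copy, and `sum` stands in for the computation on shared data.

```python
def original_loop(global_tiles):
    """Each iteration loads its tile, then immediately computes on it."""
    events, total = [], 0
    for i, tile in enumerate(global_tiles):
        shared = list(tile)              # stand-in for the global -> shared copy
        events.append(f"load{i}")
        total += sum(shared)             # stand-in for compute on shared data
        events.append(f"compute{i}")
    return total, events


def rotated_loop(global_tiles):
    """Rotated form: iteration i loads tile i+1 before computing on tile i."""
    events, total = [], 0
    if not global_tiles:
        return total, events
    shared = list(global_tiles[0])       # prologue: load the first tile
    events.append("load0")
    for i in range(len(global_tiles) - 1):
        prefetched = list(global_tiles[i + 1])   # prefetch with distance 1
        events.append(f"load{i + 1}")
        total += sum(shared)                     # compute on the current tile
        events.append(f"compute{i}")
        shared = prefetched
    total += sum(shared)                 # epilogue: compute on the last tile
    events.append(f"compute{len(global_tiles) - 1}")
    return total, events


tiles = [[1, 2], [3, 4], [5, 6]]
print(original_loop(tiles)[1])
# ['load0', 'compute0', 'load1', 'compute1', 'load2', 'compute2']
print(rotated_loop(tiles)[1])
# ['load0', 'load1', 'compute0', 'load2', 'compute1', 'compute2']
```

Both versions compute the same result. The rotated version issues the load for the next tile before the compute on the current one, which is the overlap opportunity that prefetching aims for.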

`-iree-llvmgpu-select-lowering-strategy`

Select an IREE::HAL::DispatchLoweringPassPipeline for lowering the target variant

Options

-gpu-options : GPU codegen options consumed by this pass; see GPUCodegenOptions for the available and default settings.

`-iree-llvmgpu-tensorcore-vectorization`

Pass to convert linalg into Vector and transform it to a form that can be lowered to GPU MMA ops

`-iree-llvmgpu-tile-and-distribute`

Pass to tile and distribute linalg ops within a workgroup.

`-iree-llvmgpu-vector-distribute`

Pass to distribute vectorized functions.

`-iree-llvmgpu-vector-flattening`

Flatten n-D vectors.

`-iree-llvmgpu-vector-lowering`

Pass to lower Vector ops before conversion to LLVM.

`-iree-llvmgpu-vector-multi-reduction-lowering`

Lower vector.multi_reduction ops.

`-iree-llvmgpu-vector-to-gpu`

Pass to convert vector to gpu.

`-iree-test-llvmgpu-legalize-ops`

Test pass for several legalization patterns.
