LLVMGPU
-iree-amdgpu-emulate-narrow-type
Emulate narrow integer operations, including AMDGPU dialect operations.
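Narrow-type emulation typically stores sub-byte elements packed inside a wider, legal container type and recovers each element with shifts and masks. The Python sketch below illustrates that idea for signed i4 values packed two per byte. The helper names pack_i4 and load_i4 are hypothetical, and the low-nibble-first order is an assumption for illustration, not a description of what this pass emits.

```python
def pack_i4(values):
    """Pack signed 4-bit integers (-8..7) two per byte, low nibble first (assumed order)."""
    values = list(values)
    if len(values) % 2:
        values.append(0)  # pad so the last byte is complete
    out = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        for v in (lo, hi):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in i4")
        # Two's-complement nibbles: mask to 4 bits, place the second one in the high half.
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def load_i4(packed, index):
    """Emulate an i4 load: fetch the containing byte, shift, mask, then sign-extend."""
    byte = packed[index // 2]
    nibble = (byte >> (4 * (index % 2))) & 0xF
    return nibble - 16 if nibble & 0x8 else nibble


data = [3, -2, 7, -8, 0]
packed = pack_i4(data)
print(len(packed))                                      # 3 bytes hold 5 elements
print([load_i4(packed, i) for i in range(len(data))])   # [3, -2, 7, -8, 0]
```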
-iree-convert-to-nvvm
Perform final conversion from builtin/GPU/HAL/standard dialects to LLVM and NVVM dialects.
-iree-convert-to-rocdl
Perform final conversion from builtin/GPU/HAL/standard dialects to LLVM and ROCDL dialects.
-iree-llvmgpu-1d-vector-canonicalizations
Canonicalization patterns for 1-D vectors after legalization.
-iree-llvmgpu-assign-constant-ordinals
Assigns executable constant ordinals across all LLVMGPU variants.
-iree-llvmgpu-cast-address-space-function
Cast address spaces to the generic address space in CallOp and FuncOp.
-iree-llvmgpu-configure-tensor-layouts
Pass to set layouts on tensors for later vector distribution
-iree-llvmgpu-group-global-loads
Group adjacent global loads to improve GPU instruction scheduling
Moves vector.load and memref.load operations from global-memory memrefs next to each other when they are separated only by operations that do not depend on the preceding load's result. This enables the GPU backend to issue multiple global loads before waiting, instead of serializing each load behind its own waitcount.
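A simplified model of the load-grouping rule can make the reordering concrete. In this hypothetical Python sketch, a basic block is a list of Op records, and a global load is hoisted to sit directly after the previous global load when none of the ops in between use the previous load's result (the condition stated above). Two extra safety checks are assumptions of the sketch, not documented behavior of the pass: no intervening op writes memory, and the hoisted load does not read a value defined in between.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str                    # SSA result name, e.g. "%a"
    kind: str                    # "global_load" or any other op kind
    operands: tuple = ()         # SSA names this op reads
    writes_memory: bool = False  # treated as a barrier to reordering


def group_global_loads(block):
    """Return a reordered copy of `block` with global loads grouped together."""
    ops = list(block)
    prev_load = None  # index of the most recent global load in `ops`
    i = 0
    while i < len(ops):
        op = ops[i]
        if op.kind == "global_load":
            if prev_load is not None and i > prev_load + 1:
                between = ops[prev_load + 1:i]
                prev_result = ops[prev_load].name
                defined_between = {b.name for b in between}
                safe = (
                    # Documented rule: nothing in between uses the previous load.
                    all(prev_result not in b.operands for b in between)
                    # Sketch assumptions: no intervening writes, operands available.
                    and not any(b.writes_memory for b in between)
                    and defined_between.isdisjoint(op.operands)
                )
                if safe:
                    ops.insert(prev_load + 1, ops.pop(i))
                    i = prev_load + 1
            prev_load = i
        i += 1
    return ops


block = [
    Op("%a", "global_load", ("%p",)),
    Op("%x", "arith", ("%c", "%d")),
    Op("%b", "global_load", ("%q",)),
    Op("%y", "arith", ("%a", "%b")),
]
print([op.name for op in group_global_loads(block)])  # ['%a', '%b', '%x', '%y']
```

If %x instead read %a, the sketch would leave the block unchanged, since the first load's result would be needed before the second load could be issued.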
-iree-llvmgpu-legalize-nd-vectors
Legalize n-D vectors to 1-D vectors using type conversion.
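Turning an n-D vector into a 1-D one amounts to a row-major linearization of its indices, the element order that vector.shape_cast is defined with. The Python sketch below (helper names hypothetical) shows how an element of a vector<2x3x4xf32> maps to its position in the equivalent vector<24xf32> and back.

```python
from math import prod


def row_major_strides(shape):
    """Strides, in elements, of a row-major n-D shape."""
    strides = []
    running = 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


def linearize(index, shape):
    """Map an n-D index to its offset in the flattened 1-D vector."""
    if len(index) != len(shape) or any(not 0 <= i < d for i, d in zip(index, shape)):
        raise IndexError(f"{index} out of bounds for {shape}")
    return sum(i * s for i, s in zip(index, row_major_strides(shape)))


def delinearize(offset, shape):
    """Inverse of linearize: recover the n-D index from a 1-D offset."""
    if not 0 <= offset < prod(shape):
        raise IndexError(f"{offset} out of bounds for {shape}")
    index = []
    for s in row_major_strides(shape):
        index.append(offset // s)
        offset %= s
    return tuple(index)


shape = (2, 3, 4)  # vector<2x3x4xf32> -> vector<24xf32>
print(row_major_strides(shape))     # (12, 4, 1)
print(linearize((1, 2, 3), shape))  # 23
print(delinearize(23, shape))       # (1, 2, 3)
```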
-iree-llvmgpu-link-executables
Links LLVMGPU HAL executables within the top-level program module.
Options
-target : Target backend name whose executables will be linked by this pass.
-iree-llvmgpu-lower-executable-target
Lower the executable target using one of the IREE::HAL::DispatchLoweringPassPipeline pipelines.
Options
-for-rocdl : Enable features only supported on ROCDL, such as delaying the lowering of subgroup reductions.
-iree-llvmgpu-pack-shared-memory-alloc
Pass to pack shared memory allocations in order to reduce memory usage.
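One common way to reduce shared memory usage is to let allocations whose live ranges never overlap reuse the same bytes. The Python sketch below assigns offsets greedily under that rule. The liveness-based strategy, the 16-byte alignment, and the function name pack_allocations are assumptions for illustration, not a description of how this pass chooses offsets.

```python
def pack_allocations(allocs, alignment=16):
    """Greedy offset assignment for buffers with inclusive live ranges.

    allocs maps name -> (size_bytes, first_use, last_use). Buffers that are live
    at the same time never share bytes; buffers that are not may reuse them.
    Returns (offsets, total_bytes).
    """
    def align(x):
        return (x + alignment - 1) // alignment * alignment

    placed = []   # (offset, size, first_use, last_use) of buffers placed so far
    offsets = {}
    # Place larger buffers first; ties keep the input order.
    for name, (size, first, last) in sorted(allocs.items(), key=lambda kv: -kv[1][0]):
        # Byte ranges taken by already-placed buffers whose lifetimes overlap this one.
        busy = sorted((off, off + sz) for off, sz, lo, hi in placed
                      if lo <= last and first <= hi)
        offset = 0
        for start, end in busy:
            if offset + size <= start:
                break  # fits in the gap before this range
            offset = max(offset, align(end))
        offsets[name] = offset
        placed.append((offset, size, first, last))
    total = max((off + sz for off, sz, _, _ in placed), default=0)
    return offsets, total


allocs = {"A": (1024, 0, 2), "B": (1024, 3, 5), "C": (512, 1, 4)}
offsets, total = pack_allocations(allocs)
print(offsets, total)  # {'A': 0, 'B': 0, 'C': 1024} 1536
```

Here A and B are never live together and share offset 0, so the total is 1536 bytes instead of the 2560 bytes that giving every buffer its own slot would need.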
-iree-llvmgpu-prefetch-shared-memory
Rotate scf.for loops to prefetch shared memory with distance 1. This pass is only applicable to ROCDL targets because its effectiveness on non-AMD GPUs has not been tested or evaluated.
Options
-num-stages : Number of pipeline stages (1, 2, or 3+)
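Prefetching with distance 1 is a two-stage software pipeline: the loop is rotated so that the global load for iteration i + 1 is issued before the compute of iteration i, hiding memory latency behind the compute. The hypothetical Python sketch below only emits an event trace for a single shared-memory buffer. It models the two-stage case and does not try to reproduce the exact IR this pass generates, or its behavior for other num-stages values.

```python
def prefetch_schedule(n):
    """Event trace for an n-iteration loop rotated to prefetch with distance 1."""
    if n <= 0:
        return []
    # Prologue: stage the first tile into shared memory.
    trace = ["global_load[0]", "shared_store[0]", "barrier"]
    for i in range(n - 1):
        trace += [
            f"global_load[{i + 1}]",   # prefetch the next tile into registers
            f"compute[{i}]",           # consume the current tile from shared memory
            "barrier",                 # every thread has finished reading the buffer
            f"shared_store[{i + 1}]",  # overwrite the buffer with the prefetched tile
            "barrier",                 # make the new tile visible to all threads
        ]
    # Epilogue: the last tile is already in shared memory; only compute remains.
    trace.append(f"compute[{n - 1}]")
    return trace


for event in prefetch_schedule(3):
    print(event)
```

In the printed trace, global_load[1] appears before compute[0], which is the overlap the rotation is meant to create.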
-iree-llvmgpu-select-lowering-strategy
Select an IREE::HAL::DispatchLoweringPassPipeline for lowering the target variant.
Options
-gpu-options : GPU codegen options consumed by this pass; see GPUCodegenOptions for the available and default settings.
-iree-llvmgpu-tensorcore-vectorization
Pass to convert Linalg ops to vector ops and transform them into a form that can be lowered to GPU MMA ops.
-iree-llvmgpu-tile-and-distribute
Pass to tile and distribute linalg ops within a workgroup.
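Distributing a workgroup tile over threads can be as simple as giving each thread one contiguous block per dimension. The Python sketch below shows such a block distribution. The function name distribute_tile and the block (rather than cyclic) strategy are assumptions for illustration, not necessarily the distribution this pass uses.

```python
def distribute_tile(tile_shape, thread_grid, thread_id):
    """Block-distribute an n-D workgroup tile over an n-D grid of threads.

    Each thread gets a contiguous sub-tile per dimension. Chunk sizes are rounded
    up, so trailing threads may get a smaller (possibly empty) range when the
    tile size is not divisible by the thread count.
    Returns a list of half-open (start, end) ranges, one per dimension.
    """
    ranges = []
    for size, count, tid in zip(tile_shape, thread_grid, thread_id):
        if not 0 <= tid < count:
            raise ValueError(f"thread id {tid} outside grid of {count}")
        chunk = -(-size // count)  # ceil(size / count)
        start = min(tid * chunk, size)
        ranges.append((start, min(start + chunk, size)))
    return ranges


# A 32x64 tile on a 4x8 thread grid: every thread owns an 8x8 block.
print(distribute_tile((32, 64), (4, 8), (1, 3)))  # [(8, 16), (24, 32)]
```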
-iree-llvmgpu-vector-distribute
Pass to distribute vectorized functions.
-iree-llvmgpu-vector-flattening
Flatten n-D vectors.
-iree-llvmgpu-vector-lowering
Pass to lower Vector ops before conversion to LLVM.
-iree-llvmgpu-vector-multi-reduction-lowering
Lower vector.multi_reduction ops.
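vector.multi_reduction reduces a source vector over a set of dimensions and folds the result into an accumulator. One lowering strategy, in the spirit of upstream MLIR's multi-reduction lowering patterns, moves the reduced dimensions innermost, views the result as a 2-D [parallel x reduction] vector, and reduces each row. Whether this pass uses exactly that strategy is an assumption here. The Python sketch below (function names hypothetical) models vectors as row-major flat lists.

```python
from functools import reduce
from itertools import product
from operator import add


def row_major_strides(shape):
    strides, running = [], 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return strides[::-1]


def multi_reduction_via_2d(combine, shape, data, acc, reduction_dims):
    """Reduce `data` (row-major, of `shape`) over `reduction_dims` via a 2-D view.

    `acc` is the flat row-major accumulator over the kept (parallel) dims.
    """
    reduced = sorted(set(reduction_dims))
    parallel = [d for d in range(len(shape)) if d not in reduced]
    order = parallel + reduced  # transpose: reduction dims become innermost
    strides = row_major_strides(shape)
    rows = []  # the 2-D [parallel x reduction] view
    for p_idx in product(*(range(shape[d]) for d in parallel)):
        row = []
        for r_idx in product(*(range(shape[d]) for d in reduced)):
            index = dict(zip(order, p_idx + r_idx))  # original dim -> coordinate
            row.append(data[sum(index[d] * strides[d] for d in range(len(shape)))])
        rows.append(row)
    if len(acc) != len(rows):
        raise ValueError("accumulator size must match the number of kept elements")
    # Fold each row into its accumulator element, one 1-D reduction per row.
    return [reduce(combine, row, a) for row, a in zip(rows, acc)]


data = [1, 2, 3, 4, 5, 6]  # row-major vector<2x3xi32>
print(multi_reduction_via_2d(add, (2, 3), data, [0, 0], [1]))     # [6, 15]
print(multi_reduction_via_2d(add, (2, 3), data, [0, 0, 0], [0]))  # [5, 7, 9]
print(multi_reduction_via_2d(max, (2, 3), data, [0], [0, 1]))     # [6]
```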
-iree-llvmgpu-vector-to-gpu
Pass to convert vector ops to GPU ops.
-iree-test-llvmgpu-legalize-ops
Test pass for several legalization patterns.