LLVMGPU

-iree-amdgpu-emulate-narrow-type

Emulate narrow integer operations including amdgpu operations

-iree-convert-to-nvvm

Perform final conversion from builtin/GPU/HAL/standard dialect to LLVM and NVVM dialects

-iree-convert-to-rocdl

Perform final conversion from builtin/GPU/HAL/standard dialect to LLVM and ROCDL dialects

-iree-llvmgpu-1d-vector-canonicalizations

Canonicalization patterns for 1-D vectors after legalization.

-iree-llvmgpu-assign-constant-ordinals

Assigns executable constant ordinals across all LLVMGPU variants.

-iree-llvmgpu-cast-address-space-function

Cast address space to generic in CallOp and FuncOp

-iree-llvmgpu-configure-tensor-layouts

Pass to set layouts on tensors for later vector distribution

-iree-llvmgpu-group-global-loads

Group adjacent global loads to improve GPU instruction scheduling

Moves vector.load and memref.load operations from global-memory memrefs next to each other when they are separated only by operations that do not depend on the preceding load's result. This enables the GPU backend to issue multiple global loads before waiting, instead of serializing each load behind its own waitcount.
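As a rough illustration of the reordering described above, the following Python sketch models a block as a list of operations and hoists independent operations above a run of global loads so that the loads end up adjacent. The `Op` record, the `"global_load"` kind, and `group_global_loads` are hypothetical names used only for illustration and are not IREE APIs. The sketch assumes the intervening operations have no side effects, and it checks dependence on every load already in the group, which is stricter than the preceding-load condition stated above. The actual pass works on MLIR and may differ in how it moves operations.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """Hypothetical stand-in for an IR operation (not an IREE or MLIR type)."""
    result: str           # name of the value this op defines
    kind: str             # "global_load" or any other op name
    operands: tuple = ()  # names of the values this op uses


def group_global_loads(ops):
    """Return a reordered copy of `ops` with global loads made adjacent where legal."""
    ops = list(ops)
    i = 0
    while i < len(ops):
        if ops[i].kind != "global_load":
            i += 1
            continue
        start = end = i  # current group of adjacent loads is ops[start:end + 1]
        while True:
            # Look ahead for the next global load after the group.
            nxt = next((k for k in range(end + 1, len(ops))
                        if ops[k].kind == "global_load"), None)
            if nxt is None:
                break
            between = ops[end + 1:nxt]
            loaded = {op.result for op in ops[start:end + 1]}
            # Stop if an intervening op uses a value loaded by the group.
            if any(loaded & set(op.operands) for op in between):
                break
            # Hoist the independent ops above the group; the loads become adjacent.
            ops[start:nxt] = between + ops[start:end + 1]
            start += len(between)
            end = nxt
        i = end + 1
    return ops


example = [
    Op("a", "global_load"),
    Op("x", "add", ("c0", "c1")),  # independent of "a", so it can be hoisted
    Op("b", "global_load"),
    Op("y", "mul", ("a", "b")),    # uses loaded values, so it ends the group
    Op("c", "global_load"),
]
print([op.result for op in group_global_loads(example)])
# prints ['x', 'a', 'b', 'y', 'c']
```

In the reordered sequence the loads of `a` and `b` are back to back, so both can be issued before the first use in `y` has to wait on them.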

-iree-llvmgpu-legalize-nd-vectors

Legalize n-D vectors to 1-D vectors using type conversion.

-iree-llvmgpu-link-executables

Links LLVMGPU HAL executables within the top-level program module.

Options

-target : Target backend name whose executables will be linked by this pass.

-iree-llvmgpu-lower-executable-target

Perform lowering of the executable target using one of the IREE::HAL::DispatchLoweringPassPipeline pipelines

Options

-for-rocdl : Enable features only supported on ROCDL such as delaying lowering of subgroup reduce.

-iree-llvmgpu-pack-shared-memory-alloc

Pass to pack shared memory allocations in order to reduce memory usage.

-iree-llvmgpu-prefetch-shared-memory

Rotate scf.for loops to prefetch shared memory with distance 1. This pass is only applicable to ROCDL targets because its effectiveness on non-AMD GPUs has not been tested or evaluated.

Options

-num-stages : Number of pipeline stages (1, 2, or 3+)
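To make the idea of rotating a loop with prefetch distance 1 more concrete, here is a minimal Python sketch. It is not the pass's implementation: `baseline`, `rotated`, and the use of list indexing and `sum` as stand-ins for the memory copy and the compute step are all assumptions made for illustration. It only shows the general shape of the transformation: a prologue that fetches the first tile, a loop body that fetches the tile for the next iteration before computing on the current one, and an epilogue that consumes the last tile.

```python
def baseline(tiles):
    """Each iteration fetches its tile and then immediately computes on it."""
    acc = 0
    for i in range(len(tiles)):
        tile = tiles[i]   # stand-in for the copy into shared memory
        acc += sum(tile)  # stand-in for the compute that reads shared memory
    return acc


def rotated(tiles):
    """Same computation, with the fetch moved one iteration ahead (distance 1)."""
    if not tiles:
        return 0
    acc = 0
    current = tiles[0]            # prologue: fetch the tile for iteration 0
    for i in range(len(tiles) - 1):
        upcoming = tiles[i + 1]   # prefetch the tile for iteration i + 1
        acc += sum(current)       # compute on the tile fetched earlier
        current = upcoming
    acc += sum(current)           # epilogue: compute on the last tile
    return acc


tiles = [[1, 2], [3, 4], [5, 6]]
assert baseline(tiles) == rotated(tiles) == 21
```

In the rotated form, the fetch for the next iteration no longer depends on the current iteration's compute, which is what gives the hardware room to overlap the two.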

-iree-llvmgpu-select-lowering-strategy

Select an IREE::HAL::DispatchLoweringPassPipeline for lowering the target variant

Options

-gpu-options : GPU codegen options consumed by this pass; see GPUCodegenOptions for the available and default settings.

-iree-llvmgpu-tensorcore-vectorization

Pass to convert linalg into Vector and transform it to a form that can be lowered to GPU MMA ops

-iree-llvmgpu-tile-and-distribute

Pass to tile and distribute linalg ops within a workgroup.

-iree-llvmgpu-vector-distribute

Pass to distribute vectorized functions.

-iree-llvmgpu-vector-flattening

Flatten n-D vectors.

-iree-llvmgpu-vector-lowering

Pass to lower Vector ops before conversion to LLVM.

-iree-llvmgpu-vector-multi-reduction-lowering

Lower vector.multi_reduction ops.

-iree-llvmgpu-vector-to-gpu

Pass to convert vector to gpu.

-iree-test-llvmgpu-legalize-ops

Test pass for several legalization patterns.