Developer tips and tricks
The IREE compiler is built using MLIR, so it naturally supports the common MLIR debugging workflows. For areas where IREE differentiates itself, this page lists other helpful tips and tricks.
Setting compiler options
Tools such as iree-compile take options via command-line flags. Pass --help to see the full list:
$ iree-compile --help
OVERVIEW: IREE compilation driver
USAGE: iree-compile [options] <input file or '-' for stdin>
OPTIONS:
...
Tip - Options and the Python bindings
If you are using the Python bindings, options can be passed via the extra_args=["--flag"] argument:
import iree.compiler as ireec
input_mlir = """
func.func @abs(%input : tensor<f32>) -> (tensor<f32>) {
%result = math.absf %input : tensor<f32>
return %result : tensor<f32>
}"""
compiled_module = ireec.tools.compile_str(
input_mlir,
target_backends=["llvm-cpu"],
extra_args=["--mlir-timing"])
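When several flags are involved, it can be convenient to build the extra_args list from a dictionary instead of writing each string by hand. The following is a minimal sketch: to_extra_args is a hypothetical helper, not part of iree.compiler, and it only formats strings. It assumes boolean options accept the --flag=false form, as --iree-llvmcpu-link-embedded=false does elsewhere on this page.

```python
def to_extra_args(options):
    """Format a dict of compiler options as `--name[=value]` strings.

    Hypothetical helper for illustration; it is not part of iree.compiler.
    True yields a bare flag, False yields `--name=false`, None omits the
    flag, and any other value is rendered as `--name=value`.
    """
    args = []
    for name, value in options.items():
        if value is None:
            continue
        if value is True:
            args.append(f"--{name}")
        elif value is False:
            args.append(f"--{name}=false")
        else:
            args.append(f"--{name}={value}")
    return args


# The resulting list can be passed as extra_args=... to compile_str.
extra_args = to_extra_args({
    "mlir-timing": True,
    "iree-llvmcpu-link-embedded": False,
    "iree-hal-dump-executable-files-to": "/tmp/iree/simple_abs",
})
```

Keeping the options in a dictionary makes it easy to toggle a single flag between runs without editing string literals.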
Inspecting .vmfb files
The IREE compiler generates FlatBuffer files using the .vmfb file extension, short for "Virtual Machine FlatBuffer", which can then be loaded and executed using IREE's runtime.
Info - other output formats
The IREE compiler can output different formats with the --output-format= flag:
Flag value | Output |
---|---|
--output-format=vm-bytecode (default) | VM Bytecode (.vmfb) files |
--output-format=vm-c | C source modules |
VM Bytecode files are usable across a range of deployment scenarios, while C source modules provide low level connection points for constrained environments like bare metal platforms.
By default, .vmfb files can be opened as zip files (setting --iree-vm-emit-polyglot-zip=false disables this feature and slightly decreases file size):
$ unzip -d simple_abs_cpu ./simple_abs_cpu.vmfb
Archive: ./simple_abs_cpu.vmfb
extracting: simple_abs_cpu/module.fb
extracting: simple_abs_cpu/abs_dispatch_0_system_elf_x86_64.so
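The same listing can be produced from a script. This is a minimal sketch that assumes Python's standard zipfile module accepts the polyglot layout the same way unzip does. The function name list_vmfb_members is chosen here for illustration.

```python
import zipfile


def list_vmfb_members(vmfb_path):
    """Return (name, size_in_bytes) pairs for files stored in a zip-readable .vmfb.

    Assumes the module was compiled with the polyglot zip feature enabled
    (the default described above); otherwise zipfile raises BadZipFile.
    """
    with zipfile.ZipFile(vmfb_path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]
```

For example, list_vmfb_members("simple_abs_cpu.vmfb") would be expected to report the module.fb and embedded executable entries shown in the unzip output above.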
The embedded binary (here an ELF shared object with CPU code) can be parsed by standard tools:
$ readelf -Ws ./simple_abs_cpu/abs_dispatch_0_system_elf_x86_64.so
Symbol table '.dynsym' contains 2 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000001760 17 FUNC GLOBAL DEFAULT 7 iree_hal_executable_library_query
Symbol table '.symtab' contains 42 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS abs_dispatch_0
2: 0000000000001730 34 FUNC LOCAL DEFAULT 7 abs_dispatch_0_generic
3: 00000000000034c0 80 OBJECT LOCAL DEFAULT 8 iree_hal_executable_library_query_v0
4: 0000000000001780 111 FUNC LOCAL DEFAULT 7 iree_h2f_ieee
5: 00000000000017f0 207 FUNC LOCAL DEFAULT 7 iree_f2h_ieee
...
The iree-dump-module tool can also be used to see information about a given .vmfb file:
$ iree-dump-module simple_abs.vmfb
//===---------------------------------------------------------------------===//
// @module : version 0
//===---------------------------------------------------------------------===//
Required Types:
[ 0] i32
[ 1] i64
[ 2] !hal.allocator
[ 3] !hal.buffer
...
Module Dependencies:
hal, version >= 0, required
Imported Functions:
[ 0] hal.allocator.allocate(!vm.ref<?>, i32, i32, i64) -> (!vm.ref<?>)
[ 1] hal.devices.get(i32) -> (!vm.ref<?>)
...
Exported Functions:
[ 0] abs(!vm.ref<?>) -> (!vm.ref<?>)
[ 1] __init() -> ()
...
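If you capture this text output (for example via a shell redirect), the function lists can be pulled out with a few lines of Python. This is a sketch only: it assumes the output keeps the layout shown above (a section header ending in a colon followed by "[ index] signature" entries), which may differ between IREE versions. parse_dump_sections is a hypothetical name.

```python
import re

# Matches entries such as "[  0] abs(!vm.ref<?>) -> (!vm.ref<?>)".
_ENTRY = re.compile(r"^\s*\[\s*(\d+)\]\s+(.*\S)\s*$")


def parse_dump_sections(dump_text,
                        wanted=("Imported Functions:", "Exported Functions:")):
    """Collect (index, signature) entries for the requested section headers."""
    sections = {}
    current = None
    for line in dump_text.splitlines():
        stripped = line.strip()
        # A line ending in ':' that is not an entry starts a new section.
        if stripped.endswith(":") and not stripped.startswith("["):
            current = stripped if stripped in wanted else None
            if current is not None:
                sections[current] = []
            continue
        match = _ENTRY.match(line)
        if current is not None and match:
            sections[current].append((int(match.group(1)), match.group(2)))
    return sections
```

Entries in sections that were not requested, such as "Required Types:", are skipped.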
Dumping executable files
The --iree-hal-dump-executable-* flags instruct the compiler to save files related to "executable translation" (code generation for a specific hardware target) into a directory of your choosing. If you are interested in seeing which operations in your input program were fused into a compute kernel or what device code was generated for a given program structure, these flags are a great starting point.
Flag | Files dumped |
---|---|
iree-hal-dump-executable-files-to | All files (meta-flag) |
iree-hal-dump-executable-sources-to | Source .mlir files prior to HAL compilation |
iree-hal-dump-executable-intermediates-to | Intermediate files (e.g. .o files, .mlir stages) |
iree-hal-dump-executable-binaries-to | Binary files (e.g. .so, .spv, .ptx), as used in the .vmfb |
iree-hal-dump-executable-benchmarks-to | Standalone benchmark files for iree-benchmark-module |
$ mkdir -p /tmp/iree/simple_abs/
$ iree-compile simple_abs.mlir \
--iree-hal-target-backends=llvm-cpu \
--iree-llvmcpu-link-embedded=false \
--iree-hal-dump-executable-files-to=/tmp/iree/simple_abs \
-o /tmp/iree/simple_abs/simple_abs_cpu.vmfb
$ ls /tmp/iree/simple_abs
module_abs_dispatch_0.mlir
module_abs_dispatch_0_system_elf_x86_64_benchmark.mlir
module_abs_dispatch_0_system_elf_x86_64.codegen.ll
module_abs_dispatch_0_system_elf_x86_64.codegen.bc
module_abs_dispatch_0_system_elf_x86_64.linked.ll
module_abs_dispatch_0_system_elf_x86_64.linked.bc
module_abs_dispatch_0_system_elf_x86_64.optimized.ll
module_abs_dispatch_0_system_elf_x86_64.optimized.bc
module_abs_dispatch_0_system_elf_x86_64.o
module_abs_dispatch_0_system_elf_x86_64.s
module_abs_dispatch_0_system_elf_x86_64.so
simple_abs_cpu.vmfb
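With the meta-flag, all file kinds land in one directory. The sketch below sorts a listing back into the categories from the table using file suffixes. The suffix mapping is an assumption made here for illustration, not something the compiler defines. In particular, intermediate .mlir stages cannot be told apart from sources by suffix alone and are reported as sources.

```python
from collections import defaultdict

# Assumed suffix mapping, based on the examples in the table above.
INTERMEDIATE_SUFFIXES = (".ll", ".bc", ".o", ".s")
BINARY_SUFFIXES = (".so", ".dll", ".spv", ".ptx")


def classify_dump_file(filename):
    """Guess which dump category a file belongs to from its suffix."""
    if filename.endswith("_benchmark.mlir"):
        return "benchmarks"
    if filename.endswith(INTERMEDIATE_SUFFIXES):
        return "intermediates"
    if filename.endswith(BINARY_SUFFIXES):
        return "binaries"
    if filename.endswith(".mlir"):
        return "sources"  # may also be an intermediate .mlir stage
    return "other"


def group_dump_files(filenames):
    """Group file names by guessed category, sorted within each group."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        groups[classify_dump_file(name)].append(name)
    return dict(groups)
```

Passing os.listdir("/tmp/iree/simple_abs") to group_dump_files gives a quick overview of what a compile produced.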
Tip - Embedded and system linking
The default value of --iree-llvmcpu-link-embedded=true generates embedded ELF files. By disabling that flag, the compiler will produce platform-standard .so files for Linux, .dll files for Windows, etc.
While embedded ELF files can be smaller and more portable, inspection of artifacts is easier with platform-standard shared object files.
Tip - Disassembling .bc files with llvm-dis
You can skip this step if the .ll files are already present in the dump directory.
The .bc intermediate files use the LLVM BitCode format, which can be disassembled using llvm-dis:
# Build `llvm-dis` from source as needed:
$ cmake --build iree-build/ --target llvm-dis
$ iree-build/llvm-project/bin/llvm-dis --help
$ cd /tmp/iree/simple_abs/
$ llvm-dis module_abs_dispatch_0_system_elf_x86_64.codegen.bc
$ cat module_abs_dispatch_0_system_elf_x86_64.codegen.ll
; ModuleID = 'module_abs_dispatch_0_system_elf_x86_64.codegen.bc'
source_filename = "abs_dispatch_0"
target triple = "x86_64-linux-gnu"
%iree_hal_executable_library_header_t = type { i32, ptr, i32, i32 }
%iree_hal_executable_dispatch_attrs_v0_t = type { i16, i16 }
...
define internal i32 @abs_dispatch_0_generic(
ptr noalias nonnull align 16 %0,
ptr noalias nonnull align 16 %1,
ptr noalias nonnull align 16 %2) #0 {
%4 = load %iree_hal_executable_dispatch_state_v0_t, ptr %1, align 8
%5 = extractvalue %iree_hal_executable_dispatch_state_v0_t %4, 10
%6 = load ptr, ptr %5, align 8
%7 = ptrtoint ptr %6 to i64
%8 = and i64 %7, 63
%9 = icmp eq i64 %8, 0
call void @llvm.assume(i1 %9)
%10 = load %iree_hal_executable_dispatch_state_v0_t, ptr %1, align 8
%11 = extractvalue %iree_hal_executable_dispatch_state_v0_t %10, 10
%12 = getelementptr ptr, ptr %11, i32 1
%13 = load ptr, ptr %12, align 8
%14 = ptrtoint ptr %13 to i64
%15 = and i64 %14, 63
%16 = icmp eq i64 %15, 0
call void @llvm.assume(i1 %16)
%17 = load float, ptr %6, align 4
%18 = call float @llvm.fabs.f32(float %17)
store float %18, ptr %13, align 4
ret i32 0
}
...
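To disassemble only the bitcode files that do not yet have a textual counterpart, a small script can work out which llvm-dis invocations are still needed. This sketch relies on the behavior shown above, where llvm-dis writes name.ll next to name.bc. The function name and the llvm_dis parameter are illustrative, and the function only builds command lines without running them.

```python
from pathlib import Path


def pending_llvm_dis_commands(dump_dir, llvm_dis="llvm-dis"):
    """Return one argv list per .bc file in dump_dir that has no sibling .ll file."""
    commands = []
    for bc_path in sorted(Path(dump_dir).glob("*.bc")):
        # foo.codegen.bc -> foo.codegen.ll
        if not bc_path.with_suffix(".ll").exists():
            commands.append([llvm_dis, str(bc_path)])
    return commands
```

Each returned list can be handed to subprocess.run(cmd, check=True). Pass llvm_dis="iree-build/llvm-project/bin/llvm-dis" to use the binary built from source.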
$ mkdir -p /tmp/iree/simple_abs/
$ iree-compile simple_abs.mlir \
--iree-hal-target-backends=vulkan-spirv \
--iree-hal-dump-executable-files-to=/tmp/iree/simple_abs \
-o /tmp/iree/simple_abs/simple_abs_vulkan.vmfb
$ ls /tmp/iree/simple_abs
module_abs_dispatch_0.mlir
module_abs_dispatch_0_vulkan_spirv_fb_benchmark.mlir
module_abs_dispatch_0_vulkan_spirv_fb.mlir
module_abs_dispatch_0_vulkan_spirv_fb.spv
simple_abs_vulkan.vmfb
Tip - Disassembling .spv files with spirv-dis
The .spv files use the SPIR-V binary format, which can be disassembled using spirv-dis from SPIR-V Tools:
$ cd /tmp/iree/simple_abs/
$ spirv-dis module_abs_dispatch_0_vulkan_spirv_fb.spv
; SPIR-V
; Version: 1.0
; Generator: Khronos; 22
; Bound: 20
; Schema: 0
OpCapability Shader
OpExtension "SPV_KHR_storage_buffer_storage_class"
%18 = OpExtInstImport "GLSL.std.450"
OpMemoryModel Logical GLSL450
OpEntryPoint GLCompute %abs_dispatch_0_generic "abs_dispatch_0_generic"
OpExecutionMode %abs_dispatch_0_generic LocalSize 1 1 1
OpName %__resource_var_0_0_ "__resource_var_0_0_"
OpName %__resource_var_0_1_ "__resource_var_0_1_"
OpName %abs_dispatch_0_generic "abs_dispatch_0_generic"
OpDecorate %_arr_float_uint_1 ArrayStride 4
OpMemberDecorate %_struct_2 0 Offset 0
OpDecorate %_struct_2 Block
OpDecorate %__resource_var_0_0_ Binding 0
OpDecorate %__resource_var_0_0_ DescriptorSet 0
OpDecorate %__resource_var_0_1_ Binding 1
OpDecorate %__resource_var_0_1_ DescriptorSet 0
%float = OpTypeFloat 32
%uint = OpTypeInt 32 0
%uint_1 = OpConstant %uint 1
%_arr_float_uint_1 = OpTypeArray %float %uint_1
%_struct_2 = OpTypeStruct %_arr_float_uint_1
%_ptr_StorageBuffer__struct_2 = OpTypePointer StorageBuffer %_struct_2
%__resource_var_0_0_ = OpVariable %_ptr_StorageBuffer__struct_2 StorageBuffer
%__resource_var_0_1_ = OpVariable %_ptr_StorageBuffer__struct_2 StorageBuffer
%void = OpTypeVoid
%9 = OpTypeFunction %void
%uint_0 = OpConstant %uint 0
%_ptr_StorageBuffer_float = OpTypePointer StorageBuffer %float
%abs_dispatch_0_generic = OpFunction %void None %9
%12 = OpLabel
%15 = OpAccessChain %_ptr_StorageBuffer_float %__resource_var_0_0_ %uint_0 %uint_0
%16 = OpLoad %float %15
%17 = OpExtInst %float %18 FAbs %16
%19 = OpAccessChain %_ptr_StorageBuffer_float %__resource_var_0_1_ %uint_0 %uint_0
OpStore %19 %17
OpReturn
OpFunctionEnd
$ mkdir -p /tmp/iree/simple_abs/
$ iree-compile simple_abs.mlir \
--iree-hal-target-backends=cuda \
--iree-hal-dump-executable-files-to=/tmp/iree/simple_abs \
-o /tmp/iree/simple_abs/simple_abs_cuda.vmfb
$ ls /tmp/iree/simple_abs
module_abs_dispatch_0_cuda_nvptx_fb_benchmark.mlir
module_abs_dispatch_0_cuda_nvptx_fb.codegen.ll
module_abs_dispatch_0_cuda_nvptx_fb.codegen.bc
module_abs_dispatch_0_cuda_nvptx_fb.linked.ll
module_abs_dispatch_0_cuda_nvptx_fb.linked.bc
module_abs_dispatch_0_cuda_nvptx_fb.optimized.ll
module_abs_dispatch_0_cuda_nvptx_fb.optimized.bc
module_abs_dispatch_0_cuda_nvptx_fb.ptx
module_abs_dispatch_0.mlir
simple_abs_cuda.vmfb
Tip - Disassembling .bc files with llvm-dis
You can skip this step if the .ll files are already present in the dump directory.
The .bc intermediate files use the LLVM BitCode format, which can be disassembled using llvm-dis:
# Build `llvm-dis` from source as needed:
$ cmake --build iree-build/ --target llvm-dis
$ iree-build/llvm-project/bin/llvm-dis --help
$ cd /tmp/iree/simple_abs/
$ llvm-dis module_abs_dispatch_0_cuda_nvptx_fb.codegen.bc
$ cat module_abs_dispatch_0_cuda_nvptx_fb.codegen.ll
; ModuleID = 'module_abs_dispatch_0_cuda_nvptx_fb.codegen.bc'
source_filename = "abs_dispatch_0"
declare ptr @malloc(i64)
declare void @free(ptr)
declare float @__nv_fabsf(float)
define void @abs_dispatch_0_generic(ptr noalias readonly align 16 %0, ptr noalias align 16 %1) {
%3 = ptrtoint ptr %0 to i64
%4 = and i64 %3, 63
%5 = icmp eq i64 %4, 0
call void @llvm.assume(i1 %5)
%6 = ptrtoint ptr %1 to i64
%7 = and i64 %6, 63
%8 = icmp eq i64 %7, 0
call void @llvm.assume(i1 %8)
%9 = load float, ptr %0, align 4
%10 = call float @__nv_fabsf(float %9)
store float %10, ptr %1, align 4
ret void
}
!nvvm.annotations = !{!0, !1, !2, !3}
!0 = !{ptr @abs_dispatch_0_generic, !"kernel", i32 1}
!1 = !{ptr @abs_dispatch_0_generic, !"maxntidx", i32 1}
!2 = !{ptr @abs_dispatch_0_generic, !"maxntidy", i32 1}
!3 = !{ptr @abs_dispatch_0_generic, !"maxntidz", i32 1}
Module level executable benchmarks
The benchmark files produced by --iree-hal-dump-executable-benchmarks-to can be compiled in isolation and passed to iree-benchmark-module, where they exercise the full IREE runtime for a single executable:
$ iree-compile simple_abs.mlir \
--iree-hal-target-backends=llvm-cpu \
--iree-hal-dump-executable-benchmarks-to=/tmp/iree/simple_abs/ \
-o /dev/null
$ iree-compile \
/tmp/iree/simple_abs/module_abs_dispatch_0_embedded_elf_x86_64_benchmark.mlir \
-o /tmp/iree/simple_abs/module_abs_dispatch_0_benchmark.vmfb
$ iree-benchmark-module \
/tmp/iree/simple_abs/module_abs_dispatch_0_benchmark.vmfb
Low level executable binary benchmarks
The binary files produced by --iree-hal-dump-executable-binaries-to can be passed to iree-benchmark-executable, where they are benchmarked directly, without using the IREE VM, HAL APIs, task system, etc. Note that this interface is much lower level and you must specify all push constants / binding parameters manually:
$ iree-compile simple_abs.mlir \
--iree-hal-target-backends=llvm-cpu \
--iree-hal-dump-executable-binaries-to=/tmp/iree/simple_abs/ \
-o /dev/null
$ iree-benchmark-executable \
--device=local-sync \
--executable_format=embedded-elf-x86_64 \
--executable_file=/tmp/iree/simple_abs/module_abs_dispatch_0_embedded_elf_x86_64.so \
--entry_point=0 \
--binding=f32=-2.5 \
--binding=f32=0 \
--workgroup_count=1,1,1
See the comments in tools/iree-benchmark-executable-main.c and the test file at tools/test/iree-benchmark-executable.mlir for more information and examples.
Compiling phase by phase
IREE compiles programs through a series of broad phases:
Input → ABI → Flow → Stream → HAL → VM (diagram: compilation phases overview)
Tip - available phases
These are the phase names available for use with the --compile-to and --compile-from flags described below:
Phase name | Description |
---|---|
start | Entry point to the compilation pipeline |
input | Performs input processing and lowering into core IREE input dialects (linalg/etc) |
abi | Adjusts the program ABI for the specified execution environment |
preprocessing | Applies customizable preprocessing prior to Flow/Stream/HAL/VM |
global-optimization | Performs global program optimization |
dispatch-creation | Fuses operations and forms dispatch regions |
flow | Models execution data flow and partitioning using the flow dialect |
stream | Models execution partitioning and scheduling using the stream dialect |
executable-sources | Prepares hal dialect executables for translation, prior to codegen |
executable-configurations | Selects translation strategies for code generation |
executable-targets | Runs code generation for hal dialect executables |
hal | Finishes hal dialect processing |
vm | Lowers to IREE's abstract virtual machine using the vm dialect |
end | Completes the full compilation pipeline |
For an accurate list of phases, see the source code or check the help output with a command such as:
iree-compile --help | sed -n '/--compile-to/,/--/p' | head -n -1
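When scripting around these flags, it can help to check phase names before invoking the compiler. The sketch below hard-codes the phase names from the table above and assumes the table order is the pipeline order. As noted, the authoritative list is the compiler's own help output, so treat COMPILE_PHASES as a snapshot that may go stale. check_phase_range is a hypothetical helper.

```python
# Snapshot of the phase names listed above, assumed to be in pipeline order.
COMPILE_PHASES = [
    "start", "input", "abi", "preprocessing", "global-optimization",
    "dispatch-creation", "flow", "stream", "executable-sources",
    "executable-configurations", "executable-targets", "hal", "vm", "end",
]


def check_phase_range(compile_from="start", compile_to="end"):
    """Validate a from/to pair and return the phase names between them, inclusive."""
    for phase in (compile_from, compile_to):
        if phase not in COMPILE_PHASES:
            raise ValueError(f"unknown phase: {phase!r}")
    first = COMPILE_PHASES.index(compile_from)
    last = COMPILE_PHASES.index(compile_to)
    if first > last:
        raise ValueError(f"{compile_from!r} comes after {compile_to!r}")
    return COMPILE_PHASES[first:last + 1]
```

A misspelled or reversed pair then fails fast in the script rather than partway through a build.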
You can output a program snapshot at intermediate phases with the --compile-to=<phase name> flag:
$ cat simple_abs.mlir
func.func @abs(%input : tensor<f32>) -> (tensor<f32>) {
%result = math.absf %input : tensor<f32>
return %result : tensor<f32>
}
$ iree-compile simple_abs.mlir --compile-to=abi
module {
func.func @abs(%arg0: !hal.buffer_view) -> !hal.buffer_view attributes {iree.abi.stub} {
%0 = hal.tensor.import %arg0 "input 0" : !hal.buffer_view -> tensor<f32>
%1 = math.absf %0 : tensor<f32>
%2 = hal.tensor.export %1 "output 0" : tensor<f32> -> !hal.buffer_view
return %2 : !hal.buffer_view
}
}
This is similar to the --mlir-print-ir-after= flag, but at clearly defined pipeline phases.
Compilation can be continued from any intermediate phase. This allows for iterative workflows: compile to a phase, make edits to the .mlir file, then resume compilation and continue through the pipeline:
$ iree-compile simple_abs.mlir --compile-to=abi -o simple_abs_abi.mlir
$ sed \
-e 's/math.absf/math.exp/' \
-e 's/@abs/@exp/' \
simple_abs_abi.mlir > simple_exp_abi.mlir
$ iree-compile simple_exp_abi.mlir \
--iree-hal-target-backends=llvm-cpu \
-o simple_exp_cpu.vmfb
Alternatively, explicitly resume from an intermediate phase with --compile-from=<phase name>:
$ iree-compile simple_exp_abi.mlir \
--iree-hal-target-backends=llvm-cpu \
--compile-from=abi \
-o simple_exp_cpu.vmfb
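The sed step in this workflow can also be done in Python, which may be handier when the edits are driven from the Python bindings. This is a minimal sketch using plain string replacement. rewrite_snapshot is a hypothetical helper. Unlike sed without the g flag, str.replace substitutes every occurrence, which makes no difference for this small example.

```python
from pathlib import Path


def rewrite_snapshot(src_path, dst_path, replacements):
    """Apply (old, new) text replacements to a phase snapshot and write the result."""
    text = Path(src_path).read_text()
    for old, new in replacements:
        text = text.replace(old, new)
    Path(dst_path).write_text(text)
    return text
```

Calling rewrite_snapshot("simple_abs_abi.mlir", "simple_exp_abi.mlir", [("math.absf", "math.exp"), ("@abs", "@exp")]) mirrors the sed command above.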
Dumping compilation phases
The --dump-compilation-phases-to flag can be used to dump program IR after each phase, much like --compile-to but without exiting early:
$ iree-compile simple_abs.mlir \
--iree-hal-target-backends=llvm-cpu \
--dump-compilation-phases-to=/tmp/iree/simple_abs \
-o /tmp/iree/simple_abs/simple_abs_cpu.vmfb
$ ls /tmp/iree/simple_abs -1v
simple_abs.1.input.mlir
simple_abs.2.abi.mlir
simple_abs.3.preprocessing.mlir
simple_abs.4.global-optimization.mlir
simple_abs.5.dispatch-creation.mlir
simple_abs.6.flow.mlir
simple_abs.7.stream.mlir
simple_abs.8.executable-sources.mlir
simple_abs.9.executable-configurations.mlir
simple_abs.10.executable-targets.mlir
simple_abs.11.hal.mlir
simple_abs.12.vm.mlir
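Scripts that consume these dumps need the same numeric ordering that ls -1v provides, since a plain string sort would place phase 10 before phase 2. The sketch below parses the name.index.phase.mlir pattern seen in the listing above. That pattern is inferred from this example output rather than from a documented naming contract.

```python
import re

# e.g. "simple_abs.10.executable-targets.mlir"
_PHASE_FILE = re.compile(r"^(?P<stem>.+)\.(?P<index>\d+)\.(?P<phase>[a-z-]+)\.mlir$")


def sorted_phase_dumps(filenames):
    """Return (index, phase, filename) tuples in pipeline order, skipping other files."""
    parsed = []
    for name in filenames:
        match = _PHASE_FILE.match(name)
        if match:
            parsed.append((int(match["index"]), match["phase"], name))
    return sorted(parsed)
```

Files that do not match the pattern, such as the .vmfb output, are ignored.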
As with --compile-to, these files can be used together with --compile-from:
$ iree-compile simple_abs.2.abi.mlir \
--iree-hal-target-backends=llvm-cpu \
--compile-from=abi \
-o simple_abs_cpu.vmfb
Taken together, these flags can be used, for example, to:
- speed up triage ("at which phase do we go wrong")
- allow for faster development iteration (snapshot all phases at some baseline, modify the compiler source, then resume from just before where those changes impact a pipeline)
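The triage use case can be sketched as a scan over per-phase snapshots for the first one that fails a check. Both the function name and the looks_good callback are hypothetical. What counts as "wrong" is specific to the problem being debugged, for example an op that should have been fused or a shape that should still be static.

```python
def first_failing_phase(snapshots, looks_good):
    """Find where a property of the IR first stops holding.

    snapshots: iterable of (phase_name, ir_text) pairs in pipeline order.
    looks_good: user-supplied callable taking ir_text and returning a bool.
    Returns (last_good_phase, first_bad_phase); either may be None.
    """
    previous = None
    for phase, text in snapshots:
        if not looks_good(text):
            return previous, phase
        previous = phase
    return previous, None
```

The last good phase is a natural value for --compile-from when iterating on a compiler change that affects the first bad one.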