
Model development debugging

Bringing up new models or diagnosing regressions in existing models written using one of IREE's supported ML frameworks or downstream projects like sharktank can involve debugging up and down the tech stack. Here are some tips to make that process easier.

Helpful build settings

Use a debug build

Build with -DCMAKE_BUILD_TYPE=Debug or -DCMAKE_BUILD_TYPE=RelWithDebInfo to include debug information in binaries you build.

Enable assertions

Build with -DIREE_ENABLE_ASSERTIONS=ON to ensure that asserts in compiler and runtime code are included in your program binaries. If a build without assertions compiles a program that would otherwise have tripped an assert, the output should not be trusted. The compiler must not crash on valid input programs, so assert failures should be fixed, not worked around.

Note: release builds and some CI jobs may not have asserts enabled!

Run using sanitizers (ASan/TSan/UBSan)

Building and running using sanitizers can catch memory usage issues (ASan), thread synchronization issues (TSan), and undefined behavior (UBSan).

Helpful compiler and runtime flags

VM execution tracing

The --trace_execution flag to runtime tools like iree-run-module will print each VM instruction as it is executed. This can help with associating other logs and system behavior with the compiled VM program.

Tensor tracing

  • The --iree-flow-trace-dispatch-tensors flag to iree-compile inserts trace markers for all dispatch operation tensor inputs and outputs. This lets you see tensor contents change as the program runs.
  • The --iree-flow-break-dispatch flag to iree-compile inserts a break after a specified dispatch. The program then terminates early, which shortens the logs when you are debugging around one specific dispatch.

Executable substitution

Executable sources can be dumped, edited, and then loaded back into a program using --iree-hal-dump-executable-sources-to and --iree-hal-substitute-executable-source. This can be used for performance tuning or for debugging (e.g. by replacing a complicated dispatch with a simpler one).

See for examples.

Alternate perspectives

Try using other data types

Nearly all targets support the i32 and f32 data types well, while higher and lower bit depth types and more esoteric types like bf16 and complex may be supported partially or not at all on some targets.

If a program fails to compile or produces incorrect outputs, consider checking if the program works after converting to other data types.


These compiler options automatically convert between several types on import:

  • --iree-input-demote-i64-to-i32
  • --iree-input-demote-f32-to-f16
  • --iree-input-demote-f64-to-f32
  • --iree-input-promote-f16-to-f32
  • --iree-input-promote-bf16-to-f32

If you pass inputs with iree-run-module --input=@path/to/input_values.npy, consider also trying raw .bin binary files in place of .npy NumPy files. IREE and NumPy support different sets of types, and signedness information is lost at that level.
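As a minimal sketch of the .bin approach, the snippet below writes four signed 32-bit integers as raw little-endian bytes using only the Python standard library, then prints a matching input flag. The file name and the helper functions are hypothetical, and the `--input=<shape>x<type>=@<file>` spelling is an assumption about how iree-run-module accepts raw binary files; check `iree-run-module --help` for your version.

```python
import struct


def write_i32_bin(path, values):
    """Write values as raw little-endian signed 32-bit integers (no header)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}i", *values))


def input_flag(path, shape, element_type):
    """Build an input flag string (assumed syntax: --input=<shape>x<type>=@<file>)."""
    parts = [str(d) for d in shape] + [element_type]
    return f"--input={'x'.join(parts)}=@{path}"


values = [1, -2, 3, -4]
write_i32_bin("input_values.bin", values)  # hypothetical file name
flag = input_flag("input_values.bin", [4], "i32")
print(flag)
```

Because a raw file carries no shape or type metadata, the shape and element type have to be stated on the command line, which is also where you can pick a type that NumPy cannot express.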

Try using other targets / devices

Large parts of IREE's compilation pipelines and runtime libraries are shared between compiler target backends and runtime HAL devices/drivers. If a program works in one configuration but fails in another, that indicates an issue or missing functionality in the failing configuration.

Some configurations also offer unique debugging functionality:

| Compiler target | Runtime device | Notable properties for debugging |
| --- | --- | --- |
| vmvx | local-sync | Easy to step into generated code, limited type support |
| llvm-cpu | local-sync | Single-threaded, broad type support |
| llvm-cpu | local-task | Multi-threaded, broad type support |
| vulkan-spirv | vulkan | Compatible with RenderDoc (docs here) |
| cuda | cuda | Compatible with NVIDIA Nsight Graphics |
| rocm | hip | Compatible with Omniperf |
| metal-spirv | metal | Compatible with the Metal Debugger |


See the deployment configurations pages for more information about each backend and device.
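To compare configurations side by side, it can help to generate the compile and run commands for each target/device pair from one place. The sketch below only builds and prints the commands; it does not run them. The file names, the `commands_for` helper, and the function name `main` are hypothetical. The flag spellings (`--iree-hal-target-backends`, `--device`, `--module`, `--function`, `--input`) are assumptions that may differ between IREE versions, so confirm them against each tool's `--help` output.

```python
import shlex

# A few (compiler target, runtime device) pairs to sweep over.
CONFIGS = [
    ("vmvx", "local-sync"),
    ("llvm-cpu", "local-sync"),
    ("llvm-cpu", "local-task"),
]


def commands_for(mlir_path, function, inputs, target, device):
    """Return (compile_cmd, run_cmd) argument lists for one configuration."""
    module = f"model_{target}.vmfb"  # hypothetical output name
    compile_cmd = [
        "iree-compile",
        mlir_path,
        f"--iree-hal-target-backends={target}",  # assumed flag spelling
        "-o",
        module,
    ]
    run_cmd = [
        "iree-run-module",
        f"--module={module}",
        f"--device={device}",
        f"--function={function}",
    ] + [f"--input={value}" for value in inputs]
    return compile_cmd, run_cmd


for target, device in CONFIGS:
    compile_cmd, run_cmd = commands_for(
        "model.mlir", "main", ["4xf32=1 2 3 4"], target, device
    )
    print(f"# {target} / {device}")
    print(shlex.join(compile_cmd))
    print(shlex.join(run_cmd))
```

Keeping the inputs identical across configurations makes it easier to diff the outputs and narrow a failure down to one backend or device.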

Run natively and via Python bindings

Some problems manifest only when running through the Python (or some other language/framework) bindings. The Python bindings have some non-trivial interop and memory management across the C/C++/Python boundary.

Try extracting standalone .mlir files, compiling through iree-compile, then running through iree-run-module. Extracting these artifacts can also help other developers follow your reproduction steps.

Reducing complexity

Top-down reduction

Starting from a full program, try to reduce the program size and complexity while keeping the issue you are debugging present. You can do this manually or automate it with the iree-reduce tool. For manual reduction, here are some general strategies:

  • Reduce tensor sizes (e.g. image dimensions, context lengths) in your ML framework
  • Cut out duplicate layers (e.g. attention blocks in LLMs)
  • If your program has multiple functions, test each in isolation

Bottom-up reduction

Consider writing unit tests for individual ops or combinations of ops to see if crashes, bugs, numerical issues, etc. can be reproduced at that scale.
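As an illustration of a single-op test, the sketch below writes a tiny MLIR function that applies `math.absf` to a 4-element tensor, computes the expected result in plain Python, and prints an iree-run-module command that passes both the input and the expected output. The file names and helper are hypothetical, the commands are printed and not executed, and the `--expected_output` flag and the `4xf32=v0 v1 ...` buffer syntax are assumptions to verify against your IREE version.

```python
import shlex

# Minimal single-op program; the function name `abs` is illustrative.
MLIR_SOURCE = """\
func.func @abs(%input: tensor<4xf32>) -> tensor<4xf32> {
  %result = math.absf %input : tensor<4xf32>
  return %result : tensor<4xf32>
}
"""


def buffer_literal(values, element_type="f32"):
    """Format a 1-D buffer as '<N>x<type>=v0 v1 ...' (assumed CLI syntax)."""
    body = " ".join(str(v) for v in values)
    return f"{len(values)}x{element_type}={body}"


inputs = [-1.5, 2.0, -3.0, 4.0]
expected = [abs(v) for v in inputs]  # reference result computed in Python

with open("abs_test.mlir", "w") as f:  # hypothetical file name
    f.write(MLIR_SOURCE)

run_cmd = [
    "iree-run-module",
    "--module=abs_test.vmfb",  # hypothetical: produced by compiling abs_test.mlir
    "--function=abs",
    f"--input={buffer_literal(inputs)}",
    f"--expected_output={buffer_literal(expected)}",  # assumed flag
]
print(shlex.join(run_cmd))
```

A test at this scale compiles quickly, so it is cheap to rerun while varying the element type, shape, or target to see where a failure first appears.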

Some existing test suites can be found at these locations: