Compile time regression debugging

So the IREE compiler used to compile a program quickly, but it is now slower. What do you do?

Initial information gathering

Try to answer as many of these questions as you can:

When did compilation get slower?

A specific git commit is ideal, but "sometime in the last week" is a good starting point. You'll ultimately want to find a culprit release or git commit that changed the compiler code.

How much slower did compilation get?

Be specific - did it jump from 1 minute to 2 minutes, or 1 minute to 1 hour? Identifying the scale of the regression can help set the priority to investigate it.
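
To put a number on the regression, one option is to time the same command with a known-fast build and a suspected-slow build. The following is a minimal sketch, not part of IREE: `wall_ms` and `slowdown` are hypothetical helper names, the build paths in the usage comment are placeholders, and `date +%s%N` assumes GNU date.

```shell
#!/bin/bash

# Print the wall-clock time of a command in milliseconds.
# Assumes GNU `date` (for the %N nanoseconds format).
# The command's own output is discarded; its exit status is returned.
wall_ms() {
  local start_ns end_ns status=0
  start_ns="$(date +%s%N)"
  "$@" >/dev/null 2>&1 || status=$?
  end_ns="$(date +%s%N)"
  echo $(( (end_ns - start_ns) / 1000000 ))
  return "${status}"
}

# Print how many times slower the new time is than the old time.
slowdown() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.2fx\n", new / old }'
}

# Example usage (placeholder paths and arguments):
#   OLD_MS="$(wall_ms ../iree-build-old/tools/iree-compile <args>)"
#   NEW_MS="$(wall_ms ../iree-build-new/tools/iree-compile <args>)"
#   slowdown "${OLD_MS}" "${NEW_MS}"
```

Timing each build a few times and reporting the spread helps separate a real regression from machine noise.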

What is the full compile command?

Try to extract the input program and the full list of flags passed to the compiler binary so that others can reproduce what you're seeing. Reduce the reproducer as far as you can to native tools only (no Python or other framework layers).

What environment is the compiler running in?

Are you using a Debug build or a Release build? What operating system and what size of machine is running the compiler (e.g. a Linux developer machine, or a smaller system)?
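
The environment details can be collected in one go for a bug report. This is a rough sketch for a Linux machine: `describe_environment` is a hypothetical helper, `nproc` and `/proc/meminfo` are Linux-specific, and it assumes the compiler binary accepts `--version`.

```shell
#!/bin/bash

# Print a short description of the machine and compiler for a bug report.
# The compiler defaults to `iree-compile` on PATH; pass a path to override.
describe_environment() {
  local compiler="${1:-iree-compile}"
  echo "os: $(uname -srm)"
  echo "cpus: $(nproc)"
  if [[ -r /proc/meminfo ]]; then
    echo "memory: $(awk '/^MemTotal:/ { printf "%.1f GiB", $2 / 1048576 }' /proc/meminfo)"
  fi
  if command -v "${compiler}" >/dev/null 2>&1; then
    echo "compiler version:"
    "${compiler}" --version
  else
    echo "compiler: ${compiler} not found"
  fi
}

# Example usage (placeholder path):
#   describe_environment ../iree-build/tools/iree-compile
```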

Culprit finding and bisecting

If you only have a rough idea of when something changed and want to narrow that down to a specific code change, bisecting can help.

Running git bisect

Building the compiler from source and using git bisect will let you pinpoint specific commits in IREE, though it typically won't let you step through changes in submodules (e.g. MLIR updates in third_party/llvm-project/).

Tip: Configure ccache if you'll be rebuilding the compiler while bisecting.

A manual workflow with git bisect looks like this:

git bisect start --first-parent
git bisect good [<rev>]
git bisect bad [<rev>]

# Read the prompts from the command as it runs
# At each step, test the compiler:
#   git submodule update
#   cmake --build build/ --target iree-compile
#   ./build/tools/iree-compile <args>
#       attach Tracy, observe timing, print IR, etc. to determine if fast or slow
#       if fast, `git bisect good`
#       if slow, `git bisect bad`
#   repeat

An automated workflow can use git bisect run and a script:

# run_bisect.sh
git submodule update
cmake --build build/ --target iree-compile
# Other logic here

# In your shell, start the bisect and hand it the script
# (the script must be executable):
git bisect start --first-parent
git bisect good [<rev>]
git bisect bad [<rev>]
git bisect run ./run_bisect.sh
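
For a compile time regression, the "other logic" usually has to turn "too slow" into an exit status. `git bisect run` treats exit status 0 as good, 125 as skip, and any other status from 1 to 127 as bad, while `timeout` exits with 124 when it kills a command. The sketch below maps one to the other; `bisect_status` and `run_with_deadline` are hypothetical helper names, the 120 second deadline is a placeholder, and treating compiler errors as "skip" is an assumption that fits a pure slowdown hunt.

```shell
#!/bin/bash

# Translate the exit status of `timeout <secs> <compile command>` into the
# exit status that `git bisect run` expects:
#   0   -> 0   (finished in time: good)
#   124 -> 1   (`timeout` killed the command: slow, so bad)
#   *   -> 125 (anything else, e.g. a compiler error: skip this commit)
bisect_status() {
  case "$1" in
    0) echo 0 ;;
    124) echo 1 ;;
    *) echo 125 ;;
  esac
}

# Run a command under a deadline and return the translated status.
run_with_deadline() {
  local seconds="$1" status=0
  shift
  timeout "${seconds}" "$@" || status=$?
  return "$(bisect_status "${status}")"
}

# Example usage at the end of run_bisect.sh (placeholder arguments):
#   run_with_deadline 120 ./build/tools/iree-compile <args> -o /dev/null
#   exit $?
```

Pick a deadline comfortably between the fast and slow timings, so that machine noise does not flip a commit from good to bad.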

Sample: compile executable sources individually with a timeout

#!/bin/bash

set -xeuo pipefail

# --------------------------------------------------------------------------- #
# Settings                                                                    #
# --------------------------------------------------------------------------- #

INPUT_FILE_PATH="/path/to/program.mlirbc"
TMP_DIR="../iree-tmp"

declare -a COMPILER_FLAGS=(
  "--iree-input-type=stablehlo"
  "--iree-hal-target-backends=cuda"
  "--iree-cuda-target=sm_80"
)

TIMEOUT_SECONDS_FOR_COMPILING_EACH_SOURCE=10

# --------------------------------------------------------------------------- #
# Utility functions                                                           #
# --------------------------------------------------------------------------- #

# Call to have `git bisect` skip this commit (don't mark as good _or_ bad)
# https://git-scm.com/docs/git-bisect#_bisect_run
skip_on_error() {
  >&2 echo "** Skipping due to error: $1 **"
  exit 125  # Special exit code for `git bisect skip`
}

# --------------------------------------------------------------------------- #
# Main script                                                                 #
# --------------------------------------------------------------------------- #

# Store git version hash, so we can dump artifacts to unique directories later.
GIT_SHA="$(git rev-parse --short HEAD)"

echo "** Building iree-compile at ${GIT_SHA} **"

# The `git bisect` command only checks out a commit, so update submodules.
git submodule update

# Build the compiler. You'll want ccache configured to make this fast!
cmake --build ../iree-build/ --target iree-compile || skip_on_error "CMake build failed"

# Run the compiler, dumping executable sources and stopping.
SOURCES_DIR="${TMP_DIR}/sources-${GIT_SHA}"
echo "** Running iree-compile at ${GIT_SHA}, dumping sources to ${SOURCES_DIR} **"
../iree-build/tools/iree-compile \
    "${INPUT_FILE_PATH}" \
    "${COMPILER_FLAGS[@]}" \
    "--iree-hal-dump-executable-sources-to=${SOURCES_DIR}" \
    --compile-to=executable-sources \
    -o /dev/null

# Run the compiler again on each executable individually.
echo "** Running iree-compile at ${GIT_SHA} for each executable source **"
# Iterate with a glob rather than parsing `ls`, and quote paths so that
# spaces in directory or file names do not split arguments.
for SOURCE_PATH in "${SOURCES_DIR}"/*; do
  echo "  * Compiling: $(basename "${SOURCE_PATH}") *"
  timeout --verbose "${TIMEOUT_SECONDS_FOR_COMPILING_EACH_SOURCE}" \
    ../iree-build/tools/iree-compile "${SOURCE_PATH}" \
    "${COMPILER_FLAGS[@]}" \
    --compile-mode=hal-executable \
    -o /dev/null
done
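
Once the culprit commit is known, the same dumped sources can show which executable is responsible. The following sketch times one command per file and lists the slowest first. `time_each_file` is a hypothetical helper, it assumes GNU date, and it deliberately ignores compile failures so that one broken source does not hide the timings of the others.

```shell
#!/bin/bash

# Run a command once per file in a directory and print "<milliseconds> <file>"
# lines, slowest first. The file path is appended as the last argument.
# Assumes GNU `date` (for %N) and file names without newlines.
time_each_file() {
  local dir="$1" file start_ns end_ns
  shift
  for file in "${dir}"/*; do
    [[ -f "${file}" ]] || continue
    start_ns="$(date +%s%N)"
    "$@" "${file}" >/dev/null 2>&1 || true  # Failures are ignored on purpose.
    end_ns="$(date +%s%N)"
    echo "$(( (end_ns - start_ns) / 1000000 )) ${file}"
  done | sort -n -r
}

# Example usage (flags taken from the sample script above):
#   time_each_file "../iree-tmp/sources-${GIT_SHA}" \
#     ../iree-build/tools/iree-compile \
#     --iree-hal-target-backends=cuda --iree-cuda-target=sm_80 \
#     --compile-mode=hal-executable -o /dev/null
```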

Profiling and tracing

If you want to understand why the compiler is fast or slow, or if you want to compare performance in detail between two versions, consider these profiling options.

MLIR pass timing

The -mlir-timing flag enables Pass Timing instrumentation. Once the compiler finishes running, it prints a report like this:

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0203 seconds

   ---Wall Time---  --- Name ---
   0.0047 ( 55.9%)  Canonicalizer
   0.0019 ( 22.2%)  VerifierPass
   0.0016 ( 18.5%)  LLVMLoweringPass
   0.0003 (  3.4%)  CSE
   0.0002 (  1.9%)  (A) DominanceInfo
   0.0084 (100.0%)  Total

This is easy data to collect, especially remotely over SSH, but it might not paint a complete picture and requires waiting for compilation to finish.
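
Comparing the reports from a fast and a slow build is easier once each is reduced to its most expensive passes. The sketch below does that for a report saved to a file. `top_passes` is a hypothetical helper, and it assumes the single wall-time column layout shown above; a report with additional time columns would need a different sort key.

```shell
#!/bin/bash

# Print the N most expensive rows of a saved pass timing report, slowest first.
# Assumes rows look like "   0.0047 ( 55.9%)  Canonicalizer".
top_passes() {
  local report="$1" count="${2:-5}"
  grep -E '^[[:space:]]*[0-9]+\.[0-9]+ \([[:space:]]*[0-9.]+%\)' "${report}" \
    | grep -v -E '\)[[:space:]]+Total$' \
    | sort -g -r -k1,1 \
    | head -n "${count}"
}

# Example usage (placeholder arguments; both output streams are captured
# because the report may not be written to stdout):
#   ../iree-build/tools/iree-compile <args> --mlir-timing -o /dev/null > timing.txt 2>&1
#   top_passes timing.txt 10
```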

Using Tracy

See our documentation on profiling with Tracy, in particular the section on tracing iree-compile. For compile time regressions, pay particular attention to the compilation phases (Flow/Stream/HAL), how many times TranslateExecutablesPass runs, and if there are outlier passes that take significantly longer to run than others.

Here are some previous analyses for inspiration:

Example slow trace:

[Figure: slow trace]

Example fast trace:

[Figure: fast trace]

Example sampling statistics showing tens of minutes in LLVM codegen:

[Figure: slow LLVM codegen]

Using perf and pprof

These Linux tools allow for fine-grained profiling. The steps below profile iree-compile and visualize the results as a flame graph.

  1. Compile IREE tools with debug information (line tables at minimum) and frame pointers. You can do that by selecting the RelWithDebInfo build type and adding -fno-omit-frame-pointer to your compiler flags:

    cmake <Your-CMAKE-Flags> \
       -DCMAKE_BUILD_TYPE=RelWithDebInfo \
       -DCMAKE_CXX_FLAGS="-fno-omit-frame-pointer" \
       -DCMAKE_C_FLAGS="-fno-omit-frame-pointer"
    
  2. Set perf event scope/access to the appropriate level with perf_event_paranoid.

    echo 0 | sudo tee /proc/sys/kernel/perf_event_paranoid
    
  3. Run iree-compile under the perf profiler and collect profile data. This requires sudo.

    sudo perf record -F 999 -g -- tools/iree-compile <Your-Compile-Arguments>
    sudo chown "$USER:$USER" perf.data
    
  4. Use pprof to process perf.data from the previous step and start a local HTTP server with the visualized profile. See the pprof README for installation instructions, and make sure to build perf_data_converter and add it to your PATH.

    pprof -http ':' perf.data
    

Stepping through compiler IR

Debugging an MLIR-based compiler like IREE usually involves reading IR at some point. For compile time regressions, it helps to snapshot the IR at a few key phases and look for differences between fast compilation and slow compilation.

Here is one useful flag combination:

--mlir-disable-threading \
--mlir-elide-elementsattrs-if-larger=8 \
--mlir-print-ir-after=iree-hal-materialize-interfaces
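
To compare a fast and a slow compiler, the same flags can be applied to both builds and the printed IR saved for diffing. This is a minimal sketch: `snapshot_ir` is a hypothetical helper, the build directories in the usage comment are placeholders, and it relies on MLIR printing the IR to stderr.

```shell
#!/bin/bash

# Run a compiler command with the IR-printing flags above and save the
# printed IR (MLIR writes it to stderr) to a file. Normal stdout output is
# discarded, and a failing compile still leaves whatever IR was printed.
snapshot_ir() {
  local out_file="$1"
  shift
  "$@" \
    --mlir-disable-threading \
    --mlir-elide-elementsattrs-if-larger=8 \
    --mlir-print-ir-after=iree-hal-materialize-interfaces \
    2> "${out_file}" >/dev/null || true
}

# Example usage (placeholder build directories and arguments):
#   snapshot_ir fast.mlir ../iree-build-fast/tools/iree-compile <args> -o /dev/null
#   snapshot_ir slow.mlir ../iree-build-slow/tools/iree-compile <args> -o /dev/null
#   wc -l fast.mlir slow.mlir   # a large size difference is a first hint
#   diff -u fast.mlir slow.mlir | less
```

Because compiler diagnostics also go to stderr, the saved file can contain warnings alongside the IR.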