Profiling with Tracy
Overview
Tracy is a hybrid instrumentation and sampling profiler that IREE uses for performance analysis.
Instrumentation and sampling
- Instrumentation is generic code built into the program being profiled, recording zone start and end timestamps where a developer requests them.
  Most of IREE's runtime code is instrumented using the macros defined in iree/base/tracing.h:
  void iree_sample_function() {
    IREE_TRACE_ZONE_BEGIN(z0);
    // All code here will be included in the zone for `iree_sample_function`.
    IREE_TRACE_ZONE_END(z0);
  }
- Sampling collects program state and information about the machine using platform-specific APIs at a regular sampling frequency. Sampled data includes callstacks, hardware counters, and more.
  While recording instrumentation data requires no special setup, recording sampling data requires some configuration that depends on your operating system. Refer to the "Automated data collection" section in the Tracy PDF manual for full details. Generally, sampling needs:
  - Debug information from -DCMAKE_BUILD_TYPE=RelWithDebInfo or Debug
  - Privilege elevation from sudo on Unix or administrator on Windows
Remote or embedded telemetry
Tracy uses a client-server model with communication over a TCP socket:
- The "client" is the program being profiled.
- The "server" is either the Tracy profiler UI or the Tracy command-line capture tool.
graph LR
tracyclient["Tracy Client
e.g. iree-run-module"]
tracyserver["Tracy Server"]
network(["Network"])
thread1["Thread 1"] --> tracyclient
thread2["Thread 2"] --> tracyclient
thread3["Thread 3"] --> tracyclient
tracyclient --> network
network --> tracyserver
tracyserver --> display["Display"]
tracyserver --> storage["Storage"]
This allows for remote capture, such as over SSH, as well as sharing of saved traces across machines.
The Tracy manual
The primary source of Tracy documentation, including how to build the profiler UI and CLI capture tool, is a PDF manual (tracy.pdf), which can be downloaded or viewed in a browser.
Capturing a trace
You will need three things to capture a trace:
- The Tracy profiler UI (or CLI capture tool)
- A binary tool to trace, such as iree-run-module, built with tracing support enabled
- A program to profile, e.g. a .vmfb file with parameters and input values
The Tracy tools can either be downloaded from the official releases or they can be built from source by using either the upstream CMake build or IREE's downstream CMake build.
Quickstart
- Build iree-run-module (or other tools like iree-benchmark-module) with tracing support:
  # Sampling needs debug info from the `RelWithDebInfo` or `Debug` build type.
  cmake -G Ninja -B ../iree-build/ -S . \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DIREE_ENABLE_RUNTIME_TRACING=ON
  cmake --build ../iree-build/ --target iree-run-module
For more information about building from source, follow the Getting started page.
Tip - Instrumented Python packages
The iree-base-runtime Python package includes prebuilt instrumented tools. Set the IREE_PY_RUNTIME=tracy environment variable to use them:
python -m pip install iree-base-runtime
IREE_PY_RUNTIME=tracy iree-run-module ...
You should see the following message printed to stderr:
-- Using Tracy runtime (IREE_PY_RUNTIME=tracy)
See this section in the Python bindings documentation for more details.
- Compile a program to profile:
  # The --iree-hal-executable-debug-level=3 flag embeds source information
  # about each executable into the .vmfb file for the runtime to pass to
  # Tracy. Without this flag, source locations are included on a best-effort
  # basis, typically coming from the input .mlir or .py file.
  iree-compile program_input.mlir \
    --iree-hal-target-backends={target} \
    --iree-hal-executable-debug-level=3 \
    -o program.vmfb
- Run the program using the instrumented iree-run-module:
  # Set the TRACY_NO_EXIT environment variable to keep short-running programs
  # from exiting before connecting.
  #
  # Some platforms need elevated permissions (root / sudo / administrator)
  # to collect sampling data using kernel facilities. If you only want to
  # collect instrumentation data or your platform does not require it, you
  # can run with more limited permissions.
  TRACY_NO_EXIT=1 sudo iree-run-module \
    --module=program.vmfb \
    --device={device} \
    --entry_function={entry} \
    --parameters={parameters} \
    --input={arg0} \
    --input={arg1} \
    ...
- While the program is running, connect using the Tracy profiler UI or capture tool:
  The profiler UI lists available clients or can be set to connect to the next instrumented process.
  The capture tool can be used programmatically and over SSH:
  $ capture -o /tmp/capture.tracy
  Connecting to 127.0.0.1:8086...
- View the captured trace once it finishes collecting events. Traces captured by the profiler UI can also be saved to .tracy files for sharing and archival.
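Because the capture tool runs from the command line, the run-and-connect steps above can be scripted. The following is a minimal sketch, not an official IREE helper. The function name profile_once is hypothetical. It assumes the capture tool is on PATH under the name capture, as in the example above (your build may name the binary differently), and that the capture tool returns once the profiled program disconnects.

```shell
#!/bin/sh
# Hypothetical helper: run an instrumented command and record its trace.
# Usage: profile_once <output.tracy> <command> [args...]
profile_once() {
  trace_file="$1"
  shift

  # TRACY_NO_EXIT=1 keeps a short-running client alive until a profiler
  # connects, so the client can safely be started before the capture tool.
  TRACY_NO_EXIT=1 "$@" &
  client_pid=$!

  # Assumption: `capture` is the Tracy capture tool and it returns once the
  # client has disconnected.
  capture -o "$trace_file"

  # Propagate the exit status of the profiled command.
  wait "$client_pid"
}
```

For example: profile_once /tmp/capture.tracy iree-run-module --module=program.vmfb --device={device} ...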
Including more information in traces
Changing IREE_TRACING_MODE
Set IREE's IREE_TRACING_MODE value (defined in iree/base/tracing.h) to adjust which tracing features are enabled. Each feature adds tracing overhead and increases the size of trace files, so adjust this setting with care.
For example, to track memory allocations with callstacks:
cmake -G Ninja -B ../iree-build/ -S . \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DIREE_ENABLE_RUNTIME_TRACING=ON \
-DIREE_TRACING_MODE=4
cmake --build ../iree-build/ --target iree-run-module
The Memory window in the Tracy profiler should then show callstacks for each allocation:
Options for the llvm-cpu backend
When using the llvm-cpu backend (--iree-hal-target-backends=llvm-cpu with --device=local-task or --device=local-sync), these options are available:
- The --iree-llvmcpu-link-embedded=false flag uses the "system" linker (.so/.dylib/.dll) instead of the generic "embedded" ELF linker, allowing Tracy to look more deeply at generated code.
- The IREE_PRESERVE_DYLIB_TEMP_FILES environment variable can be used on POSIX platforms to ensure that Tracy can view IREE's generated native code.
- Ensure that --iree-llvmcpu-debug-symbols=true is set (it is by default).
Putting those flags and environment variables together in an example:
iree-compile program_input.mlir \
--iree-hal-target-backends=llvm-cpu \
--iree-hal-executable-debug-level=3 \
--iree-llvmcpu-link-embedded=false \
--iree-llvmcpu-debug-symbols=true \
-o program_full_info.vmfb
TRACY_NO_EXIT=1 IREE_PRESERVE_DYLIB_TEMP_FILES=1 sudo iree-run-module \
--device=local-task \
--module=program_full_info.vmfb \
...
Remote capture (e.g. SSH, Android)
Tracy's client/server connection uses TCP port 8086 by default. If the Tracy-instrumented program is running on a separate machine, this port needs to be forwarded.
In particular, when profiling on Android, this is needed:
adb forward tcp:8086 tcp:8086
You can also pass -p <port> to the capture tool to override the default port to connect to, or use the Tracy GUI, which scans other ports too.
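For a program running on a remote machine that is reachable over SSH, one common way to forward the port is SSH local port forwarding. This is a sketch rather than an IREE-specific recipe: forward_tracy_port, FORWARD_PORT, and the host user@remote-machine are hypothetical names, and it assumes the instrumented program listens on the default port on the remote machine.

```shell
#!/bin/sh
# Default Tracy port from the text above; override via the environment.
FORWARD_PORT="${FORWARD_PORT:-8086}"

# Hypothetical helper: forward the local port to the same port on a remote
# machine. -N opens the tunnel without running a remote command, and -L maps
# local_port:destination_host:destination_port.
forward_tracy_port() {
  ssh -N -L "${FORWARD_PORT}:localhost:${FORWARD_PORT}" "$1"
}

# Example with a hypothetical host, left commented out so that sourcing this
# file does not open a connection:
#   forward_tracy_port user@remote-machine &
#   capture -o /tmp/capture.tracy
```

While the tunnel is open, the profiler UI or capture tool on the local machine connects to 127.0.0.1 as if the program were running locally.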
Touring the Tracy profiler UI
The initial view should look like this:
Before going further, take a second to check that your recorded profile has all the data that it should have. Permissions issues could cause it to lack "sampling" or "CPU data" information. For example, here is what the initial view looks like when the profiled program was not run as root on desktop Linux (where running as root is required):
Notice that the latter screenshot lacks the following elements:
- The 'CPU data' header in the top left, with the list of all CPU cores.
- The 'ghost' icon next to the 'Main thread' header.
Click the 'Statistics' button at the top. It will open a window like this:
See how the above screenshot has two radio buttons at the top: 'Instrumentation' and 'Sampling'. At this point, if you don't see the 'Sampling' radio button, you need to resolve that first, as discussed above about possible permissions issues.
These 'Instrumentation' and 'Sampling' statistics correspond to the two kinds of data that Tracy collects about your program. In the Tracy main view, they correspond, respectively, to 'instrumentation' and 'ghost' zones. Refer to the Tracy PDF manual for a general introduction to these concepts. For each thread, the ghost icon toggles the view between these two kinds of zones.
Back to the main view, look for the part of the timeline that is of interest to you. Your area of interest might not be on the Main thread. In fact, it might be on a thread that's not visible in the initial view at all. To pan around with the mouse, hold the right mouse button down (or its keyboard equivalent on macOS). Alternatively, look for the 'Frame' control at the top of the Tracy window. Use the 'next frame' arrow button until more interesting threads appear.
IREE module code tends to run on a thread whose name contains the word worker.
Once you have identified the thread of interest, you typically want to click its ghost icon to view its "ghost" (i.e. sampling) zones. Here is what you should get when clicking on a ghost zone:
The percentages column to the left of the disassembly shows where time is being spent. This is unique to the sampling data (ghost zones) and has no equivalent in the instrumentation data (instrumentation zones). Here is what we get clicking on the corresponding instrumentation zone:
This still has a 'Source' button, but it only shows the last C++ caller that had explicit Tracy information, so here we see a file under iree/hal, whereas the ghost zone saw into the IREE compiled module that this caller calls into, with the source view pointing to the .mlir file.
Tracing iree-compile
Tracing iree-compile is much like tracing the runtime tools, except that both -DIREE_ENABLE_RUNTIME_TRACING=ON and -DIREE_ENABLE_COMPILER_TRACING=ON need to be set with CMake:
cmake -G Ninja -B ../iree-build/ -S . \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DIREE_ENABLE_RUNTIME_TRACING=ON \
-DIREE_ENABLE_COMPILER_TRACING=ON
cmake --build ../iree-build/ --target iree-compile
The steps for collecting traces are the same: run the instrumented program and connect using the Tracy profiler UI or capture tool.
- MLIR passes are instrumented using Pass Instrumentation (see TracingUtils.h)
- Zones are annotated with op breadcrumbs indicating which root op was processed
- Each compilation phase (e.g. Flow, Stream, HAL) is tagged as a "frame", so you can jump between them, limit statistics to them, and see how much time each took
Caution - Tracy sampling with iree-compile
When tracing the compiler, the LLVM/MLIR code can easily generate millions of trace events. Traces captured with sampling can thus take hours to collect, require 40GB+ of RAM to view, and take 1GB+ on disk to store.
However, sampling is especially useful in diagnosing long compile times, since only the MLIR passes are instrumented, unlike in IREE's runtime where most functions are covered.
For more tips on profiling the compiler, see the Compile time regression debugging page.
Troubleshooting
"RESOURCE_EXHAUSTED; failed to open file" issue
This is a known issue with how Tracy operates. One way to work around it is to manually increase the number of files that can be kept open simultaneously and run the command with that setting:
sudo sh -c "ulimit -n <bigNum> && <myTracyInstrumentedProgram>"
Info
Tracy keeps a number of file descriptors open that, depending on the machine and its settings, may exceed the limit allowed by the system, resulting in IREE failing to open more files. In particular, it is common to have a relatively low limit when running with sudo.
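Before picking a value for <bigNum>, it can help to see which limits currently apply. A small sketch using the standard ulimit shell builtin (the function name show_fd_limits is made up for this example):

```shell
#!/bin/sh
# Print the soft and hard limits on open file descriptors for this shell.
# The soft limit is the one that is enforced; it can be raised up to the
# hard limit without extra privileges.
show_fd_limits() {
  echo "soft=$(ulimit -S -n) hard=$(ulimit -H -n)"
}

show_fd_limits
```

Limits are per process and inherited by child processes, so the values seen in a regular shell can differ from those in the elevated shell that launches the instrumented program. Running the same two ulimit queries inside sudo sh -c "..." shows what applies there.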
Appendix
Building Tracy from source
First, refer to the upstream build instructions at either the https://github.com/wolfpld/tracy/ repository itself or the Tracy PDF manual.
For example, to build the profiler GUI from an IREE checkout:
# Build using CMake:
cd third_party/tracy
cmake -B profiler/build -S profiler -DCMAKE_BUILD_TYPE=Release
cmake --build profiler/build --parallel --config Release
# Now launch the profiler:
./profiler/build/tracy-profiler
Additional Capstone dependencies for CPU code disassembly
You can skip this section if you don't need disassembly of CPU code.
Capstone is the disassembly framework used by Tracy. The default branch, which is what OS packages still distribute, is running a few years behind current CPU architectures.
Newer CPU architectures such as RISC-V, or newer extensions of existing architectures (e.g. new SIMD instructions in the ARM architecture) are typically only supported in the next branch. If you need that support, check out and build that branch. Consider uninstalling any OS package for capstone or otherwise ensure that your IREE build will pick up your next branch build.
Linux
If you haven't opted to build capstone-next (see above section), install the OS package for capstone now (Debian-based distributions):
sudo apt install libcapstone-dev
Install other dependencies:
sudo apt install libtbb-dev libzstd-dev libglfw3-dev libfreetype6-dev libgtk-3-dev
If you only build the command-line tool iree-tracy-capture and not the graphical iree-tracy-profiler, you can install only:
sudo apt install libtbb-dev libzstd-dev
The zstd version on Ubuntu 18.04 is old. You will need to build and install it from source (https://github.com/facebook/zstd.git).
Mac
If you haven't opted to build capstone-next (see above section), install the system capstone now:
brew install capstone
Install other dependencies:
brew install pkg-config glfw freetype tbb zstd
Android system settings required for Sampling and SysTrace
When profiling on an Android device, in order to get the most useful information in the trace, tweak system permissions as follows before profiling. This needs to be done again after every reboot of the Android device.
From your desktop, get a shell on the Android device:
adb shell
The following commands are meant to be run from that Android device shell. First, get root access:
su
Now run the following commands as root on the Android device:
setenforce 0
mount -o remount,hidepid=0 /proc
echo 0 > /proc/sys/kernel/perf_event_paranoid
echo 0 > /proc/sys/kernel/kptr_restrict
Note: in order for this to work, the device needs to be rooted, which means that the above su command must succeed. This is sometimes confused with the adb root command, but that's not the same. adb root restarts the adbd daemon as root, which causes device shells to be root shells by default. This is unnecessary here and we don't recommend it: real Android applications never run as root, so Tracy/Android has to support running benchmarks as a regular user, and it's best to stick to this for the sake of realistic benchmarks.
Internally, Tracy executes su commands to perform certain actions, so it too relies on the device being rooted without relying on the benchmark process being run as root.
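Because these settings are lost on reboot, a quick read-back from the desktop can confirm they are still in effect before profiling. This is a sketch under assumptions: check_android_tracy_settings is a hypothetical name, it assumes a single device is attached to adb, and it assumes these values are readable from a non-root device shell, which may not hold on every device.

```shell
#!/bin/sh
# Hypothetical helper: print the current values of the settings changed above.
# After running the setup commands, the values to look for are:
#   selinux=Permissive
#   perf_event_paranoid=0
#   kptr_restrict=0
check_android_tracy_settings() {
  echo "selinux=$(adb shell getenforce)"
  echo "perf_event_paranoid=$(adb shell cat /proc/sys/kernel/perf_event_paranoid)"
  echo "kptr_restrict=$(adb shell cat /proc/sys/kernel/kptr_restrict)"
}
```

If any value differs, repeat the root commands above before capturing a trace.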