Profiling overview
IREE benchmarking gives us an accurate and reproducible view of program performance at specific levels of granularity. To analyze system behavior in more depth, there are various ways to profile IREE.
Device profiling and replay
Device profiling captures HAL-native .ireeprof
bundles from the devices executing a workload. Use it to inspect queue
operations, dispatch timings, memory events, executable metadata, device
metrics, and backend-specific counters or traces.
Device replay captures HAL work into .ireereplay
reproducers that can be run, benchmarked, profiled, and inspected without
re-running the original application. Replay is especially useful when you need
to capture a workload once and repeatedly benchmark or profile the same device
operation stream.
CPU cache and other CPU event profiling
Some advanced CPU profiling tasks, such as querying CPU cache and other hardware events, require OS-specific profilers. See Profiling CPUs.
Vulkan GPU profiling
Tracy offers great insights into CPU/GPU interactions and Vulkan API usage details. However, it does not provide information at a finer granularity, especially inside a particular shader dispatch. Vendor-specific tools can supplement general-purpose tools like Tracy here. Refer to Profiling GPUs using Vulkan.
Tracy
Tracy is a profiler that's been used for a wide range of profiling tasks on IREE. Refer to Profiling with Tracy.