# CPU deployment
IREE supports efficient program execution on CPU devices by using LLVM to compile all dense computations in each program into highly optimized CPU native instruction streams, which are embedded in one of IREE's deployable formats.
To compile a program for CPU execution, pick one of IREE's supported executable formats:
| Executable Format | Description |
| --- | --- |
| embedded ELF | portable, high performance dynamic library |
| system library | platform-specific dynamic library (`.so`, `.dll`, etc.) |
| VMVX | reference target |
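The format is chosen at compile time. A minimal sketch, assuming the `--iree-llvmcpu-link-embedded` flag switches between embedded ELF and system library output (verify flag names against `iree-compile --help` for your build; file names are placeholders):

```shell
# Embedded ELF is the default output for the llvm-cpu backend:
iree-compile --iree-hal-target-backends=llvm-cpu input.mlir -o program.vmfb

# System library output instead (flag name assumed; check --help):
iree-compile --iree-hal-target-backends=llvm-cpu \
    --iree-llvmcpu-link-embedded=false input.mlir -o program.vmfb

# VMVX reference target:
iree-compile --iree-hal-target-backends=vmvx input.mlir -o program_vmvx.vmfb
```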
At runtime, CPU executables can be loaded using one of IREE's CPU HAL drivers:
`local-task`
: asynchronous, multithreaded driver built on IREE's "task" system

`local-sync`
: synchronous, single-threaded driver that executes work inline
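The driver is selected at runtime with the `--device=` flag on tools like `iree-run-module` (the module and function names below are placeholders):

```shell
# Multithreaded execution via the task system:
iree-run-module --device=local-task --module=program.vmfb --function=main

# Inline synchronous execution, handy for debugging or single-core targets:
iree-run-module --device=local-sync --module=program.vmfb --function=main
```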
Todo
Add IREE's CPU support matrix: what architectures are supported; what architectures are well optimized; etc.
## Prerequisites
### Get the IREE compiler
#### Download the compiler from a release
Python packages are regularly published to PyPI. See the Python Bindings page for more details.

The core `iree-base-compiler` package includes the LLVM-based CPU compiler:
Stable release packages are published to PyPI:

```shell
python -m pip install iree-base-compiler
```

Nightly pre-releases are published on GitHub releases:

```shell
python -m pip install \
  --find-links https://iree.dev/pip-release-links.html \
  --upgrade --pre iree-base-compiler
```
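After installing either package, a quick sanity check that the compiler is reachable (assuming the install directory is on your `PATH`; see the tip below):

```shell
iree-compile --version
```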
Tip

`iree-compile` and other tools are installed to your Python module installation path. If you `pip install` in user mode, that is under `${HOME}/.local/bin`, or `%APPDATA%\Python` on Windows. You may want to include the path in your system's `PATH` environment variable:

```shell
export PATH=${HOME}/.local/bin:${PATH}
```
#### Build the compiler from source
Please make sure you have followed the Getting started page to build IREE for your host platform and the Android cross-compilation or iOS cross-compilation page if you are cross-compiling for a mobile device. The `llvm-cpu` compiler backend is compiled in by default on all platforms.
Ensure that the `IREE_TARGET_BACKEND_LLVM_CPU` CMake option is `ON` when configuring for the host.
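For example, a minimal host configuration sketch, assuming the source/build directory layout from the Getting started page (adjust paths to your checkout):

```shell
cmake -G Ninja -B ../iree-build/ -DIREE_TARGET_BACKEND_LLVM_CPU=ON .
```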
Tip

`iree-compile` will be built under the `iree-build/tools/` directory. You may want to include this path in your system's `PATH` environment variable.
### Get the IREE runtime
You will need to get an IREE runtime that supports the local CPU HAL driver, along with the appropriate executable loaders for your application.
You can check for CPU support by looking for the `local-sync` and `local-task` drivers:
```console
$ iree-run-module --list_drivers

        cuda: NVIDIA CUDA HAL driver (via dylib)
         hip: HIP HAL driver (via dylib)
  local-sync: Local execution using a lightweight inline synchronous queue
  local-task: Local execution using the IREE multithreading task system
      vulkan: Vulkan 1.x (dynamic)
```
#### Download the runtime from a release
Python packages are regularly published to PyPI. See the Python Bindings page for more details.

The core `iree-base-runtime` package includes the local CPU HAL drivers:
Stable release packages are published to PyPI:

```shell
python -m pip install iree-base-runtime
```

Nightly pre-releases are published on GitHub releases:

```shell
python -m pip install \
  --find-links https://iree.dev/pip-release-links.html \
  --upgrade --pre iree-base-runtime
```
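After installing, runtime tools such as `iree-run-module` should be available, so the `--list_drivers` check shown above doubles as an install sanity check:

```shell
iree-run-module --list_drivers
```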
#### Build the runtime from source
Please make sure you have followed the Getting started page to build IREE for your host platform and the Android cross-compilation page if you are cross-compiling for Android. The local CPU HAL drivers are compiled in by default on all platforms.
Ensure that the `IREE_HAL_DRIVER_LOCAL_TASK` and `IREE_HAL_EXECUTABLE_LOADER_EMBEDDED_ELF` (or other executable loader) CMake options are `ON` when configuring for the target.
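For example, a configuration sketch assuming the directory layout from the Getting started page (these options are typically `ON` by default already):

```shell
cmake -G Ninja -B ../iree-build/ \
    -DIREE_HAL_DRIVER_LOCAL_TASK=ON \
    -DIREE_HAL_EXECUTABLE_LOADER_EMBEDDED_ELF=ON \
    .
```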
## Compile and run a program
With the requirements out of the way, we can now compile a model and run it.
### Compile a program
The IREE compiler transforms a model into its final deployable format through a series of steps. A model authored in Python with an ML framework should first be converted into a format the IREE compiler expects (i.e., MLIR) using that framework's import tool.
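For instance, a hedged sketch of importing a TensorFlow SavedModel with the `iree-import-tf` tool; the flag names are assumptions based on IREE's TensorFlow importer packages, and the paths and exported function name are placeholders, so consult the importer's `--help`:

```shell
iree-import-tf \
    --tf-import-type=savedmodel_v1 \
    --tf-savedmodel-exported-names=predict \
    /path/to/mobilenet_savedmodel -o mobilenet_iree_input.mlir
```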
Using MobileNet v2 as an example, you can download the SavedModel with trained weights from TensorFlow Hub and convert it using IREE's TensorFlow importer. Then run the following command to compile with the `llvm-cpu` target:
```shell
iree-compile \
    --iree-hal-target-backends=llvm-cpu \
    mobilenet_iree_input.mlir -o mobilenet_cpu.vmfb
```
Tip - CPU targets

The `--iree-llvmcpu-target-triple` flag tells the compiler to generate code for a specific type of CPU. You can see the list of supported targets with `iree-compile --iree-llvmcpu-list-targets`, or pass "host" to let LLVM infer the triple from your host machine (e.g. `x86_64-linux-gnu`).
```console
$ iree-compile --iree-llvmcpu-list-targets

  Registered Targets:
    aarch64    - AArch64 (little endian)
    aarch64_32 - AArch64 (little endian ILP32)
    aarch64_be - AArch64 (big endian)
    arm        - ARM
    arm64      - ARM64 (little endian)
    arm64_32   - ARM64 (little endian ILP32)
    armeb      - ARM (big endian)
    riscv32    - 32-bit RISC-V
    riscv64    - 64-bit RISC-V
    wasm32     - WebAssembly 32-bit
    wasm64     - WebAssembly 64-bit
    x86        - 32-bit X86: Pentium-Pro and above
    x86-64     - 64-bit X86: EM64T and AMD64
```
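For example, to cross-compile the same model for a 64-bit Arm Linux device (the output file name is arbitrary):

```shell
iree-compile \
    --iree-hal-target-backends=llvm-cpu \
    --iree-llvmcpu-target-triple=aarch64-linux-gnu \
    mobilenet_iree_input.mlir -o mobilenet_cpu_arm64.vmfb
```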
Tip - CPU features

The `--iree-llvmcpu-target-cpu-features` flag tells the compiler to generate code using certain CPU "features", like SIMD instruction sets. Like the target triple, you can pass "host" to this flag to let LLVM infer the features supported by your host machine.
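For example, combining both flags to target the machine you are compiling on (note that the resulting module may not run on CPUs lacking those features):

```shell
iree-compile \
    --iree-hal-target-backends=llvm-cpu \
    --iree-llvmcpu-target-triple=host \
    --iree-llvmcpu-target-cpu-features=host \
    mobilenet_iree_input.mlir -o mobilenet_cpu.vmfb
```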
### Run a compiled program
In the build directory, run the following command:
```shell
tools/iree-run-module \
    --device=local-task \
    --module=mobilenet_cpu.vmfb \
    --function=predict \
    --input="1x224x224x3xf32=0"
```
The above assumes the exported function in the model is named `predict` and that it expects one 224x224 RGB image. We feed an image with all 0 values here for brevity; see `iree-run-module --help` for the format used to specify concrete values.
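To measure performance rather than just correctness, the companion `iree-benchmark-module` tool accepts the same core flags (timings will vary by machine):

```shell
tools/iree-benchmark-module \
    --device=local-task \
    --module=mobilenet_cpu.vmfb \
    --function=predict \
    --input="1x224x224x3xf32=0"
```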