Tuning
This page documents support for IREE dispatch tuning. The compiler supports both default and user-provided tuning specs (specifications) that override the compiler heuristics guiding dispatch code generation. In our experience, tuning specs can provide a meaningful speedup in model execution; for example, we achieved a ~10% improvement on the Stable Diffusion XL (SDXL) model on the MI300X GPU.
Tuning specs
The default specs are shipped with the IREE compiler and are target-specific. We aim to provide default tuning specs that cover the most in-demand hardware and the dispatches from the most popular ML models, although we do not guarantee completeness.
User-provided tuning specs are a mechanism that allows users to get the best performance on custom models and hardware targets without having to modify the compiler source code or needlessly special-case compiler heuristics.
The dispatch tuner that generates tuning specs is still experimental and is hosted in an external repository. This document describes how to work with tuning specs generated by the SHARK Tuner or produced manually, but it does not go into detail on how to generate these specs.
Flags
The use of tuning specs in iree-compile is controlled with the following flags:

- --iree-codegen-enable-default-tuning-specs -- enables or disables the default tuning specs shipped with the compiler.
- --iree-codegen-tuning-spec-path -- loads a user-specified tuning spec.
- --iree-codegen-dump-tuning-specs-to -- dumps the final tuning specs to a directory or to standard output.
Note that both default and user-provided specs can be enabled at the same time. The compiler will link them together and invoke the user-provided spec before attempting the default one.
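As a sketch, a compilation command that enables both the default specs and a user-provided spec might look like the invocation below. The input file, output file, spec path, and target backend flag are placeholders; substitute the values that match your model and GPU:

iree-compile model_input.mlir -o model.vmfb \
  --iree-hal-target-backends=rocm \
  --iree-codegen-enable-default-tuning-specs=true \
  --iree-codegen-tuning-spec-path=my_tuning_spec.mlir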
Anatomy of a tuning spec
Example
module @my_spec attributes { transform.with_named_sequence } {
transform.named_sequence @apply_op_config(%op: !transform.any_op {transform.readonly},
%config: !transform.any_param {transform.readonly}) {
transform.annotate %op "compilation_info" = %config : !transform.any_op, !transform.any_param
transform.yield
}
transform.named_sequence
@match_mmt_f16_f16_f32(%root: !transform.any_op {transform.readonly}) -> !transform.any_op {
transform.match.operation_name %root ["linalg.generic"] : !transform.any_op
%ins, %outs = transform.iree.match.cast_compatible_dag_from_root %root {
^bb0(%lhs: tensor<?x?xf16>, %rhs: tensor<?x?xf16>, %out: tensor<?x?xf32>):
%7 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>,
affine_map<(d0, d1, d2) -> (d1, d2)>,
affine_map<(d0, d1, d2) -> (d0, d1)>],
iterator_types = ["parallel", "parallel", "reduction"]}
ins(%lhs, %rhs : tensor<?x?xf16>, tensor<?x?xf16>) outs(%out : tensor<?x?xf32>) {
^bb0(%in: f16, %in_0: f16, %acc: f32):
%8 = arith.extf %in : f16 to f32
%9 = arith.extf %in_0 : f16 to f32
%10 = arith.mulf %8, %9 : f32
%11 = arith.addf %acc, %10 : f32
linalg.yield %11 : f32
} -> tensor<?x?xf32>
} : (!transform.any_op) -> (!transform.any_value, !transform.any_value)
transform.yield %root : !transform.any_op
}
transform.named_sequence
@match_mmt_2048x1280x5120_f16_f16_f32(%matmul: !transform.any_op {transform.readonly})
-> (!transform.any_op, !transform.any_param) {
%mmt = transform.include @match_mmt_f16_f16_f32 failures(propagate) (%matmul)
: (!transform.any_op) -> !transform.any_op
%lhs = transform.get_operand %matmul[0] : (!transform.any_op) -> !transform.any_value
%rhs = transform.get_operand %matmul[1] : (!transform.any_op) -> !transform.any_value
transform.iree.match.cast_compatible_type %lhs = tensor<2048x5120xf16> : !transform.any_value
transform.iree.match.cast_compatible_type %rhs = tensor<1280x5120xf16> : !transform.any_value
%config = transform.param.constant #iree_codegen.compilation_info<
lowering_config = #iree_gpu.lowering_config<{promote_operands = [0, 1],
mma_kind = #iree_gpu.mma_layout<MFMA_F32_16x16x16_F16>,
subgroup_m_count = 2, subgroup_n_count = 2,
reduction = [0, 0, 64],
workgroup = [64, 128, 0]}>,
translation_info = #iree_codegen.translation_info<pipeline = LLVMGPUVectorDistribute
workgroup_size = [256, 1, 1] subgroup_size = 64,
{gpu_pipeline_options = #iree_gpu.pipeline_options<prefetch_shared_memory = true>}>
> -> !transform.any_param
transform.yield %matmul, %config : !transform.any_op, !transform.any_param
}
transform.named_sequence
@__kernel_config(%variant_op: !transform.any_op {transform.consumed}) -> !transform.any_op
attributes { iree_codegen.tuning_spec_entrypoint } {
%res = transform.foreach_match in %variant_op
@match_mmt_2048x1280x5120_f16_f16_f32 -> @apply_op_config
: (!transform.any_op) -> !transform.any_op
transform.yield %res : !transform.any_op
}
}
Explanation
Tuning specs are transform dialect libraries that conform to the following format:
- All tuning spec entry points (named sequence ops) are marked with the iree_codegen.tuning_spec_entrypoint attribute. They have a single argument of type !transform.any_op and return a single value of type !transform.any_op.
- All entry points in the final tuning specs must either read (transform.readonly) or consume (transform.consumed) the argument.
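For reference, a minimal spec that satisfies both requirements without matching anything is sketched below; the module and sequence names (@minimal_spec, @__kernel_config) are illustrative, and the entry point simply yields its argument unchanged:

module @minimal_spec attributes { transform.with_named_sequence } {
  transform.named_sequence
    @__kernel_config(%variant_op: !transform.any_op {transform.readonly}) -> !transform.any_op
      attributes { iree_codegen.tuning_spec_entrypoint } {
    // No matchers are invoked; the variant op is returned as-is.
    transform.yield %variant_op : !transform.any_op
  }
}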
The tuning spec above attempts to match linalg.generic ops that correspond to the matmul operation with the RHS operand transposed (a.k.a. mmt) of shape 2048x1280x5120, with f16 operand element types and an f32 result element type.
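Concretely, a dispatch that this spec matches contains a linalg.generic equivalent to the generic op in @match_mmt_f16_f16_f32, but with static operand shapes 2048x5120 (LHS), 1280x5120 (RHS), and 2048x1280 (result). The value names below are illustrative:

%result = linalg.generic {
    indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>,
                     affine_map<(d0, d1, d2) -> (d1, d2)>,
                     affine_map<(d0, d1, d2) -> (d0, d1)>],
    iterator_types = ["parallel", "parallel", "reduction"]}
    ins(%lhs, %rhs : tensor<2048x5120xf16>, tensor<1280x5120xf16>)
    outs(%acc : tensor<2048x1280xf32>) {
  ^bb0(%in: f16, %in_0: f16, %out: f32):
    %0 = arith.extf %in : f16 to f32
    %1 = arith.extf %in_0 : f16 to f32
    %2 = arith.mulf %0, %1 : f32
    %3 = arith.addf %out, %2 : f32
    linalg.yield %3 : f32
  } -> tensor<2048x1280xf32>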
If the match succeeds, the tuning spec applies the compilation_info attribute that will drive the code generation. This attribute is considered a compiler implementation detail; in general, each codegen pipeline has its own requirements as to what constitutes a valid compilation info and how to interpret it.
Tuning specs are executed by the Materialize User Configs pass.
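To inspect the final spec that this pass will run (the result of linking the user-provided and default specs), the dump flag from the Flags section can be added to the compile command; the file names and output directory below are placeholders:

iree-compile model_input.mlir -o model.vmfb \
  --iree-hal-target-backends=rocm \
  --iree-codegen-tuning-spec-path=my_tuning_spec.mlir \
  --iree-codegen-dump-tuning-specs-to=/tmp/tuning_specs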