CodegenDialectPCF
-iree-pcf-convert-forall-to-loops
Converts scf.forall ops to pcf.loop
Test pass for converting scf.forall ops without mapping attributes to
pcf.loop ops with sequential scope.
The input is IR containing scf.forall ops with tensor results and
tensor.parallel_insert_slice terminators. Only forall ops without
mapping attributes are converted.
The output replaces each matching scf.forall with a pcf.loop:
- Iteration bounds come from the forall's upper/lower bounds and steps
- tensor.parallel_insert_slice ops become pcf.write_slice ops
- Shared output tensors become tied pcf.sref region arguments
- The scope is set to #pcf.sequential for sequential execution
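As a rough illustration of the first bullet, the sketch below shows one common way to turn lower bound, upper bound, and step triples into per-dimension iteration counts, and to recover the original induction values. This is an assumption, not the pass's confirmed behavior: the text above does not say how pcf.loop encodes its bounds, and the helper names `forall_to_counts` and `original_index` are hypothetical.

```python
def forall_to_counts(lower_bounds, upper_bounds, steps):
    """Return the number of iterations per dimension: ceil((ub - lb) / step).

    Assumption: the loop is expressed with iteration counts rather than
    (lb, ub, step) triples. Names here are illustrative only.
    """
    counts = []
    for lb, ub, step in zip(lower_bounds, upper_bounds, steps):
        if step <= 0:
            raise ValueError("step must be positive")
        # Ceiling division without floats; clamp empty ranges to zero.
        counts.append(max(0, -(-(ub - lb) // step)))
    return counts


def original_index(lb, step, i):
    """Map a normalized iteration number i back to the forall induction value."""
    return lb + i * step


counts = forall_to_counts([0, 2], [10, 11], [1, 4])
# Rebuild the induction values of the second dimension from its count.
dim1_indices = [original_index(2, 4, i) for i in range(counts[1])]
```

The recovered indices match what `range(lb, ub, step)` would produce for the same dimension, which is the property any such normalization needs to preserve.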
The underlying conversion pattern is exposed separately via
convertForallToPCF() with callbacks for mapping processor IDs to custom
execution scopes.
-iree-pcf-fuse-consumers
Fuses all consumers of pcf.generic/loop ops
Test pass for fusing consumer operations into pcf.generic and pcf.loop
ops.
The input is IR containing PCF parallel ops with external consumers that
implement TilingInterface or are tensor.extract_slice ops.
The pass greedily fuses each tilable consumer into the producer PCF op by
cloning the consumer into the PCF region, tiling it to match the iteration
space, and replacing the external use with a new pcf.write_slice.
Supported fusion scenarios:
- Multiple pcf.write_slice producers for a single consumer value
- Fusion along multiple operands with a single pcf.write_slice per operand
- tensor.extract_slice consumers, fused by adding a condition based on the slice bounds
The underlying fusion patterns are exposed via matchTilableConsumer() and
fuseTilableConsumer() for use in custom pipelines.
-iree-pcf-fuse-pcf-writes
Consolidates pcf.write_slice ops in loop bodies
Test pass for composing pcf.write_slice operations with nested
scf.forall producers.
The input is IR containing pcf.write_slice ops where the source value is
produced by an scf.forall with tensor.parallel_insert_slice in its
terminator.
The pass moves each matching pcf.write_slice inside the scf.forall
body, composing the slice parameters:
- New offsets: write_offset + insert_offset * write_stride
- New sizes: from the parallel_insert_slice
- New strides: write_stride * insert_stride
This enables further lowering by ensuring writes happen at the granularity of the inner parallel loop rather than after the entire forall completes.
The underlying pattern is exposed via composeWriteSliceWithParallelInsert()
for use in custom pipelines.
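The slice composition rule listed above can be checked with a small stand-alone sketch. It models only the index arithmetic, per dimension, on plain integers. The function names `compose_slices` and `slice_indices` are hypothetical and are not part of the PCF API.

```python
def compose_slices(write_offsets, write_strides,
                   insert_offsets, insert_sizes, insert_strides):
    """Fold an inner parallel_insert_slice into an outer write_slice.

    Applies, per dimension:
      new_offset = write_offset + insert_offset * write_stride
      new_size   = insert_size
      new_stride = write_stride * insert_stride
    """
    offsets = [wo + io * ws
               for wo, io, ws in zip(write_offsets, insert_offsets, write_strides)]
    sizes = list(insert_sizes)
    strides = [ws * ins for ws, ins in zip(write_strides, insert_strides)]
    return offsets, sizes, strides


def slice_indices(offset, size, stride):
    """Positions touched by a 1-D strided slice."""
    return [offset + k * stride for k in range(size)]


# 1-D example: the inner insert writes a 5-element tile at offset 4, stride 3
# into the forall result; the outer write places that result at offset 3,
# stride 2 in the destination.
write_offset, write_stride = 3, 2
insert_offset, insert_size, insert_stride = 4, 5, 3

# Two-step addressing: first the insert, then the write.
two_step = [write_offset + p * write_stride
            for p in slice_indices(insert_offset, insert_size, insert_stride)]

# One-step addressing with the composed parameters.
offsets, sizes, strides = compose_slices(
    [write_offset], [write_stride],
    [insert_offset], [insert_size], [insert_stride])
one_step = slice_indices(offsets[0], sizes[0], strides[0])
```

Both addressing schemes touch the same destination positions, which is why the write can be moved inside the forall body without changing where the data lands.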
-iree-pcf-lower-structural-pcf
Lowers pcf.generic/loop to scf.execute_region/forall
Lowers structured PCF parallel ops to SCF and CF ops. This is the final step of PCF lowering, converting the abstract parallel constructs to concrete control flow that can be further lowered to target-specific code.
The input is expected to be IR where PCF ops no longer have tied results
(after iree-pcf-convert-sref-to-memref). The scope attribute on each op
determines how worker IDs and counts are materialized.
Op conversions:
- pcf.generic -> scf.execute_region with worker IDs from scope
- pcf.loop -> serialized scf.forall with iteration bounds from count operands
- pcf.return -> scf.yield (in generic) or scf.forall.in_parallel (in loop)
- pcf.branch_cond_return -> cf.cond_br to a return block
If sync_on_return is set on a parallel op, a barrier (determined by the
scope) is inserted after the lowered op. This attribute is introduced when
converting sref to memref to retain the required synchronization semantics
of any tied results.