CodegenDialectPCF
-iree-pcf-convert-forall-to-loops
Converts scf.forall ops to pcf.loop
Test pass for converting scf.forall ops without mapping attributes to
pcf.loop ops with sequential scope.
The input is IR containing scf.forall ops with tensor results and
tensor.parallel_insert_slice terminators. Only forall ops without
mapping attributes are converted.
The output replaces each matching scf.forall with a pcf.loop:
- Iteration bounds come from the forall's upper/lower bounds and steps
- tensor.parallel_insert_slice ops become pcf.write_slice ops
- Shared output tensors become tied pcf.sref region arguments
- The scope is set to #pcf.sequential for sequential execution
The underlying conversion pattern is exposed separately via
convertForallToPCF() with callbacks for mapping processor IDs to custom
execution scopes.
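The sequential semantics described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the pass implementation: the helper names `trip_count` and `run_sequential_forall` are hypothetical, tensors are modelled as flat lists, and the ceiling-division trip count is an assumption about how a strided forall range maps to loop iterations.

```python
def trip_count(lower, upper, step):
    """Iterations of the strided range [lower, upper) for a positive step."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Ceiling division, clamped at zero for empty ranges.
    return max(0, -(-(upper - lower) // step))


def run_sequential_forall(lower, upper, step, shared, body):
    """Run `body` once per iteration, in order, on a copy of `shared`.

    `body(iv)` returns (offset, tile). Writing the tile at the offset stands
    in for a tensor.parallel_insert_slice, i.e. a pcf.write_slice after
    conversion (hypothetical 1-D model).
    """
    out = list(shared)
    for i in range(trip_count(lower, upper, step)):
        iv = lower + i * step  # induction variable of the original forall
        offset, tile = body(iv)
        out[offset:offset + len(tile)] = tile
    return out


# Two iterations (iv = 0 and iv = 4), each writing a tile of four elements.
result = run_sequential_forall(0, 8, 4, [0] * 8, lambda iv: (iv, [iv + 1] * 4))
```

Because the scope is sequential, the iterations run one after another and the final contents of the shared output depend only on the slices written.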
-iree-pcf-convert-sref-to-memref
Converts pcf.sref types to concrete memref types
Converts pcf.sref (shaped reference) types to memref types with
appropriate layouts and memory spaces derived from the scope attribute.
After this pass, PCF parallel ops no longer have tied results.
All pcf.sref types are converted to concrete memref types, preparing
the IR for structural lowering of PCF ops to SCF/CF.
The input is expected to be bufferized IR where tied inputs to parallel
ops are already memref types. Layouts are derived from these parallel op
tied inputs and propagated via dataflow analysis.
Op conversions:
- pcf.alloc -> memref.alloc
- pcf.write_slice -> memref.subview + memref.copy or vector.transfer_write
- pcf.read_slice -> memref.subview + vector.transfer_read
- pcf.generic/pcf.loop region arguments become the tied memref inputs
-iree-pcf-fuse-consumers
Fuses all consumers of pcf.generic/loop ops
Test pass for fusing consumer operations into pcf.generic and pcf.loop
ops.
The input is IR containing PCF parallel ops with external consumers that
implement TilingInterface or are tensor.extract_slice ops.
The pass greedily fuses each tilable consumer into the producer PCF op by
cloning the consumer into the PCF region, tiling it to match the iteration
space, and replacing the external use with a new pcf.write_slice.
Supported fusion scenarios:
- Multiple pcf.write_slice producers for a single consumer value
- Fusion along multiple operands with a single pcf.write_slice per operand
- tensor.extract_slice consumers, fused by adding a condition based on the slice bounds
The underlying fusion patterns are exposed via matchTilableConsumer() and
fuseTilableConsumer() for use in custom pipelines.
-iree-pcf-fuse-pcf-writes
Consolidates pcf.write_slice ops in loop bodies
Test pass for composing pcf.write_slice operations with nested
scf.forall producers.
The input is IR containing pcf.write_slice ops where the source value is
produced by an scf.forall with tensor.parallel_insert_slice in its
terminator.
The pass moves each matching pcf.write_slice inside the scf.forall
body, composing the slice parameters:
- New offsets: write_offset + insert_offset * write_stride
- New sizes: from the parallel_insert_slice
- New strides: write_stride * insert_stride
This enables further lowering by ensuring writes happen at the granularity of the inner parallel loop rather than after the entire forall completes.
The underlying pattern is exposed via composeWriteSliceWithParallelInsert()
for use in custom pipelines.
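The slice composition rule can be checked with a small Python model. This is a sketch under stated assumptions, not the code behind composeWriteSliceWithParallelInsert(): the function names are hypothetical, and slices are modelled per dimension as plain integer lists.

```python
def compose_write_with_insert(write_offsets, write_strides,
                              insert_offsets, insert_sizes, insert_strides):
    """Compose an outer write slice with an inner parallel insert slice.

    Per dimension:
      offset = write_offset + insert_offset * write_stride
      size   = insert_size
      stride = write_stride * insert_stride
    """
    offsets = [wo + io * ws
               for wo, io, ws in zip(write_offsets, insert_offsets, write_strides)]
    sizes = list(insert_sizes)
    strides = [ws * ins for ws, ins in zip(write_strides, insert_strides)]
    return offsets, sizes, strides


def touched_indices(offset, size, stride):
    """Indices addressed by a 1-D strided slice."""
    return [offset + k * stride for k in range(size)]


# Inner insert: offset 3, size 4, stride 1 into the forall result.
# Outer write: offset 10, stride 2 into the destination.
offsets, sizes, strides = compose_write_with_insert([10], [2], [3], [4], [1])

# Applying the two slices one after the other hits the same destination
# indices as applying the composed slice once.
two_step = [10 + j * 2 for j in touched_indices(3, 4, 1)]
one_step = touched_indices(offsets[0], sizes[0], strides[0])
```

In this example the composed slice has offset 16, size 4 and stride 2, and both `two_step` and `one_step` are `[16, 18, 20, 22]`.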
-iree-pcf-fuse-producers
Fuses DPS producers into pcf.generic/loop ops through tied init args.
Pass for fusing producer operations into pcf.generic and pcf.loop ops.
The input is IR containing PCF parallel ops whose tied init values are
produced by operations implementing TilingInterface and
DestinationStyleOpInterface (e.g. linalg.fill).
The pass matches each tied init that is the single result of a DPS producer.
For each pcf.read_slice on the corresponding sref argument, it generates a
tiled version of the producer via generateResultTileValue and replaces the
read with the tiled result. The scoped op's init is updated to the producer's
DPS init, and the original producer is erased if unused.
The underlying fusion patterns are exposed via matchTilableProducer() and
fuseTilableProducer() for use in custom pipelines.
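The idea behind replacing a read with a tiled producer can be illustrated with a fill-like producer in Python. This is a hypothetical 1-D model with made-up helper names (`fill`, `read_slice`, `fill_tile`). It does not use the TilingInterface or generateResultTileValue API, and only shows why reading a slice of the full result is equivalent to computing just that tile.

```python
def fill(value, size):
    """Whole-tensor producer: a 1-D tensor of `size` copies of `value`."""
    return [value] * size


def read_slice(tensor, offset, size):
    """Model of reading `size` elements starting at `offset`."""
    return tensor[offset:offset + size]


def fill_tile(value, offset, size):
    """Tiled producer: compute only the requested tile.

    For a fill the offset does not affect the values, but a general
    producer would use it to select the matching part of its inputs.
    """
    return fill(value, size)


# Before fusion: materialize the full tensor, then read a slice of it.
unfused = read_slice(fill(7, 16), 4, 4)

# After fusion: compute the tile directly where the read used to be.
fused = fill_tile(7, 4, 4)
```

Both paths produce the same tile, which is what allows the original producer to be erased once it has no other uses.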
-iree-pcf-lower-structural-pcf
Lowers pcf.generic/loop to scf.execute_region/forall
Lowers structured PCF parallel ops to SCF and CF ops. This is the final step of PCF lowering, converting the abstract parallel constructs to concrete control flow that can be further lowered to target-specific code.
The input is expected to be IR where PCF ops no longer have tied results
(after iree-pcf-convert-sref-to-memref). The scope attribute on each op
determines how worker IDs and counts are materialized.
Op conversions:
- pcf.generic -> scf.execute_region with worker IDs from scope
- pcf.loop -> serialized scf.forall with iteration bounds from count operands
- pcf.return -> scf.yield (in generic) or scf.forall.in_parallel (in loop)
- pcf.branch_cond_return -> cf.cond_br to a return block
If sync_on_return is set on a parallel op, a barrier (determined by the
scope) is inserted after the lowered op. This attribute is introduced when
converting sref to memref to retain the required synchronization semantics
of any tied results.
-iree-pcf-resolve-tokens
Resolves synchronization scopes on pcf.sref types.
Resolves synchronization scopes attached to pcf.sref types by expanding
them to their concrete representations. This pass should run before
iree-pcf-convert-sref-to-memref.
The input is IR containing pcf.sref types with sync scope attributes.
The pass expands pcf.sref<..., sync_scope> types into a pcf.sref
without sync scope plus any concrete types required by the sync scope
attribute. For shaped refs with sync_on_return scope, the parent
pcf.generic or pcf.loop op has its sync_on_return flag set to true,
ensuring a barrier is inserted when the op is lowered.
Write operations (pcf.write_slice) are updated to enqueue writes through
the sync scope's interface methods.