CodegenDialectPCF

-iree-pcf-convert-forall-to-loops

Converts scf.forall ops to pcf.loop

Test pass for converting scf.forall ops without mapping attributes to pcf.loop ops with sequential scope.

The input is IR containing scf.forall ops with tensor results and tensor.parallel_insert_slice terminators. Only forall ops without mapping attributes are converted.

The output replaces each matching scf.forall with a pcf.loop:

- Iteration bounds come from the forall's upper/lower bounds and steps
- tensor.parallel_insert_slice ops become pcf.write_slice ops
- Shared output tensors become tied pcf.sref region arguments
- The scope is set to #pcf.sequential for sequential execution
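As a rough illustration of how lower bounds, upper bounds, and steps can determine an iteration space, the sketch below normalizes one `(lb, ub, step)` triple to a trip count and recovers the original induction value from a zero-based index. This is a generic loop-normalization sketch on static integers. Whether the pass derives its bounds in exactly this way is an assumption, and the helper names `trip_count` and `original_index` are hypothetical.

```python
def trip_count(lb: int, ub: int, step: int) -> int:
    """Number of iterations of a loop running from lb (inclusive) to ub (exclusive)."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Ceiling division of (ub - lb) by step, clamped at zero for empty ranges.
    return max(0, -(-(ub - lb) // step))


def original_index(lb: int, step: int, i: int) -> int:
    """Map a zero-based iteration number back to the original induction value."""
    return lb + i * step


# One dimension of a hypothetical forall: lb=2, ub=11, step=3.
count = trip_count(2, 11, 3)
indices = [original_index(2, 3, i) for i in range(count)]
```

Here `indices` covers the same values as `range(2, 11, 3)`, so a loop over `count` normalized iterations visits the same points as the original bounds.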

The underlying conversion pattern is exposed separately via convertForallToPCF() with callbacks for mapping processor IDs to custom execution scopes.

-iree-pcf-fuse-consumers

Fuses all consumers of pcf.generic/loop ops

Test pass for fusing consumer operations into pcf.generic and pcf.loop ops.

The input is IR containing PCF parallel ops with external consumers that implement TilingInterface or are tensor.extract_slice ops.

The pass greedily fuses each tilable consumer into the producer PCF op by cloning the consumer into the PCF region, tiling it to match the iteration space, and replacing the external use with a new pcf.write_slice.

Supported fusion scenarios:

- Multiple pcf.write_slice producers for a single consumer value
- Fusion along multiple operands with a single pcf.write_slice per operand
- tensor.extract_slice consumers, fused by adding a condition based on the slice bounds

The underlying fusion patterns are exposed via matchTilableConsumer() and fuseTilableConsumer() for use in custom pipelines.

-iree-pcf-fuse-pcf-writes

Consolidates pcf.write_slice ops in loop bodies

Test pass for composing pcf.write_slice operations with nested scf.forall producers.

The input is IR containing pcf.write_slice ops where the source value is produced by an scf.forall with tensor.parallel_insert_slice in its terminator.

The pass moves each matching pcf.write_slice inside the scf.forall body, composing the slice parameters:

- New offsets: write_offset + insert_offset * write_stride
- New sizes: from the parallel_insert_slice
- New strides: write_stride * insert_stride
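The composition rule above can be checked with plain integer arithmetic. The following is a minimal sketch that assumes static, per-dimension integer offsets, sizes, and strides, whereas the actual pass operates on IR values. The function names are hypothetical and are not part of the PCF API.

```python
def compose_write_with_insert(write_offsets, write_strides,
                              insert_offsets, insert_sizes, insert_strides):
    """Compose an outer write slice with an inner parallel insert, per dimension."""
    offsets = [wo + io * ws
               for wo, io, ws in zip(write_offsets, insert_offsets, write_strides)]
    sizes = list(insert_sizes)  # sizes come from the inner insert unchanged
    strides = [ws * ins for ws, ins in zip(write_strides, insert_strides)]
    return offsets, sizes, strides


def slice_indices(offset, size, stride):
    """Positions touched by a 1-D strided slice."""
    return [offset + i * stride for i in range(size)]


# 1-D example: the outer write uses offset 4 and stride 2; the inner insert
# places a 5-element tile at offset 3 with stride 3 in the forall result.
offsets, sizes, strides = compose_write_with_insert([4], [2], [3], [5], [3])

# Reference: apply the insert first, then map each result position through
# the outer write (destination = write_offset + position * write_stride).
two_step = [4 + 2 * j for j in slice_indices(3, 5, 3)]
composed = slice_indices(offsets[0], sizes[0], strides[0])
```

In this example `composed` and `two_step` list the same destination positions, which is why the write can be moved into the forall body without changing which elements are written.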

This enables further lowering by ensuring writes happen at the granularity of the inner parallel loop rather than after the entire forall completes.

The underlying pattern is exposed via composeWriteSliceWithParallelInsert() for use in custom pipelines.

-iree-pcf-lower-structural-pcf

Lowers pcf.generic/loop to scf.execute_region/forall

Lowers structured PCF parallel ops to SCF and CF ops. This is the final step of PCF lowering, converting the abstract parallel constructs to concrete control flow that can be further lowered to target-specific code.

The input is expected to be IR where PCF ops no longer have tied results (after iree-pcf-convert-sref-to-memref). The scope attribute on each op determines how worker IDs and counts are materialized.

Op conversions:

- pcf.generic -> scf.execute_region with worker IDs from scope
- pcf.loop -> serialized scf.forall with iteration bounds from count operands
- pcf.return -> scf.yield (in generic) or scf.forall.in_parallel (in loop)
- pcf.branch_cond_return -> cf.cond_br to a return block

If sync_on_return is set on a parallel op, a barrier (determined by the scope) is inserted after the lowered op. This attribute is introduced when converting sref to memref, so that the synchronization semantics required by any tied results are retained.