'pcf' Dialect

A dialect designed to model parallel control flow.

The pcf dialect models parallelized control flow using structured operations akin to dialects like scf. It offers a set of core loop-like constructs alongside the glue necessary to represent splitting and joining parallel work.

In contrast with scf, whose scope is purely focused on representing common control flow, the pcf dialect includes types, interfaces, and operations that represent dataflow across parallel workers. This comprises two key conceptual types:

  1. Scoped memory. This is a reference to memory that carries information about its allocation scope as well as how to synchronize it. This allows for fencing at fine granularities (e.g. allocation).
  2. Tokens. Types capable of managing synchronization of resources between threads. This could be anything ranging from fences + (named) barriers to producer/consumer queues implemented with ringbuffers.

PCF ops and types are designed to be lowered in three phases, starting from structural ops on scoped memory infused with synchronization tokens. Prior to each phase, a different level of scheduling is implied.

  1. Tokens tied to resources are split and lowered to separate ops. Before this, the compiler can perform coarse-grained scheduling around resources according to their tied synchronization.
  2. Generic scoped memory is converted to memref. Since all tokens have been resolved by this point, this is just a matter of propagating layout and memory space.
  3. Wrapping structured ops are lowered to control flow (scf and/or cf).

This diagram illustrates where the dialect fits into executable lowering pipelines for typical GPUs. For CPUs and other accelerators, the same flow is intended to work modulo different levels of physical parallelism instead of thread/subgroup/lane.

        v----------+----------v
        | Executable Input    |
        | (Linalg on tensors) |
        +----------v----------+
                   |
 TileAndDistribute |
 to workgroups     |
                   |
        v----------+----------v
        |    PCF and/or SCF   |
        +----------v----------+
                   |
 SCF(.forall)ToPCF |
                   |
         v---------+---------v
+--------|-------+   +-------|--------+
|   Tile Op1 to  |   |   Tile OpN to  |
|   Subgroups/   |   |   Subgroups/   |
|   Threads/     |...|   Threads/     |
|   Lanes        |   |   Lanes        |
| pcf.concurrent |   | pcf.concurrent |
+--------|-------+   +-------|--------+
         +---------v---------+
                   |
         Vectorize | // WriteOps vectorize
                   |
         Bufferize | // PCF tensor -> pcf.sref
                   | // becomes memref -> sref
                   |
     ResolveTokens |
      SRefToMemRef | // pcf.sref -> memref
          LowerPCF | // pcf -> scf/cf
                   |
       v-----------+-----------v
       | SCF+GPU+vector+memref |
       +-----------------------+

Operations

Alloc ops

pcf.alloc (PCF::AllocOp)

Shaped ref allocation operation

Syntax:

operation ::= `pcf.alloc` `(`$dynamicSizes`)` attr-dict `:` type($result)

Allocates a pcf.sref with the given element type and shape. Dynamic dimensions in the result type must have corresponding dynamic size operands. The allocation scope is determined by the scope attribute of the result type.

Example:

  %sref = pcf.alloc() : !pcf.sref<4x8xf32, #foo.scope>
  %sref_dyn = pcf.alloc(%d0, %d1) : !pcf.sref<?x?xf32, #foo.scope>

Operands:

Operand        Description
dynamicSizes   variadic of index

Results:

Result   Description
result   A shaped reference to a buffer.

Parallel execution ops

pcf.br.cond_return (PCF::BranchCondReturnOp)

Branch operation with conditional return

Syntax:

operation ::= `pcf.br.cond_return` $condition $dest (`(` $dest_operands^ `:` type($dest_operands) `)`)? attr-dict

The pcf.br.cond_return operation represents a conditional branch to a given block, or a return from the parent op.

Example:

pcf.<scoped op> #foo.scope {
  ^bb0(%0: !foo.type):
    %1 = ... %0 : !foo.type
    pcf.br.cond_return %cond ^bb0(%0: !foo.type)
}

Traits: AlwaysSpeculatableImplTrait, HasParent<IREE::PCF::GenericOp>, Terminator

Interfaces: BranchOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands:

Operand         Description
condition       1-bit signless integer
dest_operands   variadic of any type

Successors:

Successor   Description
dest        any successor

pcf.generic (PCF::GenericOp)

Execute a set of workers in parallel on a region.

Syntax:

operation ::= `pcf.generic` (`sync` $sync_on_return^)?
              `scope` `(` $scope `)`
              (`initialize` $initializer^)?
              custom<ParallelExecutionBody>($inits,
              type($inits),
              $dynamic_sizes,
              type($results),
              $is_tied,
              $region,
              $num_leading_args,
              "true")
              custom<InferNumIndexArgs>(ref($region), ref($num_leading_args), $num_index_args)
              prop-dict attr-dict

Executes a region across a set of workers at a specified scope. When control flow reaches this op, nproc workers of the specified scope are spawned and begin executing the region. The scope is given by an attribute implementing the ScopeAttrInterface interface and is responsible for the semantics of all pcf primitives at the same scope. Further details about scopes are included in the docs for the interface.

The optional initialize region is executed once when control flow first reaches the op. Values yielded from the initializer become block arguments available to the execute region. This is useful for setting up per-op state that persists across all worker invocations.

Results are produced by snapshotting the value of each result's tied sref once all workers have returned. Results can either be:

  1. Tied to initial values (tensor or memref): the init value provides the initial contents and the result captures the final state.
  2. Allocated by the op itself: dynamic sizes must be provided for untied results with dynamic dimensions.

Basic example with tied results:

  %0 = ... : tensor<4x8xf32>
  %1 = pcf.generic scope(#foo.scope)
    execute(%ref = %0)[%id: index, %num_workers: index]
         : (!pcf.sref<4x8xf32, #foo.scope>) -> (tensor<4x8xf32>) {
    // Each worker can read/write %ref.
    pcf.return
  }

Example with initializer:

  %result = pcf.generic scope(#foo.scope)
    initialize {
      %scratch = pcf.alloc() : !pcf.sref<16xf32, #foo.scope>
      pcf.yield %scratch : !pcf.sref<16xf32, #foo.scope>
    } -> (%scratch_arg: !pcf.sref<16xf32, #foo.scope>)
    execute(%ref = %init)[%id: index, %num_workers: index]
         : (!pcf.sref<4x8xf32, #foo.scope>) -> (tensor<4x8xf32>) {
    // %scratch_arg is available here, initialized once.
    pcf.return
  }

Example with untied (allocated) results:

  %d0, %d1 = ... : index
  %result = pcf.generic scope(#foo.scope)
    execute[%id: index, %num_workers: index]
         : () -> (tensor<?x?xf32>{%d0, %d1}) {
    // Result sref is allocated by the op, not tied to any init.
    pcf.return
  }

Traits: AttrSizedOperandSegments, AutomaticAllocationScope, RecursiveMemoryEffects

Interfaces: OpAsmOpInterface

Attributes:

Attribute   MLIR Type                                              Description
scope       ::mlir::iree_compiler::IREE::PCF::ScopeAttrInterface   Defines parallel execution scope.

Operands:

Operand         Description
inits           variadic of ranked tensor or memref of any type
dynamic_sizes   variadic of index

Results:

Result    Description
results   variadic of ranked tensor or memref of any type

pcf.loop (PCF::LoopOp)

Execute a set of workers in parallel on a region.

Syntax:

operation ::= `pcf.loop` (`sync` $sync_on_return^)?
              `scope` `(` $scope `)`
              `count` `(` $count `)`
              custom<ParallelExecutionBody>($inits,
              type($inits),
              $dynamic_sizes,
              type($results),
              $is_tied,
              $region)
              prop-dict attr-dict

Executes a region for each point in the iteration space defined by the count operands. Unlike pcf.generic which spawns workers equal to the native parallelism of the scope, pcf.loop explicitly specifies the iteration count and maps iterations to workers according to the scope's scheduling policy.

When control flow reaches this op, the scope determines how to distribute the iterations across available workers. The scope is given by an attribute implementing the ScopeAttrInterface interface. Further details about scopes are included in the docs for the interface.

The execute region receives one index block argument per count operand, representing the current iteration's coordinates in the iteration space.

Results are produced by snapshotting the value of each result's tied sref once all iterations have completed. Results can either be:

  1. Tied to initial values (tensor or memref): the init value provides the initial contents and the result captures the final state.
  2. Allocated by the op itself: dynamic sizes must be provided for untied results with dynamic dimensions.

Basic example with 1D iteration:

  %n = ... : index
  %0 = ... : tensor<4x8xf32>
  %1 = pcf.loop scope(#foo.scope) count(%n)
    execute(%ref = %0)[%id: index]
         : (!pcf.sref<4x8xf32, #foo.scope>) -> (tensor<4x8xf32>) {
    // %id ranges from 0 to %n-1.
    pcf.return
  }

Example with multi-dimensional iteration:

  %m, %n = ... : index
  %result = pcf.loop scope(#foo.scope) count(%m, %n)
    execute(%ref = %init)[%i: index, %j: index]
         : (!pcf.sref<?x?xf32, #foo.scope>) -> (tensor<?x?xf32>) {
    // %i ranges from 0 to %m-1, %j ranges from 0 to %n-1.
    pcf.return
  }

Traits: AttrSizedOperandSegments, AutomaticAllocationScope, RecursiveMemoryEffects, SingleBlockImplicitTerminator<mlir::iree_compiler::IREE::PCF::ReturnOp>, SingleBlock

Interfaces: OpAsmOpInterface, RegionBranchOpInterface

Attributes:

Attribute   MLIR Type                                              Description
scope       ::mlir::iree_compiler::IREE::PCF::ScopeAttrInterface   Defines parallel execution scope.

Operands:

Operand         Description
count           variadic of index
inits           variadic of ranked tensor or memref of any type
dynamic_sizes   variadic of index

Results:

Result    Description
results   variadic of ranked tensor or memref of any type

Read ops

pcf.get_memref (PCF::GetMemrefOp)

Extract a memref view from a slice of a pcf.sref.

Syntax:

operation ::= `pcf.get_memref` $source ``
              custom<DynamicIndexList>($offsets, $static_offsets)
              custom<DynamicIndexList>($sizes, $static_sizes)
              custom<DynamicIndexList>($strides, $static_strides)
              attr-dict `:` type($source) `to` type($result)

The pcf.get_memref operation extracts a memref view from a slice of a sref, breaking the synchronization guarantees of the source.

The returned memref must have a maximally dynamic layout (all strides and offset dynamic) and no memory space. Layout and memory space information is determined by the ConvertSRefToMemRef analysis pass.

The operation supports the following arguments:

  • source: the sref from which to extract a view.
  • offsets: shaped-rank number of offsets into the source from which the slice begins.
  • sizes: shaped-rank number of sizes which specify the sizes of the result memref type.
  • strides: shaped-rank number of strides that specify subsampling in each dimension.
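
Example (an illustrative sketch; %sref and #foo.scope are placeholder names in the style of the examples above, and the result layout is only required to be maximally dynamic):

  %view = pcf.get_memref %sref [0, 0] [4, 8] [1, 1]
       : !pcf.sref<4x8xf32, #foo.scope> to memref<4x8xf32, strided<[?, ?], offset: ?>>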

Traits: AttrSizedOperandSegments

Interfaces: OffsetSizeAndStrideOpInterface

Attributes:

Attribute        MLIR Type                   Description
static_offsets   ::mlir::DenseI64ArrayAttr   i64 dense array attribute
static_sizes     ::mlir::DenseI64ArrayAttr   i64 dense array attribute
static_strides   ::mlir::DenseI64ArrayAttr   i64 dense array attribute

Operands:

Operand   Description
source    A shaped reference to a buffer.
offsets   variadic of index
sizes     variadic of index
strides   variadic of index

Results:

Result   Description
result   memref of any type values

pcf.read_slice (PCF::ReadSliceOp)

Read a tensor or vector from a pcf.sref based on the provided slice parameters.

Syntax:

operation ::= `pcf.read_slice` $source ``
              custom<DynamicIndexList>($offsets, $static_offsets)
              custom<DynamicIndexList>($sizes, $static_sizes)
              custom<DynamicIndexList>($strides, $static_strides)
              attr-dict `:` type($source) `to` type($result)

Read a slice from a pcf.sref. If the result is a vector, the sizes may be smaller than the result vector type; in that case, out-of-bounds elements have undefined values.

The pcf.read_slice operation supports the following arguments:

  • source: the sref from which the slice is read.
  • offsets: shaped-rank number of offsets into the source from which the slice begins.
  • sizes: shaped-rank number of sizes which specify the sizes of the read slice.
  • strides: shaped-rank number of strides that specify subsampling in each dimension.
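
Example (an illustrative sketch; %sref, %j, and #foo.scope are placeholder names in the style of the examples above):

  %tile = pcf.read_slice %sref [0, %j] [4, 4] [1, 1]
       : !pcf.sref<4x8xf32, #foo.scope> to tensor<4x4xf32>

  // Vector result with smaller read sizes: only the first 3 rows are read,
  // so the last row of %vec has undefined values.
  %vec = pcf.read_slice %sref [0, 0] [3, 8] [1, 1]
       : !pcf.sref<4x8xf32, #foo.scope> to vector<4x8xf32>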

Traits: AttrSizedOperandSegments

Interfaces: OffsetSizeAndStrideOpInterface

Attributes:

Attribute        MLIR Type                   Description
static_offsets   ::mlir::DenseI64ArrayAttr   i64 dense array attribute
static_sizes     ::mlir::DenseI64ArrayAttr   i64 dense array attribute
static_strides   ::mlir::DenseI64ArrayAttr   i64 dense array attribute

Operands:

Operand   Description
source    A shaped reference to a buffer.
offsets   variadic of index
sizes     variadic of index
strides   variadic of index

Results:

Result   Description
result   ranked tensor of any type values or vector of any type values

pcf.return (PCF::ReturnOp)

Returns from a thread.

Syntax:

operation ::= `pcf.return` attr-dict

Returns control flow to the parent without fencing memory. If the parent carries an implicit fence, one may still occur after the parent has finished.

Traits: AlwaysSpeculatableImplTrait, HasParent<IREE::PCF::GenericOp, IREE::PCF::LoopOp>, Terminator

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Write ops

pcf.write_slice (PCF::WriteSliceOp)

Submit a write of a tensor, vector, or memref to a slice of a pcf.sref.

Syntax:

operation ::= `pcf.write_slice` $source `into` $dest ``
              custom<DynamicIndexList>($offsets, $static_offsets)
              custom<DynamicIndexList>($sizes, $static_sizes)
              custom<DynamicIndexList>($strides, $static_strides)
              attr-dict `:` type($source) `into` type($dest)

The pcf.write_slice operation supports the following arguments:

  • source: the shaped value that is written.
  • dest: the sref into which the source is written.
  • offsets: shaped-rank number of offsets into the dest into which the slice is inserted.
  • sizes: shaped-rank number of sizes which specify the sizes of the source tensor type.
  • strides: shaped-rank number of strides that specify subsampling in each dimension.
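
Example (an illustrative sketch; %tile, %sref, %i, and #foo.scope are placeholder names in the style of the examples above):

  pcf.write_slice %tile into %sref [%i, 0] [2, 8] [1, 1]
       : tensor<2x8xf32> into !pcf.sref<4x8xf32, #foo.scope>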

Traits: AttrSizedOperandSegments

Interfaces: OffsetSizeAndStrideOpInterface

Attributes:

Attribute        MLIR Type                   Description
static_offsets   ::mlir::DenseI64ArrayAttr   i64 dense array attribute
static_sizes     ::mlir::DenseI64ArrayAttr   i64 dense array attribute
static_strides   ::mlir::DenseI64ArrayAttr   i64 dense array attribute

Operands:

Operand   Description
source    ranked tensor, vector, or memref of any type
dest      A shaped reference to a buffer.
offsets   variadic of index
sizes     variadic of index
strides   variadic of index

pcf.yield (PCF::YieldOp)

Yields results from a region.

Syntax:

operation ::= `pcf.yield` attr-dict
              $operands `:` type($operands)

The yielded values are copied by value.
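
For example, as in the pcf.generic initializer shown earlier, pcf.yield terminates the initialize region and forwards the allocated sref by value (excerpted; #foo.scope is a placeholder scope attribute):

  initialize {
    %scratch = pcf.alloc() : !pcf.sref<16xf32, #foo.scope>
    pcf.yield %scratch : !pcf.sref<16xf32, #foo.scope>
  } -> (%scratch_arg: !pcf.sref<16xf32, #foo.scope>)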

Traits: AlwaysSpeculatableImplTrait, HasParent<IREE::PCF::GenericOp>, ReturnLike, Terminator

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface), RegionBranchTerminatorOpInterface

Effects: MemoryEffects::Effect{}

Operands:

Operand    Description
operands   variadic of any type

Attributes

SequentialAttr

Attribute representing sequential execution

Syntax: #pcf.sequential

Scope that reuses the current process as the sole executor of a parallel region.

SyncOnReturnAttr

Synchronize when returning from the worker of the same scope

Syntax: #pcf.sync_on_return

Attribute indicating that the shaped ref it is tied to is only fenced when the parent op of the same scope returns. This is akin to a memory-order-acquire fence on scope entry and __syncthreads followed by a memory-order-release fence on scope exit.

TestScopeAttr

Scope attribute used for testing.

Syntax: #pcf.test_scope

Scope that fails on all interface uses. For use in testing where the scope is not relevant.

Types

ShapedRefType

A shaped reference to a buffer.

A reference to a buffer with unspecified layout and physical storage. Carries the shape and element type of the referenced region. Elements can be accessed by index, though no assumptions about the physical relation between two coordinates can be made. Elements at different coordinates must not internally alias. For example, if foo is a pcf.sref<2xi32>, foo[0] and foo[1] must be distinct values.

template <size_t rank, typename eltype, typename alloc_scope_ty, typename token_ty>
class ShapedRef {
  // Access is pointwise within the coordinate space implied by the shape.
  // Element type determines the minimum access bitwidth.
  eltype *getElementPtr(int a, ...); // |rank| operands.
  size_t shape[rank];

  // Scope this referenced memory was allocated at. Defines memory space.
  alloc_scope_ty alloc_scope;
  // Class defining synchronization for this reference.
  token_ty sync_scope;
};

When the sync_scope is of type #pcf.sync_on_return, a special printer kicks in, i.e. the following two types are equivalent:

!pcf.sref<?xi32, #pcf.test_scope, #pcf.sync_on_return>
!pcf.sref<?xi32, sync(#pcf.test_scope)>

Parameters:

Parameter     C++ type                    Description
shape         ::llvm::ArrayRef<int64_t>
elementType   Type
scope         PCF::ScopeAttrInterface
sync_scope    Attribute