RFC: Support StatefulOps in TOSA

Introduction

This document proposes adding a collection of utility operations to express statefulness in the TOSA specification. The goal of these operators is to express statefulness, scoping, and mutability while aligning with existing frontend and backend infrastructure that supports stateful models.

Use cases: stateful RNN/LSTM networks, e.g. tf.keras.layers.LSTM (TensorFlow v2.13.0).

Memory Model

  • A tensor is an abstract, immutable value comprising metadata (its shape and element data type) and its data.
  • Execution of the neural network consumes tensors and produces new tensors as the outputs of individual operations.
  • A symbolic reference (or symref) to a tensor is a reference to the object holding that tensor. It interacts with tensor values through read/write semantics: a read returns the tensor value most recently written.
  • Each tensor is allocated discretely in non-aliasing memory, and thus a tensor can have only a single symref to it (see the record sketch after this list).
  • The proposal assumes that two memory regions are implemented for read/write content:
    • A procedure-scoped, stack-like memory
    • A process-scoped, heap-like memory
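
To make the memory model concrete, here is a minimal C++ sketch of the record that the later pseudo-code operates on: one entry per variable tensor, carrying the uid, shape, element type, data, and the "seen"/"is_written" flags introduced in later sections. The struct name variable_tensor_t and the use of std::vector are assumptions of this sketch, not part of the proposal.

#include <cstdint>
#include <vector>

// One record per variable tensor. It is assumed to live in the process-scoped,
// heap-like region so that its contents persist across graph invocations.
struct variable_tensor_t {
    int32_t uid = 0;             // globally unique identifier
    std::vector<int32_t> shape;  // tensor shape
    int32_t type = 0;            // element type tag (see the var_t enumeration later)
    std::vector<uint8_t> data;   // raw element storage
    bool seen = false;           // declared in the current graph execution?
    bool is_written = false;     // has a value been written yet?
};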

Execution Model

In order to fully express how variable operators work, we need to support the following:

Operator Graphs

Since stateful operators are truly global across multiple graph invocations, we need to declare a global data structure, “variable_tensors”, to store variable tensors. The “seen” flag will be explained in later sections.

tosa_execute_graph(tosa_graph_t graph, tosa_list_t input_list, tosa_list_t output_list, tosa_level_t level)
{
    ERROR_IF(tensor_list_shape(input_list) != tosa_input_shape(graph));
    ERROR_IF(tensor_list_shape(output_list) != tosa_output_shape(graph));

    // Declare the global list for storing persistent variable tensors across multiple graphs
    if (!variable_tensors) {
        variable_tensors = list<tensor_t>();
    } else { // Clear the "seen flag"
        for (tensor_t var_tensor in variable_tensors) {
            var_tensor.seen = false;
        }
    }

    for_each(operator in graph order) {
        ERROR_IF(operator input tensors do not meet requirement of operator Arguments inputs)
        ERROR_IF(operator attributes do not meet requirement of operator Arguments attributes)
        ERROR_IF(operator output tensors do not meet requirement of operator Arguments outputs)
        ERROR_IF(operator data types do not meet requirement of operator Supported Data Types)
        // Execute the operator as defined by the operation function pseudo-code
        tosa_execute_operator(operator, level);
    }
}

Variable Tensor Allocate

This utility function helps with the memory allocation for the variable operators.

variable_tensor_allocate allocates the mutable persistent memory block for storing variable tensors.

The shape argument contains the shape of the allocated memory block for the variable_tensor.

The uid argument is a globally unique identifier for variable tensors.

tensor_t* variable_tensor_allocate<in_t>(dim_t shape, int32_t uid) {
    size_t size = tensor_size(shape);
    tensor_t *allocated_tensor = new tensor_t;
    allocated_tensor->data = new in_t[size];
    allocated_tensor->uid = uid;
    allocated_tensor->is_written = false;
    allocated_tensor->shape = shape;
    allocated_tensor->type = in_t;
    return allocated_tensor;
}

Variable Tensor Lookup

This is a utility function to help with variable tensor lookups.

variable_tensor_lookup checks whether a variable tensor has been allocated or not.

The uid argument is a globally unique identifier for variable tensors.

tensor_t variable_tensor_lookup(int32_t uid) {
    // The global variable_tensors list was instantiated the first time the
    // tosa graph was executed
    for_each(tensor_t allocated_tensor in variable_tensors) {
        if (allocated_tensor.uid == uid) {
            return allocated_tensor;
        }
    }
    return NULL;
}
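
For reference, a possible concrete C++ counterpart of the two utility functions above is sketched below, using a process-scoped map keyed by uid as the backing store. The map name variable_tensors follows the execution-model section; the element_size parameter, std::unique_ptr ownership, and the simplified record are assumptions of this sketch rather than part of the pseudo-code.

#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Simplified variable tensor record (element type omitted for brevity).
struct variable_tensor_t {
    int32_t uid = 0;
    std::vector<int32_t> shape;
    std::vector<uint8_t> data;
    bool seen = false;
    bool is_written = false;
};

// Process-scoped registry keyed by uid; persists across graph executions.
static std::unordered_map<int32_t, std::unique_ptr<variable_tensor_t>> variable_tensors;

// Counterpart of variable_tensor_allocate: reserve storage for uid.
variable_tensor_t* variable_tensor_allocate(const std::vector<int32_t>& shape,
                                            int32_t uid, size_t element_size) {
    size_t count = 1;
    for (int32_t dim : shape) count *= static_cast<size_t>(dim);

    auto tensor = std::make_unique<variable_tensor_t>();
    tensor->uid = uid;
    tensor->shape = shape;
    tensor->data.resize(count * element_size);

    variable_tensor_t* raw = tensor.get();
    variable_tensors[uid] = std::move(tensor);
    return raw;
}

// Counterpart of variable_tensor_lookup: nullptr means "not yet allocated".
variable_tensor_t* variable_tensor_lookup(int32_t uid) {
    auto it = variable_tensors.find(uid);
    return it == variable_tensors.end() ? nullptr : it->second.get();
}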

Operators

Three new operators will be introduced in the specification: Variable, Variable_Read, and Variable_Write.

Variable

Defines a new TOSA variable. This is a persistent mutable value across multiple TOSA graph invocations. Modifications are expressed using read/write semantics.

Argument  | Type     | Name          | Shape     | Rank          | Description
Attribute | int32_t  | uid           | -         | -             | Globally unique identifier for the declared variable tensor.
Attribute | int32_t* | var_shape     | var_shape | 0 to MAX_RANK | The variable tensor shape.
Attribute | var_t    | type          | -         | -             | Type of the tensor variable elements.
Attribute | in_t*    | initial_value | shape     | 0 to MAX_RANK | Initial value of the variable tensor. This argument is optional.

Enumeration Type:

var_t is an enumeration type that supports the following data types (a brief sketch follows the list):

  • BOOLEAN
  • INT8
  • INT16
  • INT32
  • FP16
  • BF16
  • FP32
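
As a rough guide, the sketch below shows one way an implementation might represent var_t together with a per-element storage size; the helper element_size_bytes and the one-byte BOOLEAN storage are assumptions of this sketch, not requirements of the proposal.

#include <cstddef>

// The var_t element types listed above, as a scoped enumeration.
enum class var_t { BOOLEAN, INT8, INT16, INT32, FP16, BF16, FP32 };

// Hypothetical per-element storage size, e.g. for sizing the backing buffer
// in variable_tensor_allocate. Treating BOOLEAN as one byte is an assumption
// of this sketch; the proposal does not mandate a storage layout.
constexpr size_t element_size_bytes(var_t type) {
    switch (type) {
        case var_t::BOOLEAN:
        case var_t::INT8:    return 1;
        case var_t::INT16:
        case var_t::FP16:
        case var_t::BF16:    return 2;
        case var_t::INT32:
        case var_t::FP32:    return 4;
    }
    return 0;  // unreachable for valid enumerators
}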

Example IR:

tosa.variable @sym_ref_name : <tensortype>
tosa.variable @myvar : tensor<1xf32>

Operation Function:

tensor_t var_tensor = variable_tensor_lookup(uid);

// Invocation for the first time
if (var_tensor == NULL) {
  // Allocate the persistent mutable memory for the variable tensor
  var_tensor = variable_tensor_allocate<var_t>(var_shape, uid);

  if (initial_value != NULL) {
    REQUIRE(var_t == in_t);
    REQUIRE(var_shape == shape);
    for_each (index in shape) {
      // Copy data from initial_value to var_tensor
      in_t value = tensor_read<in_t>(initial_value, shape, index);
      tensor_write<in_t>(var_tensor.data, var_shape, index, value);
    }
    var_tensor.is_written = true;
  }
} else { // Variable tensor has already been declared
  // It's invalid to declare a second variable with the same uid in a single graph execution
  REQUIRE(!var_tensor.seen);
}

var_tensor.seen = true;

The “seen” flag:

  • In summary, the goal is to avoid declaring multiple variable tensors with the same uid in a single graph execution.
  • However, the same graph can still be run multiple times; this is valid.
  • So the “seen” flag is only meaningful within a single graph execution.
  • Within a single execution, a second declaration of an already “seen” variable is an error.
  • Because the “seen” flag is cleared between graph executions, re-declaring and reading the already initialized variable (same uid) in a later execution is valid.

The “is_written” flag:

  • The “is_written” flag is straightforward: it is not valid to read a variable tensor that has not been allocated and written to (a flag-only sketch of both rules follows).
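
To make the behaviour of the two flags concrete, here is a minimal, self-contained C++ model of only the flag checks (tensor data omitted). The function names begin_graph_execution, declare_variable, write_variable, and read_variable are invented for this sketch; they correspond to clearing the “seen” flags, the Variable declaration, Variable_Write, and Variable_Read respectively.

#include <cassert>
#include <cstdint>
#include <unordered_map>

// Minimal model of the two flags; real element data is omitted.
struct var_state_t {
    bool seen = false;        // declared in the current graph execution?
    bool is_written = false;  // has a value (initial or written) been stored?
};

static std::unordered_map<int32_t, var_state_t> variables;

// Called once per graph execution, mirroring the "clear the seen flag" loop
// in tosa_execute_graph above.
void begin_graph_execution() {
    for (auto& entry : variables) entry.second.seen = false;
}

void declare_variable(int32_t uid) {
    var_state_t& var = variables[uid];  // allocates on first declaration
    assert(!var.seen && "second declaration of the same uid in one execution");
    var.seen = true;
}

void write_variable(int32_t uid) {
    var_state_t& var = variables.at(uid);
    assert(var.seen && "variable must be declared in this execution");
    var.is_written = true;
}

void read_variable(int32_t uid) {
    var_state_t& var = variables.at(uid);
    assert(var.is_written && "variable must be written before it is read");
}

int main() {
    begin_graph_execution();   // first invocation of the graph
    declare_variable(7);
    write_variable(7);
    read_variable(7);          // valid: declared and written

    begin_graph_execution();   // second invocation of the same graph
    declare_variable(7);       // valid again: "seen" was cleared
    read_variable(7);          // still valid: the value persists across runs
    // declare_variable(7);    // would fault: uid 7 already seen in this run
    return 0;
}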

Variable_Read

Reads the value from a pseudo-buffer resource holding a persistent mutable tensor.

Argument  | Type    | Name    | Shape | Rank          | Description
Attribute | int32_t | uid     | -     | -             | Globally unique identifier of the variable tensor being read.
Output    | out_t*  | output1 | shape | 0 to MAX_RANK | Output tensor.

Example IR:

%ssa_id = tosa.variable.read @sym_ref_name : <tensortype>
%1 = tosa.variable.read @ctr : tensor<13x21x3xi32>

Operation Function:

tensor_t variable_tensor = variable_tensor_lookup(uid);
// Check this variable tensor has been declared
REQUIRE(variable_tensor != NULL);
// Check this variable tensor has been written
REQUIRE(variable_tensor.is_written);
// Output tensor's shape and variable_tensor's shape have to match
REQUIRE(variable_tensor.shape == shape);
// Output tensor's type and variable_tensor's type have to match
REQUIRE(variable_tensor.type == out_t);

for_each (index in shape) {
    // Read data from pseudo-buffer resource to the output
    out_t value = tensor_read<out_t>(variable_tensor.data, variable_tensor.shape, index);
    tensor_write<out_t>(output1, shape, index, value);
}

Variable_Write

Assigns a value to the pseudo-buffer resource holding a persistent mutable tensor.

Argument  | Type    | Name   | Shape | Rank          | Description
Attribute | int32_t | uid    | -     | -             | Globally unique identifier of the variable tensor being written to.
Input     | in_t*   | input1 | shape | 0 to MAX_RANK | Input tensor.

Example IR:

tosa.variable.write @sym_var_name, %ssa_id : <tensortype>
tosa.variable.write @cell_state, %22 : tensor<24xf32>

Operation Function:

tensor_t variable_tensor = variable_tensor_lookup(uid);
// Check this variable tensor has been declared
REQUIRE(variable_tensor != NULL);
// The tensor has to be seen before to be written to
// The seen variable is cleared before each graph execution and set in declaration
REQUIRE(variable_tensor.seen);
//  Input tensor's shape and variable_tensor's shape have to match
REQUIRE(variable_tensor.shape == shape);
// Input tensor's type and variable_tensor's type have to match
REQUIRE(variable_tensor.type == in_t);

for_each (index in shape) {
    // Write data from the input to the pseudo-buffer resource
    in_t value = tensor_read<in_t>(input1, shape, index);
    tensor_write<in_t>(variable_tensor.data, variable_tensor.shape, index, value);
}

variable_tensor.is_written = true;

Acknowledgment