Introduction
This document proposes adding a collection of utility operators to the TOSA specification for expressing statefulness. The goal of these operators is to express statefulness, scoping, and mutability, while aligning with existing frontend and backend infrastructure that supports statefulness.
Use cases: stateful RNN/LSTM networks, e.g. tf.keras.layers.LSTM (TensorFlow v2.13.0).
Memory Model
- A *tensor* is an abstract and immutable value comprising metadata describing its *shape*, *element datatype*, and its *data*.
- The course of execution of the neural network consumes tensors, generating new tensors as the output of individual operations.
- A symbolic reference (or *symref*) to a *tensor* is a reference to the object holding a tensor. It interacts with *tensor* values through read/write semantics: a read returns the tensor value most recently written by a write (see the sketch after this list).
- Each *tensor* is allocated discretely in non-aliasing memory, and thus a *tensor* can have only a single *symref* to it.
- The proposal assumes that two sets of memory regions are implemented for read/write content:
  - A procedure-scoped, stack-like memory
  - A process-scoped, heap-like memory
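To make the read/write semantics concrete, here is a minimal C++ sketch (illustrative only; the `Tensor` and `SymRef` names are not part of the proposal) of a symref as a mutable slot over immutable tensor values:

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// An immutable tensor value: shape and data (element datatype elided for brevity).
struct Tensor {
    std::vector<int> shape;
    std::vector<float> data;
};

// A symbolic reference (symref): a mutable slot over immutable tensor values.
class SymRef {
    std::shared_ptr<const Tensor> value_;  // tensor value most recently written
public:
    // write() replaces the held value; the previous tensor is never mutated.
    void write(std::shared_ptr<const Tensor> t) { value_ = std::move(t); }
    // read() returns the value most recently written.
    std::shared_ptr<const Tensor> read() const { return value_; }
};

int main() {
    SymRef var;
    var.write(std::make_shared<Tensor>(Tensor{{1}, {0.0f}}));
    var.write(std::make_shared<Tensor>(Tensor{{1}, {1.0f}}));
    assert(var.read()->data[0] == 1.0f);  // a read observes the latest write
}
```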
Execution Model
In order to fully express how the variable operators work, we need to support the following:
Operator Graphs
Since the stateful operators are truly global across multiple graph invocations, we need to declare a global data structure, “variable_tensors”, to store variable tensors (a non-normative sketch of this registry appears after the pseudo-code below). The “seen” flag will be explained in later sections.
```
tosa_execute_graph(tosa_graph_t graph, tosa_list_t input_list, tosa_list_t output_list, tosa_level_t level)
{
    ERROR_IF(tensor_list_shape(input_list) != tosa_input_shape(graph));
    ERROR_IF(tensor_list_shape(output_list) != tosa_output_shape(graph));

    // Declare the global list for storing persistent variable tensors
    // across multiple graph executions
    if (!variable_tensors) {
        variable_tensors = list<tensor_t>();
    } else { // Clear the "seen" flag of every variable tensor
        for_each(tensor_t var_tensor in variable_tensors) {
            var_tensor.seen = false;
        }
    }

    for_each(operator in graph order) {
        ERROR_IF(operator input tensors do not meet requirement of operator Arguments inputs);
        ERROR_IF(operator attributes do not meet requirement of operator Arguments attributes);
        ERROR_IF(operator output tensors do not meet requirement of operator Arguments outputs);
        ERROR_IF(operator data types do not meet requirement of operator Supported Data Types);
        // Execute the operator as defined by the operation function pseudo-code
        tosa_execute_operator(operator, level);
    }
}
```
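As a non-normative illustration of the pseudo-code above, the global variable_tensors store could be realized host-side as a uid-keyed map, making lookup constant-time and the per-execution “seen” reset a single pass (the `VariableTensor` and `reset_seen_flags` names are invented here):

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical host-side record for one persistent variable tensor.
struct VariableTensor {
    std::vector<int32_t> shape;
    std::vector<std::byte> data;  // type-erased element storage
    bool is_written = false;      // set once the variable holds a value
    bool seen = false;            // set when declared in the current execution
};

// Process-scoped registry, keyed by uid; persists across graph executions.
std::unordered_map<int32_t, VariableTensor> variable_tensors;

// Called at the start of every graph execution, mirroring the loop above.
void reset_seen_flags() {
    for (auto& [uid, var] : variable_tensors) {
        var.seen = false;
    }
}

int main() {
    variable_tensors[42] = VariableTensor{{1}, {}, true, true};
    reset_seen_flags();  // every variable starts the new execution unseen
}
```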
Variable Tensor Allocate
This utility function helps with memory allocation for the variable operators. `variable_tensor_allocate` allocates the mutable persistent memory block for storing variable tensors. The `shape` argument contains the shape of the allocated memory block for the variable tensor. The `uid` argument is a globally unique identifier for variable tensors.
```
tensor_t* variable_tensor_allocate<in_t>(dim_t shape, int32_t uid) {
    size_t size = tensor_size(shape);
    tensor_t *allocated_tensor = new tensor_t;
    allocated_tensor->data = new in_t[size];
    allocated_tensor->uid = uid;
    allocated_tensor->is_written = false;
    allocated_tensor->seen = false;
    allocated_tensor->shape = shape;
    allocated_tensor->type = in_t;
    return allocated_tensor;
}
```
Variable Tensor Lookup
This is a utility function to help with variable tensor lookups. `variable_tensor_lookup` checks whether a variable tensor has been allocated or not. The `uid` argument is a globally unique identifier for variable tensors.
```
tensor_t* variable_tensor_lookup(int32_t uid) {
    // The global variable_tensors list is instantiated the first time
    // a TOSA graph is executed
    for_each(tensor_t* allocated_tensor in variable_tensors) {
        if (allocated_tensor->uid == uid) {
            return allocated_tensor;
        }
    }
    return NULL;
}
```
Operators
Three new operators will be introduced in the spec: Variable, Variable_Read, and Variable_Write.
Variable
Defines a new TOSA variable. This is a persistent mutable value across multiple TOSA graph invocations. Modifications are expressed using read/write semantics.
Argument | Type | Name | Shape | Rank | Description |
---|---|---|---|---|---|
Attribute | int32_t | uid | - | - | Globally unique identifier for the declared variable tensor. |
Attribute | int32_t* | var_shape | var_shape | 0 to MAX_RANK | The variable tensor shape. |
Attribute | var_t | type | - | - | Type of the tensor variable elements. |
Attribute | in_t* | initial_value | shape | 0 to MAX_RANK | Initial value of the variable tensor. This argument is optional. |
Enumeration Type:
The `var_t` is an enumeration type that supports the following datatypes (see the sketch after this list):
- BOOLEAN
- INT8
- INT16
- INT32
- FP16
- BF16
- FP32
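A host implementation might model `var_t` as a plain C++ enumeration over exactly these seven datatypes (a sketch; the enumerator spellings mirror the list above):

```cpp
// Element datatypes a variable tensor may carry, mirroring var_t in the proposal.
enum class var_t {
    BOOLEAN,
    INT8,
    INT16,
    INT32,
    FP16,
    BF16,
    FP32,
};
```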
Example IR:
```
tosa.variable @sym_ref_name : <tensortype>
tosa.variable @myvar : tensor<1xf32>
```
Operation Function:
```
tensor_t* var_tensor = variable_tensor_lookup(uid);
// Invocation for the first time
if (var_tensor == NULL) {
    // Allocate the persistent mutable memory for the variable tensor
    var_tensor = variable_tensor_allocate<var_t>(var_shape, uid);
    if (initial_value != NULL) {
        REQUIRE(var_t == in_t);
        REQUIRE(var_shape == shape);
        for_each (index in shape) {
            // Copy data from initial_value to var_tensor
            in_t value = tensor_read<in_t>(initial_value, shape, index);
            tensor_write<in_t>(var_tensor.data, var_shape, index, value);
        }
        var_tensor.is_written = true;
    }
} else { // The variable tensor has already been declared
    // It is invalid to declare a second variable with the same uid
    // within a single graph execution
    REQUIRE(!var_tensor.seen);
}
var_tensor.seen = true;
```
The “seen” flag:
- In summary, the goal is to avoid declaring multiple variable tensors with the same uid in a single graph execution.
- However, the same graph can still be run multiple times; this is a valid operation.
- The “seen” flag is therefore only meaningful within a single graph execution.
- Within a single execution, a second declaration of a “seen” variable will fault.
- However, because the “seen” flag is cleared between graph executions, re-declaring the already-initialized variable (same uid) in a later execution remains valid (see the sketch after this list).
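The following C++ sketch simulates this lifecycle (illustrative only; `Runtime`, `begin_execution`, and `declare_variable` are invented stand-ins for the pseudo-code above): a duplicate declaration faults within one execution, but re-declaring the same uid in a fresh execution is valid.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <unordered_map>

struct Var { bool seen = false; };

struct Runtime {
    std::unordered_map<int32_t, Var> vars;  // persists across executions

    void begin_execution() {                // clear "seen" for every variable
        for (auto& [uid, v] : vars) v.seen = false;
    }
    void declare_variable(int32_t uid) {    // models the Variable operator
        auto [it, inserted] = vars.try_emplace(uid);
        if (!inserted && it->second.seen)   // second declaration this execution
            throw std::runtime_error("duplicate variable declaration");
        it->second.seen = true;
    }
};

int main() {
    Runtime rt;
    rt.begin_execution();
    rt.declare_variable(7);             // first declaration: OK
    bool faulted = false;
    try { rt.declare_variable(7); }     // same uid, same execution: faults
    catch (const std::runtime_error&) { faulted = true; }
    assert(faulted);

    rt.begin_execution();               // a new execution clears "seen"
    rt.declare_variable(7);             // re-declaring the same uid is valid
}
```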
The “is_written” flag:
- The “is_written” flag is straightforward: it is not valid to read a variable tensor that has not been allocated and written to.
Variable_Read
Reads the value from a pseudo-buffer resource holding a persistent mutable tensor.
Argument | Type | Name | Shape | Rank | Description |
---|---|---|---|---|---|
Attribute | int32_t | uid | - | - | Globally unique identifier of the variable tensor that is being read from. |
Output | out_t* | output1 | shape | 0 to MAX_RANK | Output tensor. |
Example IR:
```
%ssa_id = tosa.variable.read @sym_ref_name : <tensortype>
%1 = tosa.variable.read @ctr : tensor<13x21x3xi32>
```
Operation Function:
```
tensor_t* variable_tensor = variable_tensor_lookup(uid);
// Check this variable tensor has been declared
REQUIRE(variable_tensor != NULL);
// Check this variable tensor has been written
REQUIRE(variable_tensor.is_written);
// Output tensor's shape and variable_tensor's shape have to match
REQUIRE(variable_tensor.shape == shape);
// Output tensor's type and variable_tensor's type have to match
REQUIRE(variable_tensor.type == out_t);
for_each (index in shape) {
    // Read data from the pseudo-buffer resource to the output
    out_t value = tensor_read<out_t>(variable_tensor.data, variable_tensor.shape, index);
    tensor_write<out_t>(output1, shape, index, value);
}
```
Variable_Write
Assigns a value to the pseudo-buffer resource holding a persistent mutable tensor.
Argument | Type | Name | Shape | Rank | Description |
---|---|---|---|---|---|
Attribute | int32_t | uid | - | - | Globally unique identifier of the variable tensor that is being written to. |
Input | in_t* | input1 | shape | 0 to MAX_RANK | Input tensor. |
Example IR:
```
tosa.variable.write @sym_var_name, %ssa_id : <tensortype>
tosa.variable.write @cell_state, %22 : tensor<24xf32>
```
Operation Function:
```
tensor_t* variable_tensor = variable_tensor_lookup(uid);
// Check this variable tensor has been declared
REQUIRE(variable_tensor != NULL);
// The variable tensor has to be seen before it can be written to;
// the "seen" flag is cleared before each graph execution and set at declaration
REQUIRE(variable_tensor.seen);
// Input tensor's shape and variable_tensor's shape have to match
REQUIRE(variable_tensor.shape == shape);
// Input tensor's type and variable_tensor's type have to match
REQUIRE(variable_tensor.type == in_t);
for_each (index in shape) {
    // Write data from the input to the pseudo-buffer resource
    in_t value = tensor_read<in_t>(input1, shape, index);
    tensor_write<in_t>(variable_tensor.data, variable_tensor.shape, index, value);
}
variable_tensor.is_written = true;
```
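Taken together, the three operators provide state that persists across graph invocations, as in the LSTM cell-state use case. Below is a minimal, non-normative C++ sketch of this end-to-end behavior with a scalar counter; `read_var`, `write_var`, and `run_graph_once` are invented for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Minimal model of a persistent variable: an optional value plus its status.
struct Var {
    std::optional<int32_t> value;            // empty until the first write
    bool is_written() const { return value.has_value(); }
};

std::unordered_map<int32_t, Var> variable_tensors;  // process-scoped

// Variable_Read: only valid once the variable has been written.
int32_t read_var(int32_t uid) {
    Var& v = variable_tensors.at(uid);       // must have been declared
    assert(v.is_written());                  // REQUIRE(variable_tensor.is_written)
    return *v.value;
}

// Variable_Write: stores a new value and marks the variable written.
void write_var(int32_t uid, int32_t value) {
    variable_tensors[uid].value = value;
}

// One "graph invocation" that increments a counter variable.
void run_graph_once(int32_t uid) {
    if (!variable_tensors[uid].is_written())
        write_var(uid, 0);                   // plays the role of initial_value
    write_var(uid, read_var(uid) + 1);       // read, add one, write back
}

int main() {
    run_graph_once(42);
    run_graph_once(42);
    assert(read_var(42) == 2);               // state persisted across invocations
}
```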
Acknowledgment
- The original proposal, Statefulness Support for TOSA, was authored by Suraj Sudhir.