RFC: Remove the FULLY_CONNECTED TOSA operator

EricKunze · December 7, 2021, 12:28am

We’ve been reviewing the TOSA operators, and we think the FULLY_CONNECTED and MATMUL operators overlap enough that we should remove FULLY_CONNECTED and keep only MATMUL. Before we change the specification, we’re reaching out to the community for feedback on whether this would benefit TOSA implementations or cause significant disruption.

The FULLY_CONNECTED operator is a simple fully connected operator. It takes an input of size [N,IC] and a weight tensor of size [OC,IC], and generates an output of size [N,OC]. It also takes a bias parameter of size [OC].

The MATMUL operator does batch matrix multiplication, multiplying the A set of matrices, size [N,H,C] by the B set of matrices, size [N,C,W] to get the output set of matrices, size [N,H,W].
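
Written out per element, the two definitions above look roughly as follows. This is a sketch for the floating-point case only; it ignores zero points and accumulator types, and the symbol names (x, w, b, y for FULLY_CONNECTED; A, B, O for MATMUL) are chosen here for illustration rather than taken from the specification.

```latex
% FULLY_CONNECTED: x is [N,IC], w is [OC,IC], b is [OC], y is [N,OC]
y_{n,oc} = \sum_{ic=0}^{IC-1} x_{n,ic} \, w_{oc,ic} + b_{oc}

% MATMUL: A is [N,H,C], B is [N,C,W], O is [N,H,W]
O_{n,h,w} = \sum_{c=0}^{C-1} A_{n,h,c} \, B_{n,c,w}
```

Note that the FULLY_CONNECTED sum contracts over the second index of w, while the MATMUL sum contracts over the middle index of B.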

Some important differences exist between the two operators that need to be handled to allow the replacement of FULLY_CONNECTED with MATMUL:

FULLY_CONNECTED takes the weight matrix as an attribute, whereas MATMUL takes it as an input. Since an input is the more flexible of the two, this presumably isn’t a problem.
The weight matrix of FC is transposed relative to the B matrix of MATMUL. To get the operation to work correctly, a TRANSPOSE of the weight matrix must be performed.
FC includes a bias addition in its operation, whereas MATMUL does not. This requires an ADD operation to be inserted. As a benefit, since ADD supports broadcast, the bias parameter can be broadcast if desired.
MATMUL is a batch operation, so RESHAPE operations are needed to convert the two-dimensional input to a three-dimensional tensor by adding a dimension of size 1, and to convert the three-dimensional output back to two dimensions.
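
To make the replacement concrete, here is a minimal sketch in plain Python (nested lists, floating point only, no quantization) that checks the RESHAPE + TRANSPOSE + MATMUL + RESHAPE + ADD sequence against a direct fully connected computation. All function names are hypothetical helpers written for this illustration, not TOSA or MLIR APIs. It also assumes the transposed weight gets a leading dimension of size 1, which the list above does not spell out.

```python
def fully_connected(x, weight, bias):
    """Reference FC: x is [N][IC], weight is [OC][IC], bias is [OC] -> [N][OC]."""
    return [
        [sum(xi * wi for xi, wi in zip(row, w_row)) + b
         for w_row, b in zip(weight, bias)]
        for row in x
    ]


def matmul(a, b):
    """Reference batched MATMUL: a is [N][H][C], b is [N][C][W] -> [N][H][W]."""
    return [
        [[sum(a_row[c] * b_mat[c][w] for c in range(len(b_mat)))
          for w in range(len(b_mat[0]))]
         for a_row in a_mat]
        for a_mat, b_mat in zip(a, b)
    ]


def transpose_2d(m):
    """Swap the two axes of a 2-D nested list."""
    return [list(col) for col in zip(*m)]


def fully_connected_via_matmul(x, weight, bias):
    """FC expressed with the steps from the list above."""
    a = [x]                        # RESHAPE: [N,IC] -> [1,N,IC]
    b = [transpose_2d(weight)]     # TRANSPOSE: [OC,IC] -> [IC,OC], then
                                   # RESHAPE to [1,IC,OC] (assumed here)
    product = matmul(a, b)         # MATMUL: [1,N,IC] x [1,IC,OC] -> [1,N,OC]
    out = product[0]               # RESHAPE: [1,N,OC] -> [N,OC]
    # ADD: the [OC] bias is broadcast across the N rows.
    return [[v + bv for v, bv in zip(row, bias)] for row in out]


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]        # N=2, IC=3
weight = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]  # OC=2, IC=3
bias = [10.0, 20.0]

expected = fully_connected(x, weight, bias)
actual = fully_connected_via_matmul(x, weight, bias)
print(actual == expected)
```

In this mapping the batch dimension is 1, H corresponds to N, C to IC, and W to OC. When the weights are constant, the TRANSPOSE and its RESHAPE could be folded ahead of time, which is one way the extra operators might be optimized out.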

We think that the long-term benefit of removing an overlapping operator, namely reduced maintenance and the ability to optimize a single operator rather than two, is worth the short-term cost. Most of the added operators could be optimized out, depending on the legalization being used on the way out from TOSA.

After we get feedback on this plan, and assuming we decide to proceed, we’ll plan on working with the MLIR community to update the dialect appropriately. We’ll also work with any legalizations that exist to help transition them to using MATMUL where they currently use FC.

Thanks in advance for any feedback.

mgehre-amd · July 21, 2023, 10:06am

I agree. Removing FULLY_CONNECTED will make the instruction set more focussed.