We recently found that it would be useful to be able to pad with unspecified values. This would let us optimize slice/pad operations that affect the same dimensions, since a slice followed by an unspecified pad could simply be folded away.
We haven’t found a way to represent an unspecified pad in TOSA today. What do you think about the idea of introducing it?
One way to introduce it in the TOSA spec would be to make pad_const an optional attribute, with the pad value being unspecified when it is not set.
That’s an interesting idea. I’m concerned that adding this would create a large amount of undefined behavior in a TOSA system. We would not be able to test a graph that contained a pad of ‘unspecified’, and if that value is then used in another operation like ADD, what would be the expected result?
In the case you describe, a pad of ‘unspecified’ would not be guaranteed to give the original values back after the slice/pad sequence. Once you have created the slice, the values should be considered unrecoverable. To do anything else could be very hard for some implementations.
What type of operation would you be doing where you padded a tensor, but didn’t care about what that value is? Perhaps there is another solution we could find.
For some operations, such as a matrix multiplication, one can add padding to the outer dimensions and slice the padded area off again after the operation without influencing the results. This can be useful to fulfill alignment requirements. Now, if you have several of those multiplications one after another, always generating the same slice/pad sequence in between the matmuls, you do not want to actually perform those slices and then pad again.
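To make the matmul case concrete, here is a minimal pure-Python sketch. It is not TOSA code, and the helper names (`matmul`, `pad_rows`, `pad_cols`, `padded_matmul`) are made up for illustration. It pads the rows of the left operand and the columns of the right operand with an arbitrary value, multiplies, and slices the padded area off again. The sliced result should match the unpadded product whatever the pad value is.

```python
def matmul(a, b):
    """Plain list-of-lists matrix multiply: (M x K) @ (K x N) -> (M x N)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def pad_rows(m, extra, value):
    """Append `extra` rows filled with `value` (outer dimension of the LHS)."""
    width = len(m[0])
    return m + [[value] * width for _ in range(extra)]


def pad_cols(m, extra, value):
    """Append `extra` columns filled with `value` (outer dimension of the RHS)."""
    return [row + [value] * extra for row in m]


def padded_matmul(a, b, pad_value):
    """Pad the outer dimensions, multiply, then slice the padding off again."""
    rows, cols = len(a), len(b[0])
    out = matmul(pad_rows(a, 1, pad_value), pad_cols(b, 2, pad_value))
    return [row[:cols] for row in out[:rows]]


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # 3 x 3
b = [[1, 0], [0, 1], [2, 2]]           # 3 x 2
reference = matmul(a, b)

# The pad value never reaches the sliced result.
assert padded_matmul(a, b, 0) == reference
assert padded_matmul(a, b, 12345) == reference
```

Note that this only holds for the outer dimensions. Padding the shared inner dimension with arbitrary values would change the sums.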
The unspecified pad value as a solution seems neat as it allows for this very local optimization to get rid of unnecessary slicing and padding.
Every pad with unspecified values can decide which (fixed but unspecified) values it picks. That means if they happen to be the same as before the slice, that is fine. Different pad operations can choose different unspecified values, just as they could with explicitly given values.
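The fold itself could look roughly like the sketch below. This is a toy 1-D op list, not the MLIR TOSA dialect: the `Slice` and `Pad` classes, their fields, and `fold_slice_pad` are hypothetical, and `pad_const=None` stands in for the proposed "unspecified" case. A slice directly followed by an unspecified pad that restores the original extent is dropped, because keeping the old values is one of the allowed choices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Slice:
    """Hypothetical op: keep `size` elements of a 1-D tensor starting at `start`."""
    start: int
    size: int


@dataclass(frozen=True)
class Pad:
    """Hypothetical op: add `before`/`after` elements; pad_const=None means unspecified."""
    before: int
    after: int
    pad_const: Optional[int] = None


def fold_slice_pad(ops, input_len):
    """Drop each Slice that is directly followed by an unspecified Pad restoring the extent."""
    out = []
    cur_len = input_len  # length of the tensor flowing into ops[i]
    i = 0
    while i < len(ops):
        op = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        if (isinstance(op, Slice) and isinstance(nxt, Pad)
                and nxt.pad_const is None
                and nxt.before == op.start
                and nxt.after == cur_len - op.start - op.size):
            i += 2  # the pair is a no-op on shape, and any pad values are allowed
            continue
        out.append(op)
        if isinstance(op, Slice):
            cur_len = op.size
        else:
            cur_len += op.before + op.after
        i += 1
    return out


# Unspecified pad: the slice/pad pair disappears.
assert fold_slice_pad([Slice(0, 6), Pad(0, 2)], input_len=8) == []
# Explicit pad_const: the pair must stay, since the padded values are observable.
assert fold_slice_pad([Slice(0, 6), Pad(0, 2, pad_const=0)], input_len=8) == [
    Slice(0, 6), Pad(0, 2, pad_const=0)]
```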
I think other operators such as add should be able to handle the unspecified pad values the same way they handle other unknown/non-compile-time-constant values.
When it comes to how these unspecified values are implemented, a backend can decide to always choose zero for these values.
I think I don’t fully understand what you mean here. I think they need not be recoverable, but when padding with another unspecified value and then optimizing, they might be the same again (but nobody needs to put effort there, because the slice doesn’t exist and needs no special handling).
Thanks for the example, that makes it easier to understand your use case.
This is the part that concerns me. We’ve spent effort in TOSA to eliminate unspecified behavior, asking for bit accuracy for integer operations and bounded errors for floating-point operations. This would be a step in the opposite direction, intentionally adding unspecified behavior.
For those operators, once the inputs are known, then the results are also known. After your tosa.matmul, one column of the results might be completely different between implementations and could not be verified for accuracy.
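A small pure-Python sketch of that testing concern, using made-up helpers (`matmul`, `pad_cols`) rather than any TOSA API: two hypothetical backends pick different values for the unspecified pad. The valid columns agree, but the padded column of the matmul output differs, so the full output tensor cannot be compared against a single reference.

```python
def matmul(a, b):
    """Plain list-of-lists matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def pad_cols(m, extra, value):
    """Append `extra` columns filled with `value`."""
    return [row + [value] * extra for row in m]


a = [[1, 2], [3, 4]]
b = [[5], [6]]

out_zero = matmul(a, pad_cols(b, 1, 0))  # hypothetical backend that pads with 0
out_one = matmul(a, pad_cols(b, 1, 1))   # hypothetical backend that pads with 1

# The first (valid) column matches between the two backends...
assert [row[:1] for row in out_zero] == [row[:1] for row in out_one]
# ...but the padded column does not, so the whole tensors differ.
assert out_zero != out_one
```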
Sorry, this was me misunderstanding your original post. I thought that you expected the existing values to be restored after slice + pad. From your example, I know that isn’t the requested behavior.
I’m trying to think of an option that would let you recognize this pattern without adding unspecified behavior across TOSA. I haven’t thought of anything yet, but I didn’t want to lose this thread.
Thanks for the comments, I understand that concern. I agree that it makes testing more complex.
We have an alternative design that resolves this issue without requiring unspecified padding, so we will go with that.
Thank you for the feedback and explaining the design philosophy!