Tosa::Maxpool2D or AveragePool2D spec check when lowering from onnx or torch

ehsan-toosi · May 15, 2023, 7:41am

Hi everyone,

I am implementing a verifier for the TOSA pool operations (tosa.max_pool2d and tosa.avg_pool2d). While implementing the check ERROR_IF(OW != idiv_check(IW + pad_left + pad_right - kernel_x, stride_x) + 1); from the Maxpool spec, I found that idiv_check triggers an error for some pool operations that have been legalized from ONNX or Torch. The reason is that idiv_check, as currently defined, requires IW + pad_left + pad_right - kernel_x to be a multiple of stride_x. My question is how TOSA handles the case where the upper dialect has a maxpool operation for which the input size plus the padding minus the kernel size is not divisible by the stride. The ONNX and Torch dialects use a floor or ceil operation for these cases.

int32_t idiv_check(int32_t input1, int32_t input2) {
    ERROR_IF(input1 % input2 != 0); // input1 must be a multiple of input2
    return input1 / input2;         // exact quotient without rounding
}

Onnx:

%1 = "onnx.MaxPoolSingleOut"(%0) {ceil_mode = 0 : si64, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]} : (tensor<1x64x112x112xf32>) -> tensor<1x64x56x56xf32>

Tosa:

%1 = "tosa.max_pool2d"(%0) {kernel = [3, 3], pad = [1, 1, 1, 1], stride = [2, 2]} : (tensor<1x112x112x64xf32>) -> tensor<1x56x56x64xf32>

Then:

idiv_check(112 + 1 + 1 - 3, 2) = idiv_check(111, 2) ---------> Error
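
The arithmetic above can be reproduced with a small Python sketch. It is only an illustration: idiv_check mirrors the spec pseudocode quoted earlier, while floor_output_size is a hypothetical helper name that assumes the floor rounding implied by ceil_mode = 0 and a dilation of 1.

```python
def idiv_check(input1: int, input2: int) -> int:
    """Exact division as in the spec pseudocode: fail if there is a remainder."""
    if input1 % input2 != 0:
        raise ValueError(f"{input1} is not a multiple of {input2}")
    return input1 // input2


def floor_output_size(in_size: int, kernel: int, stride: int,
                      pad_before: int, pad_after: int) -> int:
    """Hypothetical helper: output size with floor rounding (ceil_mode = 0)."""
    return (in_size + pad_before + pad_after - kernel) // stride + 1


# Floor rounding accepts the example and yields the 56 seen in the ONNX op.
print(floor_output_size(112, 3, 2, 1, 1))

# The exact division rejects the same numbers because 111 is odd.
try:
    idiv_check(112 + 1 + 1 - 3, 2)
except ValueError as err:
    print("idiv_check rejects it:", err)
```

Both forms agree whenever the numerator is a multiple of the stride, so the difference only shows up in cases like this one.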

Should the idiv_check be a floor/ceil operation instead?

Thanks,
Ehsan

EricKunze · May 15, 2023, 5:57pm

This is a good question. You can make this happen with TensorFlow also.

We think that there is value in being exact about how the shapes of the input and output are related, which is why the ERROR_IF requires the division to be exact. Most of the time, this can be satisfied by adjusting the padding values to reflect the padding actually used by the stride/kernel/size combination.

It is possible to create a case where a pooling kernel does not use all of the elements of the input tensor. In that case, a SLICE operation could be used to create an input tensor with only the values that are used. A backend could recognize this and operate on the combined SLICE/*POOL without doing an explicit data copy operation if it could be handled. I expect this case is rare, as you are spending effort creating data elements and then not using them, but we’ve observed it in a network.
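
As a rough sketch of how a legalization might apply both ideas along one dimension, the Python below first trims the unused trailing padding and, if that is not enough, reports a reduced input size that a SLICE would have to produce. The function name and return convention are hypothetical, and the logic assumes the source op uses floor rounding (ceil_mode = 0) with no dilation.

```python
def legalize_trailing_edge(in_size: int, kernel: int, stride: int,
                           pad_before: int, pad_after: int) -> tuple[int, int]:
    """Return (new_pad_after, new_in_size) so the pool division is exact.

    Assumes floor rounding in the source op: the trailing `remainder`
    positions are never read, so they can be dropped without changing
    the result.
    """
    remainder = (in_size + pad_before + pad_after - kernel) % stride
    if remainder <= pad_after:
        # Only unused padding has to go; the input tensor is untouched.
        return pad_after - remainder, in_size
    # Padding alone is not enough: drop all trailing pad and slice the input.
    return 0, in_size - (remainder - pad_after)


# Example from this thread: the trailing pad of 1 becomes 0, no slice needed.
print(legalize_trailing_edge(112, 3, 2, 1, 1))  # (0, 112)

# A made-up case with no padding: the last input element is never read,
# so the input would be sliced from 9 down to 8 elements.
print(legalize_trailing_edge(9, 2, 3, 0, 0))    # (0, 8)
```

After either adjustment, in_size + pad_before + pad_after - kernel is a multiple of the stride, so the exact division in the spec check no longer fails and the output size matches the floor-rounded one.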

In the example you give, the padding appears to be incorrect: the right/bottom pad is never used. (Using 1-D for simplicity, and assuming the real input tensor occupies locations 0 through 111, pad top is input location -1 and pad bottom is input location 112.)
Location 112 is never accessed, so pad_bottom should be 0.

When calculating output location 55 following the pseudocode:

iy = oy * stride_y - pad_top
109 = 55 * 2 - 1

for_each( 0 <= ky < kernel_y)
 y = iy + ky

(109, 110, 111) for kernel_y == 3
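
The same index walk can be written as a short Python sketch. The function name is made up for illustration; the body just follows the two pseudocode lines above for a single dimension.

```python
def accessed_locations(oy: int, kernel_y: int, stride_y: int, pad_top: int) -> list[int]:
    """Input locations read when computing output location `oy` (1-D view)."""
    iy = oy * stride_y - pad_top
    return [iy + ky for ky in range(kernel_y)]


# Last output location of the 56-element output: stops at 111, never reads 112.
print(accessed_locations(55, 3, 2, 1))  # [109, 110, 111]

# First output location: location -1 is the top pad, which is used.
print(accessed_locations(0, 3, 2, 1))   # [-1, 0, 1]
```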

ehsan-toosi · May 24, 2023, 2:34pm

Thanks, Eric, for the explanation and for mentioning the slice case. We are going to adopt this in the legalization to TOSA.