TOSA op with "round half to even" (lowering pytorch's quantize_per_tensor)

Hi!

I’m trying to implement a lowering of pytorch’s quantize_per_tensor [0] to TOSA.
quantize_per_tensor is specified [1] using round [2], which is specified to “round half to even”.
I seem to be unable to find a TOSA operator with “round half to even” behavior (CAST appears to be “round toward zero”), so I cannot get an accurate lowering.
Any ideas?

Best wishes,
Matthias

[0] Sorry, can only put two links as a new user
[1] Quantization — PyTorch master documentation (glaringlee.github.io)
[2] torch.round — PyTorch 1.13 documentation

Hi Matthias,

Often we’re dealing with fully quantized networks, and we use the RESCALE operator to convert between the different bit widths; RESCALE does include rounding. So if you’re working towards a fully quantized network, you would then only need to adjust your scale and shift values based on your PyTorch scale (the zero point doesn’t change).
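To illustrate the "adjust your scale and shift" part: RESCALE-style operators typically represent a floating-point scale as an integer multiplier plus a right shift (as in gemmlowp/TFLite-style quantization). This is a hedged sketch of that conversion, not code from any TOSA implementation; the function name and normalization range are illustrative assumptions:

```python
import math

def quantize_multiplier(scale: float):
    """Approximate a positive float scale as multiplier * 2**-shift,
    with the multiplier normalized into [2**30, 2**31) so it fits an int32."""
    if scale <= 0.0:
        raise ValueError("scale must be positive")
    mantissa, exponent = math.frexp(scale)    # scale = mantissa * 2**exponent, mantissa in [0.5, 1)
    multiplier = round(mantissa * (1 << 31))  # scale ~ multiplier * 2**(exponent - 31)
    if multiplier == (1 << 31):               # rounding overflowed; renormalize
        multiplier //= 2
        exponent += 1
    shift = 31 - exponent                     # scale ~ multiplier * 2**-shift
    return multiplier, shift

m, s = quantize_multiplier(0.0123)
assert abs(m * 2.0**-s - 0.0123) < 1e-9
```

The normalization keeps the multiplier as large as possible, which preserves precision when the scaled product is later shifted back down.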

If you’re trying to replicate the floating-point functionality, I believe you would need to implement the rounding as a combination of ADD and then CAST, relying on CAST’s round-towards-zero behavior.

Thanks,
Eric

Hi Eric,

thanks for your fast answer.
My goal is to lower a mixed network that has mostly quantized but also a few floating-point operators.

I don’t quite understand how I would emulate “round half to even” via ADD and CAST.
I think it would be implemented roughly like

function ROUND_HALF_TO_EVEN(X, type)
  C = FLOOR(X);
  FRAC = X - C;
  TIE = EQUAL(FRAC, 0.5);
  ODD = IS_ODD(CAST(C, int_t));
  UP = LOGICAL_OR(GREATER(FRAC, 0.5), LOGICAL_AND(TIE, ODD));
  return CAST(SELECT(UP, C + 1, C), type);

with some extra logic to implement IS_ODD. Is there an easier way?

Would it be an option to extend the TOSA spec to (a) have an attribute on CAST that specifies the rounding mode or (b) an extra operator with “round half to even” behavior?

Best wishes,
Matthias

Hi Matthias,

Thanks for your posting. I’d like to follow up on a couple of points from this thread.

The TOSA specification defines CAST as rounding to the nearest integer but does not specify the behavior for ties at the exact half-way point. In the TOSA reference model, 2.5 rounds to 3 using std::round(), which I believe is round to nearest, half away from zero. Just wanted to check where you saw the round-towards-zero behavior?

For networks consisting entirely of integer operations, TOSA can define the result bit-exactly, since all of the integer TOSA operations define their results bit-exactly. TOSA does not define bit-exactness for networks containing floating-point operations, because floating-point results can vary with operation order and rounding behavior (e.g. the handling of ties); these aspects are deliberately left open to allow a range of implementations. As you say, there is nothing in TOSA to specify a round-to-nearest-even mode. Even if CAST specified a particular rounding mode (such as round to nearest even), I think that would still not allow exact comparison in general, due to, for example, the operation order of the other floating-point operations before the CAST.

Best regards,
Dominic

Hi Dominic,

You are right, I’m also seeing “round to nearest with half away from zero” in the TOSA lowering implementation in LLVM.

I also understand that I cannot expect bit-accurate floating-point operations (in any setting, really),
but one can usually still get close results in terms of relative accuracy.
The effect of the tie-rounding mode is much bigger: whether 3.5 rounds to 3 or to 4 makes a difference of roughly 30% relative to the value.

So allowing the rounding mode to be specified in CAST wouldn’t make floating point bit-accurate (nothing will),
but it would allow a reasonable relative accuracy to be obtained.

Best wishes,
Matthias

Hi Matthias,

Thanks for confirming that the rounding you see with the CAST operator matches the reference.

In terms of relative error, I think a small relative error before a CAST can become a large relative error after it even if the tie rounding is specified – for example, 3.49 and 3.51 round to 3 and 4 respectively regardless of the tie mode. However, as you say, the maximum relative error of a float-to-integer conversion in isolation can be large if the tie rounding is not specified.

We’re just entering the holiday season with several people out and so this thread may go quiet for a bit until into the new year.

Best regards,
Dominic

Happy new year 🙂

I’m thinking about the following alternatives:

  1. Adding an optional attribute “roundingMode” to the CAST op, which can take the values “unspecified”, “round half away from zero”, and “round half to even”, with “unspecified” being the default.

  2. Adding a ROUND op with an attribute “roundingMode”, which can take the values “unspecified”, “round half away from zero”, and “round half to even”, with “unspecified” being the default. This could also subsume the FLOOR and CEIL ops by extending roundingMode.
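To make the proposal concrete, here is a hedged sketch of the semantics such a “roundingMode” attribute could have, including how extending it would subsume FLOOR and CEIL. The mode names and the function are illustrative only, not part of any TOSA specification:

```python
import math

def cast_to_int(x: float, rounding_mode: str = "unspecified") -> int:
    """Hypothetical semantics for the proposed roundingMode attribute."""
    if rounding_mode in ("unspecified", "round half away from zero"):
        # Ties move away from zero, matching the current reference model
        return math.floor(x + 0.5) if x >= 0 else math.ceil(x - 0.5)
    if rounding_mode == "round half to even":
        # Python's round() already implements ties-to-even
        return round(x)
    if rounding_mode == "floor":  # would subsume the FLOOR op
        return math.floor(x)
    if rounding_mode == "ceil":   # would subsume the CEIL op
        return math.ceil(x)
    raise ValueError(f"unknown rounding mode: {rounding_mode}")

assert cast_to_int(2.5, "round half away from zero") == 3
assert cast_to_int(2.5, "round half to even") == 2
```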

Would you consider this for inclusion into the standard?

I checked TensorFlow, and it also seems to define its default rounding mode as “round half to even” [0],
so this proposal would also help there.

Thank you!

[0] tf.math.round | TensorFlow v2.11.0

ONNX also defines its Round operator to round “half to even”: round in onnx/Operators.md · onnx/onnx (github.com)

Hi Matthias,

Happy new year.

We prefer to avoid optional attributes since options provide additional complexity for an implementation.

We agree round to nearest even is quite a commonly used rounding type. We are looking at whether it would be possible to define the rounding mode for the CAST operator to be round-to-nearest with tie to even (rather than the current round to nearest but tie unspecified). There are a few things we need to check to see if this would cause any issues with current usage.

Best regards,
Dominic

Hi Dominic,

This sounds great! Let me know if I can help you in that process.

Cheers,
Matthias

Hi,

Are there any updates on this?
Is there something I could do to help the process along, or at least to get visibility?

Thanks,
Matthias

Hi Matthias,

The TOSA specification and reference model have been updated now and these should be in the 0.60 release.

Please see this commit for the reference model change:
https://git.mlplatform.org/tosa/reference_model.git/commit/?id=57bc0796cd85115684219cf373db04c172848306

Best regards,
Dominic

Hi Dominic,

I had looked at the TOSA 0.6 spec,
but I couldn’t find a mention of the half-to-even rounding behavior in the section about CAST, or in section 3.3.2 (Numeric Conversion Helpers), which still says

int round_to_nearest_int(float_t f)
  Converts the floating-point value f, rounding to the nearest integer value.
  For the required precision see the section: Main inference precision requirements.

But now that you mentioned it, I found the updated text “Otherwise for fp32_t the result must be rounded to the nearest representable value using the round to nearest, ties to even rounding mode.” in the section “Main Inference precision requirements”.

Thanks,
Matthias

Hi Matthias,

Yes, that’s the correct text. Section 1.8.2 gives the main inference profile compliance requirements that define floating-point behavior. CAST is listed in row 5 of that table, along with other operations such as add and subtract that require round to nearest, ties to even for fp32_t. More generally, version 0.60 gives more detail on floating-point accuracy for TOSA operations in that table; this is collected in section 1.8.2 rather than appearing in the description of each operation.

Best regards,
Dominic