Utility of BF16 in Main Inference profile?

The latest revision of the spec (v0.30) makes the floating-point format explicit for each operand. Included in this list of formats is BF16, which does seem to make sense for training. However, for many ops, bf16_t is also listed as a valid format for the Main Inference profile. Are there really use-cases for BF16 in inference? I had thought it was mostly a training format. I do at least know that BF16 is a poor format for linear/uncompanded HDR pixel data: such data doesn't need BF16's vast exponent range, and the reduced significand (mantissa) precision can sometimes result in banding artefacts. Does anyone out there know of BF16 use-cases for inference?
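
To make the banding concern concrete, here is a minimal sketch (my own illustration, not from the spec; the `to_bf16` helper and the 10-bit example are assumptions) that emulates BF16 by rounding float32 values to their top 16 bits and counts how many distinct 10-bit pixel codes survive the round trip:

```python
import numpy as np

def to_bf16(x: np.ndarray) -> np.ndarray:
    """Round float32 values to bfloat16 precision (round-to-nearest-even)."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    # Standard RNE bias trick: add 0x7FFF plus the LSB of the kept half,
    # then truncate to the top 16 bits.
    bits = bits + 0x7FFF + ((bits >> 16) & 1)
    return (bits & 0xFFFF_0000).view(np.float32)

# 10-bit linear pixel codes normalised to [0, 1].
codes = np.arange(1024, dtype=np.float32) / 1023.0
roundtrip = to_bf16(codes)
print(f"distinct values after bf16: {len(np.unique(roundtrip))} of 1024")
# Many adjacent codes in the upper half of the range collapse to the
# same bf16 value -- the banding the question describes.
```

In the top octave of the range ([0.5, 1.0)), BF16's spacing is 2^-8, roughly four 10-bit steps wide, so adjacent codes land on the same value there.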

There is a decent amount of use of bfloat for inference. Recent CPUs have support for bf16 ops, which can give a notable speedup depending on the implementation.

It might not work for pixel data, but BERT-type models can run in BF16 with minimal precision problems, and I expect many recommendation networks would also retain their accuracy using bf16. So there are good reasons to keep it in the Main Inference profile. We'll keep monitoring what's possible as we update the spec to make sure bf16 remains a useful, performant option for inference.
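
As a rough sanity check of the BERT point, here is a toy comparison (again my own sketch, reusing the emulated `to_bf16` rounding from above; the layer sizes are arbitrary) that runs a transformer-style linear layer with BF16 inputs and float32 accumulation, then measures the relative error against a pure float32 reference:

```python
import numpy as np

def to_bf16(x):
    """Round float32 values to bfloat16 precision (round-to-nearest-even)."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    bits = bits + 0x7FFF + ((bits >> 16) & 1)
    return (bits & 0xFFFF_0000).view(np.float32)

rng = np.random.default_rng(0)
x = rng.standard_normal((128, 768), dtype=np.float32)          # activations
w = rng.standard_normal((768, 768), dtype=np.float32) * 0.02   # weights

ref = x @ w                           # float32 reference
approx = to_bf16(x) @ to_bf16(w)      # bf16 inputs, float32 accumulation
rel_err = np.abs(approx - ref) / (np.abs(ref) + 1e-6)
print(f"median relative error: {np.median(rel_err):.4f}")
# In this toy run the median relative error typically lands well under 1%,
# consistent with bf16 inputs being tolerable so long as accumulation
# stays in float32.
```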