MLIR Operator Quantization

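This document outlines the design of the MLIR quantization system. The term "quantization" is highly overloaded; here it refers to a fairly narrow set of techniques for converting floating-point computations into corresponding variants expressed in integer math for inference, as supported by low-bit-depth inference engines (such as TFLite), various accelerator hardware, and many DSPs.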

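Much of this is inspired by the approach taken in this paper, with many extensions and adaptations folded in. It specifically documents the positions that MLIR has taken on the topic and is not a general reference.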

Uniform quantization

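The primary quantization mechanism supported by MLIR is a scheme that expresses fixed point and affine transformations via uniformly spaced points on the Real number line.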

In addition, this scheme can be applied:

•per-layer: applying to every value within the target type.

•per-axis (also called per-channel): applying individually to each index along a specific axis of a tensor type.
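The difference between the two granularities can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `quantize`, the example tensor, and the chosen scales are hypothetical and are not part of MLIR.

```python
# Illustrative sketch only: `quantize` and all values below are hypothetical.

def quantize(values, scale, zero_point=0):
    """Map real values to integers using one shared scale and zero point."""
    return [round(v / scale) + zero_point for v in values]

# A 2x3 "tensor": two channels along axis 0, three values per channel.
tensor = [
    [0.1, 0.2, 0.3],     # channel 0: small magnitudes
    [10.0, 20.0, 30.0],  # channel 1: large magnitudes
]

# Per-layer: a single scale is applied to every value in the tensor.
per_layer_scale = 0.25
per_layer = [quantize(row, per_layer_scale) for row in tensor]

# Per-axis: each index along axis 0 gets its own scale.
per_axis_scales = [0.01, 0.25]
per_axis = [quantize(row, s) for row, s in zip(tensor, per_axis_scales)]

print(per_layer)  # channel 0 collapses to [0, 1, 1]
print(per_axis)   # channel 0 keeps its resolution: [10, 20, 30]
```

With one shared scale, the small-magnitude channel loses most of its resolution, which is the usual motivation for per-axis parameters.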

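Fixed point values

Fixed point values are a Real number divided by a scale. We will call the result of dividing the Real by the scale the scaled value.

$$ real\_value = scaled\_value * scale $$

The scale can be interpreted as the distance, in Real units, between neighboring scaled values. For example, if the scale is $$ \pi $$, then fixed point values with this scale can only represent multiples of $$ \pi $$, and nothing in between. The maximum rounding error when converting an arbitrary Real to a fixed point value with a given $$ scale $$ is $$ \frac{scale}{2} $$.

Continuing the previous example, when $$ scale = \pi $$, the maximum rounding error is $$ \frac{\pi}{2} $$.

Multiplication can be performed on scaled values with different scales, using the same algorithm as multiplication of Real values (note that the product scaled value has $$ scale_{product} = scale_{left \mbox{ } operand} * scale_{right \mbox{ } operand} $$).

Addition can be performed on scaled values, as long as they have the same scale, using the same algorithm as addition of Real values. This makes it convenient to represent scaled values on a computer as signed integers and to perform arithmetic on those signed integers, because the results will be correct scaled values.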
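The arithmetic rules above can be checked with a small Python sketch. The helper names `to_scaled` and `to_real` and the numeric values are assumptions made for illustration; scales that are powers of two are used so the floating point results are exact.

```python
# Illustrative sketch only: helper names and values are hypothetical.

def to_scaled(real_value, scale):
    """Round a real value to the nearest scaled (fixed point) integer."""
    return round(real_value / scale)

def to_real(scaled_value, scale):
    """Recover the real value represented by a scaled integer."""
    return scaled_value * scale

scale_a = 0.5
scale_b = 0.25

a = to_scaled(3.0, scale_a)  # 3.0 / 0.5  -> 6
b = to_scaled(1.5, scale_b)  # 1.5 / 0.25 -> 6

# Multiplication: multiply the integers, and multiply the scales.
product = a * b
product_scale = scale_a * scale_b
product_real = to_real(product, product_scale)  # 3.0 * 1.5

# Addition: only valid when both operands share the same scale.
c = to_scaled(1.5, scale_a)  # 1.5 / 0.5 -> 3
total_real = to_real(a + c, scale_a)  # 3.0 + 1.5

# Rounding error is bounded by scale / 2.
error = abs(to_real(to_scaled(1.3, scale_a), scale_a) - 1.3)
```

Here `product_real` and `total_real` both come out to 4.5, and `error` stays below `scale_a / 2`.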

Affine values

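Mathematically speaking, affine values are the result of adding a Real-valued zero point to a scaled value. Alternatively (and equivalently), subtracting a zero point from an affine value results in a scaled value:

$$ real\_value = scaled\_value * scale = (affine\_value - zero\_point) * scale $$

Essentially, affine values are a shift of the scaled values by some constant amount. Arithmetic (i.e., addition, subtraction, multiplication, division) cannot, in general, be performed directly on affine values; they must first be converted to the equivalent scaled values.

As alluded to above, the motivation for using affine values is to represent more efficiently the real values that will actually be encountered during computation. Frequently, the real values that will be encountered are not symmetric around the real zero. We also assume that the real zero is encountered during computation and should therefore be representable.

In this case, it is inefficient to store scaled values represented by signed integers, as some of the signed integers will never be used. In effect, the bit patterns corresponding to those signed integers are wasted.

In order to represent the real zero exactly with an integral-valued affine value, the zero point must be an integer between the minimum and maximum affine value (inclusive). For example, given an affine value represented by an 8-bit unsigned integer, we have: $$ 0 \leq zero\_point \leq 255 $$. This is important because, in the convolution-like operations of deep neural networks, we frequently need to zero-pad inputs and outputs, so zero must be exactly representable or the result will be biased.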
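A short Python sketch may help show why arithmetic cannot be applied to affine values directly, and why the real zero maps exactly onto the zero point. The constants `SCALE` and `ZERO_POINT` and the helper names are hypothetical choices for illustration.

```python
# Illustrative sketch only: constants and helper names are hypothetical.

SCALE = 0.5
ZERO_POINT = 10  # for 8-bit unsigned storage this must lie in [0, 255]

def real_to_affine(real_value):
    return round(real_value / SCALE) + ZERO_POINT

def affine_to_real(affine_value):
    return (affine_value - ZERO_POINT) * SCALE

x = real_to_affine(2.0)  # 4 + 10 = 14
y = real_to_affine(3.0)  # 6 + 10 = 16

# Wrong: adding affine values directly counts the zero point twice.
naive = affine_to_real(x + y)  # (30 - 10) * 0.5 = 10.0, not 5.0

# Right: convert to scaled values first, add, then apply the scale.
scaled_sum = (x - ZERO_POINT) + (y - ZERO_POINT)  # 4 + 6 = 10
correct = scaled_sum * SCALE  # 5.0

# The real zero is represented exactly by the zero point itself,
# so zero padding introduces no bias.
zero = real_to_affine(0.0)
```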

Relation

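Real values, fixed point values, and affine values are related through the following equation, which demonstrates how to convert one type of number to another:

$$ real\_value = scaled\_value * scale = (affine\_value - zero\_point) * scale $$

Note that computers generally store mathematical values using a finite number of bits. Thus, while the above conversions are exact, storing the result in a finite number of bits generally requires rounding the result of the conversion (this applies to both cases: storing using floating point and storing using fixed point). A full discussion of rounding behavior is outside the scope of this document; unless otherwise stated, it is safe to assume that rounding follows the IEEE754 default of RNE (where hardware permits).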

Converting between real and fixed point or affine

To convert a real value to a fixed point value, we must know the scale. To convert a real value to an affine value, we must know the scale and the zero point.

Real to affine

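To convert an input tensor of real-valued elements (usually represented by a floating point format, frequently Single precision) to a tensor of affine elements represented by an integral type (e.g. 8-bit unsigned integer), the following conversion can be performed (note that it is not required that all representable values of the integral type are used):

$$ \begin{align*} affine\_value_{uint8 \, or \, uint16} &= clampToTargetSize(roundToNearestInteger( \frac{real\_value_{Single}}{scale_{Single}})_{sint32} + zero\_point_{uint8 \, or \, uint16}) \end{align*} $$

In the above, we assume that $$real\_value$$ is a Single, $$scale$$ is a Single, $$roundToNearestInteger$$ returns a signed 32-bit integer, and $$zero\_point$$ is an unsigned 8-bit or 16-bit integer.

The bit depths and the number of fixed point values shown here are indicative of common types on typical hardware, but the scheme is not constrained to particular bit depths, nor does it require that the entire range of an N-bit integer be used.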
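As a rough sketch of the real-to-affine formula, the following Python mirrors the names used in the equation. The function signatures, the default of 8 bits, and the sample scale and zero point are assumptions; Python's built-in `round` rounds ties to even, which matches the RNE default mentioned earlier.

```python
# Illustrative sketch only: signatures and sample values are hypothetical.

def clamp_to_target_size(value, bits=8):
    """Clamp an integer to the range of an unsigned `bits`-wide integer."""
    lo, hi = 0, (1 << bits) - 1
    return max(lo, min(hi, value))

def real_to_affine(real_value, scale, zero_point, bits=8):
    # round() on a float rounds ties to even (RNE) and returns an int.
    rounded = round(real_value / scale)
    return clamp_to_target_size(rounded + zero_point, bits)

scale, zero_point = 0.5, 128

in_range = real_to_affine(1.0, scale, zero_point)      # 2 + 128 = 130
tie = real_to_affine(1.25, scale, zero_point)          # 2.5 rounds to 2
too_small = real_to_affine(-100.0, scale, zero_point)  # clamped to 0
too_large = real_to_affine(100.0, scale, zero_point)   # clamped to 255
```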

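Affine to real

To convert an output tensor of affine elements represented by uint8 or uint16 to a tensor of real-valued elements (usually represented by a floating point format, frequently Single precision), the following conversion can be performed:

$$ \begin{align*} real\_value_{Single} &= roundToNearestFloat((affine\_value_{uint8 \, or \, uint16} - zero\_point_{uint8 \, or \, uint16})_{sint32})_{Single} * scale_{Single} \end{align*} $$

In the above, we assume that the result of the subtraction is in 32-bit signed integer format, and that $$roundToNearestFloat$$ returns a Single.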
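The reverse direction is simpler. In this Python sketch the function name and sample parameters are assumptions, and Python's double-precision `float` stands in for the Single type in the equation.

```python
# Illustrative sketch only: name and sample values are hypothetical.
# Python floats are double precision; the formula above uses Single.

def affine_to_real(affine_value, scale, zero_point):
    # The difference may be negative, so it is treated as a signed integer.
    difference = affine_value - zero_point
    return float(difference) * scale

scale, zero_point = 0.5, 128

mid = affine_to_real(130, scale, zero_point)   # (130 - 128) * 0.5
low = affine_to_real(0, scale, zero_point)     # (0 - 128) * 0.5
high = affine_to_real(255, scale, zero_point)  # (255 - 128) * 0.5
```

With these parameters an 8-bit affine value covers the real interval from -64.0 to 63.5 in steps of 0.5.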

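Affine to fixed point

When the affine and fixed point scales are the same, subtract the zero point from the affine value to get the equivalent fixed point value.

$$ scaled\_value = affine\_value_{non\mbox{-}negative} - zero\_point_{non\mbox{-}negative} $$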

Fixed point to affine

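When the affine and fixed point scales are the same, add the zero point to the fixed point value to get the equivalent affine value.

$$ affine\_value_{non\mbox{-}negative} = scaled\_value + zero\_point_{non\mbox{-}negative} $$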
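The two conversions between affine and fixed point are exact inverses when the scales match, which this Python sketch illustrates. The function names and the zero point of 128 are hypothetical.

```python
# Illustrative sketch only: names and the zero point are hypothetical.

def affine_to_scaled(affine_value, zero_point):
    """Non-negative affine value -> signed scaled (fixed point) value."""
    return affine_value - zero_point

def scaled_to_affine(scaled_value, zero_point):
    """Signed scaled (fixed point) value -> non-negative affine value."""
    return scaled_value + zero_point

zero_point = 128

scaled = affine_to_scaled(100, zero_point)          # 100 - 128 = -28
round_trip = scaled_to_affine(scaled, zero_point)   # -28 + 128 = 100
```

Because only integer addition and subtraction are involved, no rounding occurs in either direction.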

Usage within MLIR

There are several components to the quantization system being developed within MLIR:

•Quantization dialect containing:

  •A family of QuantizedTypes which represent the mapping between expressed values (typically of a floating point computer type) and storage values (typically of an integral computer type).
  •Type conversion ops for converting between types based on a QuantizedType and its expressed and storage sub-types.
  •Instrumentation ops for assigning instrumentation points within the computation where runtime statistics may help guide the quantization process.

•Integration with simulated quantization at training time

•TFLite native quantization

  •The TFLite op-set natively supports uniform-quantized variants.
  •Passes and tools exist to convert directly from the TensorFlow dialect to the TFLite quantized operation set.

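Not every application of quantization will use all of these facilities. Specifically, the TensorFlow to TensorFlow Lite conversion uses the QuantizedTypes but has its own ops for type conversion and for expressing the supporting math.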

Quantization Dialect

Quantized type

TODO: Flesh this section out.

QuantizedType base class
UniformQuantizedType

Quantized type conversion operations

qcast : Convert from an expressed type to QuantizedType
dcast : Convert from a QuantizedType to its expressed type
scast : Convert between a QuantizedType and its storage type

Instrumentation and constraint operations

const_fake_quant : Emulates the logic of the historic TensorFlow fake_quant_with_min_max_args operation.
stats_ref : Declares that statistics should be gathered at this point with a unique key and made available to future passes of the solver.
stats : Declares inline statistics (per layer and per axis) for the point in the computation. stats_ref ops are generally converted to stats ops once trial runs have been performed.
coupled_ref : Declares points in the computation to be coupled from a type inference perspective based on a unique key.

Integration with simulated quantization at training time

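TensorFlow has historically used the tf.quantization.fake_quant_* family of operations to simulate the effect of quantization at training time.

As originally implemented, TensorFlow Lite was the primary user of such operations at inference time. When quantized inference was enabled, if every eligible tensor passed through an appropriate fake_quant node (the rules for which tensors can have fake_quant applied are somewhat involved), then TensorFlow Lite would use the attributes of the fake_quant operations to decide how to convert the computation to use kernels from its quantized operations subset.

In MLIR-based quantization, fake_quant_* operations are handled by converting them to a sequence of *qcast* (quantize) followed by *dcast* (dequantize), with an appropriate *UniformQuantizedType* as the target of the qcast operation.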
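Numerically, a fake-quant step behaves like a quantize followed immediately by a dequantize. The Python sketch below models that composition on plain numbers; it is not MLIR or TensorFlow code, and the function name, unsigned 8-bit storage, and sample parameters are assumptions made for illustration.

```python
# Illustrative sketch only: a numeric model of "qcast then dcast".
# The function name and parameters are hypothetical.

def fake_quant(real_value, scale, zero_point, bits=8):
    hi = (1 << bits) - 1
    # Quantize (the qcast half): round, shift by zero point, clamp.
    stored = max(0, min(hi, round(real_value / scale) + zero_point))
    # Dequantize (the dcast half): shift back and rescale.
    return (stored - zero_point) * scale

scale, zero_point = 0.5, 128

exact = fake_quant(1.0, scale, zero_point)         # already on the grid
snapped = fake_quant(1.3, scale, zero_point)       # snaps to nearest step
saturated = fake_quant(1000.0, scale, zero_point)  # clamps to the maximum
```

The output stays in floating point, but it only ever takes values that the quantized type can represent, which is what lets training observe the quantization error.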

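This allows subsequent compiler passes to preserve the knowledge that quantization was simulated in a certain way, while giving the compiler the flexibility to move the casts as it simplifies the computation and converts it to a form based on integral arithmetic.

This scheme also naturally allows computations that are partially quantized, where the parts that could not be reduced to integral operations are still carried out in floating point, with appropriate conversions at the boundaries.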

TFLite native quantization

TODO: Flesh this out

General algorithm

Take input min/max information and set the ArrayInfo (which really is InputOrOutputArrayInfo).
In LegalizeTF, convert ArrayInfo min/max to tf.Quantize and tf.Dequantize nodes. (or tf.FakeQuant) Convert all constant FakeQuants to (tf.FQ -> tfl.Q -> tfl.DQ).
Hardcode logic/propagation needs to happen here.
Run TF constant folding.
In PrepareTFL, convert all tf.FQ to (tfl.Q -> tfl.DQ).
Run a quantization pass that takes (tfl.DQ (for both input and weights) -> op -> tfl.Q) and replaces it with (op). Also replace (constant_float -> tfl.Q) with (constant_quant).

Original article: https://www.cnblogs.com/wujianming-110117/p/14306155.html