
quantization

MRFI quantization methods

A quantization class has two static functions, quantize() and dequantize(). Both take the tensor x as their first argument; any other arguments are specified in the config file. Both functions must modify x in place and must NOT return anything.

quantize() should turn the input x into integer values while keeping the float32 dtype (i.e., pseudo quantization), so that PyTorch can still run the forward pass correctly.
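A minimal sketch of this contract, assuming x is a torch.Tensor. The class name ScaleRoundQuantization and its scale argument are hypothetical and stand in for whatever arguments your config file supplies; this is not one of MRFI's built-in methods.

```python
import torch


class ScaleRoundQuantization:
    """Hypothetical quantization class illustrating the in-place contract."""

    @staticmethod
    def quantize(x: torch.Tensor, scale: float) -> None:
        # Pseudo quantization: values become integers, dtype stays float32.
        x.mul_(scale).round_()

    @staticmethod
    def dequantize(x: torch.Tensor, scale: float) -> None:
        # Map the integer values back to the float scale, again in place.
        x.div_(scale)


x = torch.tensor([0.12, -0.57, 0.33])
result = ScaleRoundQuantization.quantize(x, scale=10.0)
print(result)  # prints None: the function only mutates x
print(x, x.dtype)
```

Because every operation uses PyTorch's in-place variants (mul_, round_, div_), the caller's tensor is updated directly and nothing needs to be returned.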

Warning

An integer bit-flip error mode always requires a quantization.

The bit_width argument and the resulting integer range (e.g. -128~127) must be consistent with the corresponding error mode arguments. Since MRFI does not check value bounds for performance reasons, wrong arguments or a wrong quantization implementation may silently lead to unexpected experimental results.
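Because MRFI skips bound checks, it can help to verify a quantization's output yourself while developing it. The helpers below, signed_int_range and check_int_range, are hypothetical and not part of MRFI; they assume a two's complement signed integer of bit_width bits.

```python
from typing import Tuple

import torch


def signed_int_range(bit_width: int) -> Tuple[int, int]:
    # Two's complement range, e.g. 8 bits -> (-128, 127).
    return -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1


def check_int_range(x: torch.Tensor, bit_width: int) -> bool:
    """Hypothetical debugging helper: True if x holds only in-range integers."""
    low, high = signed_int_range(bit_width)
    is_integer = bool(torch.all(x == torch.round(x)))
    in_range = bool(torch.all((x >= low) & (x <= high)))
    return is_integer and in_range


print(check_int_range(torch.tensor([-127.0, 0.0, 127.0]), 8))  # True
print(check_int_range(torch.tensor([200.0]), 8))               # False
```

Such a check is meant for debugging only; running it on every forward pass would reintroduce the overhead MRFI avoids.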

Runtime dynamic quantization

If dynamic_range is set to "auto" in the MRFI config, MRFI automatically sets it to the maximum range of the input tensor. This feature can be used to simulate runtime dynamic quantization. Note, however, that fault injection can also change the dynamic range of later layers.
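To illustrate why an injected fault can shift the dynamic range of later layers, here is a small sketch. It assumes "maximum range" means the largest absolute value in the tensor; the function auto_dynamic_range and the corrupted value 64.0 are hypothetical and only for illustration.

```python
import torch


def auto_dynamic_range(x: torch.Tensor) -> float:
    # Assumption: "auto" is taken here as the largest absolute value in x.
    return float(x.abs().max())


clean = torch.tensor([0.5, -1.0, 2.0])
faulty = clean.clone()
faulty[0] = 64.0  # hypothetical value corrupted by a fault in an earlier layer

print(auto_dynamic_range(clean))   # 2.0
print(auto_dynamic_range(faulty))  # 64.0
```

With the larger range, each integer step covers a wider float interval, so the remaining fault-free values in that layer are quantized more coarsely.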

SymmericQuantization

Simple symmetric quantization.

Uniformly maps a float tensor in the range [-dynamic_range*scale_factor, +dynamic_range*scale_factor] onto the integer range [-2**(bit_width-1)+1, 2**(bit_width-1)-1].

Outliers are clipped.

Parameters:

Name           Type   Description                         Default
bit_width      int    Often 8 or 16.                      required
dynamic_range  float  Usually the maximum of the values.  required
scale_factor   float  Extra factor on the dynamic range.  1.0
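The symmetric mapping described above can be sketched as follows. This is not MRFI's source code: the class name SymmetricQuantizationSketch is hypothetical, and the sketch relies on torch.round, which rounds exact halves to the nearest even integer.

```python
import torch


class SymmetricQuantizationSketch:
    """Hypothetical sketch of symmetric quantization, modifying x in place."""

    @staticmethod
    def quantize(x, bit_width, dynamic_range, scale_factor=1.0):
        qmax = 2 ** (bit_width - 1) - 1  # e.g. 127 for 8 bits
        limit = dynamic_range * scale_factor
        # Map [-limit, +limit] onto [-qmax, +qmax], round, then clip outliers.
        x.mul_(qmax / limit).round_().clamp_(-qmax, qmax)

    @staticmethod
    def dequantize(x, bit_width, dynamic_range, scale_factor=1.0):
        qmax = 2 ** (bit_width - 1) - 1
        limit = dynamic_range * scale_factor
        # Scale the integers back to the float range.
        x.mul_(limit / qmax)


x = torch.tensor([-2.0, -1.0, 0.0, 0.25, 1.0, 3.0])
SymmetricQuantizationSketch.quantize(x, bit_width=8, dynamic_range=1.0)
print(x)  # outliers -2.0 and 3.0 are clipped to -127 and 127
```

Clipping after rounding keeps every outlier exactly on the boundary value ±qmax, so the result always stays inside the integer range listed above.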

PositiveQuantization

Simple positive quantization.

Uniformly maps a float tensor in the range [0, dynamic_range*scale_factor] onto the integer range [0, 2**bit_width-1].

Outliers are clipped.

Parameters:

Name           Type   Description                         Default
bit_width      int    Often 8 or 16.                      required
dynamic_range  float  Usually the maximum of the values.  required
scale_factor   float  Extra factor on the dynamic range.  1.0
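A comparable sketch for the positive mapping, again hypothetical (PositiveQuantizationSketch is not MRFI's class). Negative inputs fall outside [0, dynamic_range*scale_factor] and are therefore clipped to 0, which suits tensors that are expected to be non-negative, for example activations after a ReLU.

```python
import torch


class PositiveQuantizationSketch:
    """Hypothetical sketch of positive (unsigned) quantization, in place."""

    @staticmethod
    def quantize(x, bit_width, dynamic_range, scale_factor=1.0):
        qmax = 2 ** bit_width - 1  # e.g. 255 for 8 bits
        limit = dynamic_range * scale_factor
        # Map [0, limit] onto [0, qmax], round, then clip outliers.
        x.mul_(qmax / limit).round_().clamp_(0, qmax)

    @staticmethod
    def dequantize(x, bit_width, dynamic_range, scale_factor=1.0):
        qmax = 2 ** bit_width - 1
        limit = dynamic_range * scale_factor
        x.mul_(limit / qmax)


x = torch.tensor([-0.5, 0.0, 0.25, 1.0, 4.0])
PositiveQuantizationSketch.quantize(x, bit_width=8, dynamic_range=1.0)
print(x)  # -0.5 is clipped to 0, 4.0 is clipped to 255
```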

FixPointQuantization

Fixed-point quantization.

Quantizes a float tensor into the binary fixed-point representation integer_bit.decimal_bit.

The input dynamic range is therefore [-2**integer_bit, 2**integer_bit]; outliers are clipped.

Parameters:

Name         Type  Description                Default
integer_bit  int   Integer bits of value.     required
decimal_bit  int   Decimal bits of value.     required
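A minimal sketch of fixed-point quantization, assuming the clipping bound follows the documented input range [-2**integer_bit, 2**integer_bit] literally (in integer units that is ±2**(integer_bit+decimal_bit)). The class name FixPointQuantizationSketch is hypothetical, and whether MRFI treats the upper edge as inclusive is an assumption here.

```python
import torch


class FixPointQuantizationSketch:
    """Hypothetical sketch of integer_bit.decimal_bit fixed-point quantization."""

    @staticmethod
    def quantize(x, integer_bit, decimal_bit):
        # 2**integer_bit in float units corresponds to this integer bound.
        bound = 2 ** (integer_bit + decimal_bit)
        # Shift the binary point right by decimal_bit places, round, then clip.
        x.mul_(2 ** decimal_bit).round_().clamp_(-bound, bound)

    @staticmethod
    def dequantize(x, integer_bit, decimal_bit):
        # Shift the binary point back.
        x.div_(2 ** decimal_bit)


# 3 integer bits and 4 decimal bits: step 1/16, input range [-8, 8].
x = torch.tensor([0.3, -1.25, 10.0])
FixPointQuantizationSketch.quantize(x, integer_bit=3, decimal_bit=4)
FixPointQuantizationSketch.dequantize(x, integer_bit=3, decimal_bit=4)
print(x)  # 0.3 rounds to 5/16 = 0.3125, and 10.0 is clipped to 8.0
```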