MRFI

Overview

Multi-Resolution Fault Injector (MRFI) is a powerful neural network fault injector based on PyTorch.

Compared with other injection frameworks, its main distinguishing feature is flexible configuration: injection settings can be adjusted to suit different experimental needs. The injection configuration and the observations for each layer can be set independently in a single, clear config file. MRFI also provides a large number of commonly used error injection methods and error models, and allows customization.

[Figure: MRFI overview]

In preliminary experiments, you may not want to deal with complex experimental configurations, for example when you simply want to observe the parameters of a network model or run error injection experiments with a simple global configuration. For these cases, MRFI also provides a simple API for observation and coarse-grained fault injection.

See MRFI Basic usage to learn how to use MRFI.

In our paper on MRFI on arXiv, we explain in detail the background of the problem and the composition and principles of MRFI, and we demonstrate the importance of fine-grained evaluation through experiments using MRFI.

Supported Features

Activation injection

  • Fixed position (Permanent fault)
  • Runtime random position (Transient fault)

Weight injection

  • Fixed position (Permanent fault)
  • Runtime random position (Transient fault)
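
The difference between the two position modes listed for activations and weights can be sketched in plain Python. This is a minimal illustration, not MRFI's implementation: the class names `FixedPositionSelector` and `RuntimeRandomSelector` are hypothetical, and positions are assumed to be flat indices into an activation or weight tensor.

```python
import random


class FixedPositionSelector:
    """Permanent fault: positions are drawn once and reused on every call.

    Hypothetical helper for illustration only.
    """

    def __init__(self, size, count, seed=None):
        self._positions = random.Random(seed).sample(range(size), count)

    def __call__(self):
        return list(self._positions)


class RuntimeRandomSelector:
    """Transient fault: a fresh set of positions is drawn on every call.

    Hypothetical helper for illustration only.
    """

    def __init__(self, size, count, seed=None):
        self._rng = random.Random(seed)
        self._size = size
        self._count = count

    def __call__(self):
        return self._rng.sample(range(self._size), self._count)


permanent = FixedPositionSelector(size=1_000_000, count=5, seed=0)
transient = RuntimeRandomSelector(size=1_000_000, count=5, seed=0)

first_permanent, second_permanent = permanent(), permanent()
first_transient, second_transient = transient(), transient()
```

Calling `permanent()` repeatedly models a defect that stays at the same locations across inferences, while each call to `transient()` models an independent soft error event.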

Injection on quantization model

  • Post-training quantization
  • Dynamic quantization
  • Fine-grained quantization parameters config
  • Add custom quantization
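
To make the idea of injecting into a quantized model concrete, the sketch below quantizes a float array to 8-bit integers, flips one bit in the integer domain, and converts back. It uses NumPy and a simple symmetric per-tensor scheme chosen for illustration; the function names are hypothetical and MRFI's own quantization and its parameters may differ.

```python
import numpy as np


def quantize_int8(x):
    """Symmetric per-tensor quantization to int8 (illustrative scheme)."""
    scale = float(np.max(np.abs(x))) / 127 or 1.0  # avoid a zero scale
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def flip_bit_int8(q, index, bit):
    """Return a copy of q with one bit of element `index` flipped."""
    if not 0 <= bit < 8:
        raise ValueError("bit must be in [0, 7]")
    raw = q.copy().view(np.uint8)  # reinterpret the two's-complement bytes
    raw.flat[index] ^= np.uint8(1 << bit)
    return raw.view(np.int8)


def dequantize(q, scale):
    """Map quantized integers back to float32 values."""
    return q.astype(np.float32) * scale


x = np.array([0.5, -1.0, 0.25, 1.0], dtype=np.float32)
q, scale = quantize_int8(x)
faulty_q = flip_bit_int8(q, index=3, bit=7)  # flip the sign bit of the last element
faulty_x = dequantize(faulty_q, scale)
```

Here the last element is stored as 127; flipping its top bit turns it into -1, so the dequantized value moves from 1.0 to roughly -0.008. This shows why the same bit flip has a different effect in a quantized model than in a float model.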

Error mode

  • Integer bit flip
  • Float bit flip
  • Stuck-at fault (SetValue)
  • Random value
  • Add random noise
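
As an illustration of the two bit-flip error modes, the following standard-library sketch flips a single bit in an IEEE-754 float32 value and in a fixed-width two's-complement integer. The function names are hypothetical and this is not MRFI's code; it only shows what the error models mean at the bit level.

```python
import struct


def flip_float32_bit(value, bit):
    """Flip one bit of a float32 (bit 0 = mantissa LSB, bit 31 = sign)."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (result,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return result


def flip_int_bit(value, bit, width=8):
    """Flip one bit of a signed two's-complement integer of `width` bits."""
    if not 0 <= bit < width:
        raise ValueError("bit must be in [0, width - 1]")
    raw = (value & ((1 << width) - 1)) ^ (1 << bit)
    # Reinterpret the unsigned pattern as a signed value.
    return raw - (1 << width) if raw >= 1 << (width - 1) else raw


sign_flip = flip_float32_bit(1.0, 31)      # sign bit: 1.0 becomes -1.0
exponent_flip = flip_float32_bit(1.0, 30)  # top exponent bit: 1.0 becomes inf
low_exp_flip = flip_float32_bit(1.0, 23)   # lowest exponent bit: 1.0 becomes 0.5
int_flip = flip_int_bit(5, 7)              # top bit of int8: 5 becomes -123
```

The float examples show why bit position matters: a flip in a high exponent bit can turn an ordinary value into infinity, while a flip in a low mantissa bit changes it only slightly.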

Internal observation & visualization

  • Activation & Weight observer
  • Error propagation observer
  • Results are easy to save and visualize, and work well with numpy and matplotlib
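
The idea behind observing error propagation can be sketched with NumPy: run a fault-free ("golden") pass and a faulty pass, then compare the activations of each layer. The tiny ReLU network, the fault location, and the RMSE metric below are assumptions made for illustration and are not taken from MRFI.

```python
import numpy as np


def forward(x, weights):
    """Run a small ReLU MLP and return the activation of every layer."""
    activations = []
    for w in weights:
        x = np.maximum(x @ w, 0.0)
        activations.append(x)
    return activations


def rmse_per_layer(golden, faulty):
    """Root-mean-square difference between golden and faulty activations."""
    return [float(np.sqrt(np.mean((g - f) ** 2))) for g, f in zip(golden, faulty)]


rng = np.random.default_rng(0)
# Positive weights and inputs keep every ReLU active, so the fault is never masked.
weights = [rng.uniform(0.1, 1.0, size=(8, 8)) for _ in range(3)]
x = rng.uniform(0.1, 1.0, size=(4, 8))

golden = forward(x, weights)

faulty_weights = [w.copy() for w in weights]
faulty_weights[1][0, 0] += 10.0  # inject a fault into one weight of the second layer
faulty = forward(x, faulty_weights)

layer_rmse = rmse_per_layer(golden, faulty)
```

The first entry of `layer_rmse` is zero because the fault sits after that layer, while the later entries are non-zero. The resulting list can be plotted directly with matplotlib to see how the error grows or shrinks with depth.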

Flexibility

  • Add custom error_mode, selector, quantization and observer
  • Distinguish differences in fault tolerance at the network, layer, channel, neuron and bit levels

Performance

  • Automatically use GPU for network inference and fault injection
  • The selector-injector design is significantly faster than generating a probability for every position when performing random error injection
  • Accelerate error impact analysis by using internal observer metrics rather than the original accuracy metric
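
One way a selector can avoid drawing a random number for every position is to first draw how many errors occur and then sample only that many positions. The NumPy sketch below contrasts the two approaches. It illustrates the general technique under that assumption, with hypothetical function names, and does not describe MRFI's actual selectors.

```python
import numpy as np


def dense_mask_positions(n, rate, rng):
    """Baseline: draw one random number per position, keep those below `rate`."""
    return np.flatnonzero(rng.random(n) < rate)


def sampled_positions(n, rate, rng):
    """Selector step: draw the error count, then sample that many distinct positions."""
    count = rng.binomial(n, rate)
    return rng.choice(n, size=count, replace=False)


def inject(values, positions, error_fn):
    """Injector step: apply `error_fn` only at the selected flat positions."""
    out = values.copy()
    flat = out.reshape(-1)  # view on the contiguous copy
    flat[positions] = error_fn(flat[positions])
    return out


rng = np.random.default_rng(0)
weights = np.ones((1000, 1000), dtype=np.float32)

positions = sampled_positions(weights.size, 1e-5, rng)
faulty = inject(weights, positions, lambda v: -v)
```

Both functions select each position with the same probability, but at low error rates the sampled variant touches only a handful of indices, whereas the dense variant has to generate and compare a value for every element of the tensor.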

Fine-grained configuration

  • By Python code
  • By .yaml config file
  • By GUI

Evaluating fault tolerance policies

  • Selective protection at different levels
  • More fault tolerance methods may be supported later (e.g. fault-tolerant retraining, range-based filter)