Sergey Bronnikov edited this page Feb 23, 2021 · 15 revisions

Fault injection is a technique for improving test coverage by deliberately introducing faults so that rarely executed code paths, in particular error-handling paths, get exercised. It is widely considered an important part of developing robust software. Faults can be injected at several levels of a system, from userspace libraries down to the kernel and emulated hardware.
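
As a small illustration of the idea, the sketch below injects an I/O error into `os.fsync()` with Python's `unittest.mock` and checks that the error-handling path runs. The functions `save`, `save_with_injected_fsync_error`, and `demo` are hypothetical names invented for this example; they are not part of any tool listed on this page.

```python
import errno
import os
import tempfile
from unittest import mock


def save(path, data):
    """Write data and fsync it; report an I/O failure instead of crashing."""
    try:
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
    except OSError as exc:
        # This is the error-handling path that fault injection exercises.
        return f"save failed: {os.strerror(exc.errno)}"
    return "ok"


def save_with_injected_fsync_error(path, data):
    """Run save() while os.fsync is replaced by a stub that raises EIO."""
    fault = OSError(errno.EIO, os.strerror(errno.EIO))
    with mock.patch("os.fsync", side_effect=fault):
        return save(path, data)


def demo():
    """Return the results of a normal save and a save with the fault injected."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        return save(path, b"payload"), save_with_injected_fsync_error(path, b"payload")
```

The same pattern applies to any call that can fail. Tools that work at the `LD_PRELOAD` or kernel level do this without modifying or mocking the program under test.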

| Tool | Level | Target | Comment |
|------|-------|--------|---------|
| CharybdeFS | Userspace (FUSE) | Filesystem | Requires Thrift |
| PetardFS | Userspace (FUSE) | Filesystem | https://github.com/jrandall/petardfs |
| UnreliableFS | Userspace (FUSE) | Filesystem | https://github.com/ligurio/unreliablefs |
| libeatmydata | Userspace (LD_PRELOAD) | Filesystem, fsync() | Replaces fsync() with a no-op, https://github.com/stewartsmith/libeatmydata |
| cleancache | Userspace (LD_PRELOAD) | Filesystem cache | Drops file contents from the page cache after use, https://github.com/kahing/bin/blob/master/cleancache.c |
| Device Mapper | Kernel space | Disk I/O | Use Device Mapper's error/flakey/delay/dm-dust devices to return errors or corruption from, or to delay or split I/O to, a synthesized block device. Requires a kernel built with Device Mapper support, the appropriate additional Device Mapper modules (dm-dust is only available on kernel >= 5.2), and the Device Mapper userspace tools. https://www.kernel.org/doc/Documentation/device-mapper/delay.txt |
| QEMU | Hardware | Disk, Memory | blkdebug, https://github.com/qemu/qemu/blob/master/docs/devel/blkdebug.txt |
| sysrq | Kernel space | OS crash | echo c > /proc/sysrq-trigger, https://www.kernel.org/doc/html/latest/admin-guide/sysrq.html |
| BSOD | Kernel space | OS crash | Windows only, https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/forcing-a-system-crash-from-the-keyboard |
| strace | Userspace | POSIX API calls | https://strace.io/ |
| libfiu | Userspace (LD_PRELOAD) | POSIX API calls | Use libfiu to perform fault injection on POSIX API calls, http://blitiri.com.ar/p/libfiu/ |
| SystemTap | Userspace | POSIX API calls | Use SystemTap to do fault injection (requires a kernel built with a number of additional options), https://lwn.net/Articles/289932/ |
| strobe time | Userspace | Time | https://github.com/jepsen-io/jepsen/tree/main/jepsen/resources |
| libfaketime | Userspace (LD_PRELOAD) | Time | https://github.com/wolfcw/libfaketime |
| timeskew | Userspace (LD_PRELOAD) | Time | https://github.com/vi/timeskew |
| Linux kernel's fault injector | Kernel space | - | Use the Linux kernel's fault injector to inject an error into the underlying block device (requires a kernel built with FAIL_MAKE_REQUEST=y). |
| trickle | Userspace | Network | Bandwidth shaper for Unix-like systems, https://github.com/mariusae/trickle |
| tc (Linux), dummynet (FreeBSD) | Kernel space | Network | https://man7.org/linux/man-pages/man8/tc.8.html, https://www.freebsd.org/cgi/man.cgi?dummynet |
| Linux kernel NVMe fault injection | Kernel space | NVMe | https://www.kernel.org/doc/html/latest/fault-injection/nvme-fault-injection.html |
| Linux kernel notifier fault injection | Kernel space | Kernel events | https://www.kernel.org/doc/html/latest/fault-injection/notifier-error-inject.html |
| Linux kernel fault injection capabilities infrastructure | Kernel space | Memory | https://www.kernel.org/doc/html/latest/fault-injection/fault-injection.html |
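
As one example of driving a tool from the table, the sketch below builds a command line for strace's syscall tampering option (`-e inject=SYSCALL:error=ERRNO[:when=EXPR]`), which makes a chosen system call fail with a chosen errno. The helper name `strace_inject_argv` is hypothetical, and the sketch assumes a strace version recent enough to support `inject=`. It only constructs the argument list and does not run anything.

```python
def strace_inject_argv(command, syscall, error, when=None):
    """Build an strace argv that makes `syscall` fail with `error`.

    command: the program and its arguments, as a list of strings.
    syscall: syscall name, e.g. "write".
    error:   errno name, e.g. "ENOSPC".
    when:    optional strace 'when' expression, e.g. "3" to fail only
             the third invocation, or "3+" for the third and all later ones.
    """
    spec = f"inject={syscall}:error={error}"
    if when is not None:
        spec += f":when={when}"
    # -f follows child processes; trace= limits the output to the tampered syscall.
    return ["strace", "-f", "-e", f"trace={syscall}", "-e", spec, *command]


# Example: make the third write() issued by `cp a b` fail with ENOSPC.
argv = strace_inject_argv(["cp", "a", "b"], "write", "ENOSPC", when="3")
```

The resulting list can be passed to `subprocess.run(argv)` on a machine where strace is installed.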

References:

  1. Restricting program memory https://alex.dzyoba.com/blog/restrict-memory/
  2. Chaos Engineering tools https://github.com/dastergon/awesome-chaos-engineering#notable-tools

Network Condition Profiles

Here's a list of network conditions with values that you can plug into Comcast, a tool for simulating poor network conditions. Please add any more that you come across.


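| Name | Latency (ms) | Bandwidth (kbit/s) | Packet loss (%) |
|------|--------------|--------------------|-----------------|
| GPRS (good) | 500 | 50 | 2 |
| EDGE (good) | 300 | 250 | 1.5 |
| 3G/HSDPA (good) | 250 | 750 | 1.5 |
| DIAL-UP (good) | 185 | 40 | 2 |
| DSL (poor) | 70 | 2000 | 2 |
| DSL (good) | 40 | 8000 | 0.5 |
| WIFI (good) | 40 | 30000 | 0.2 |
| Satellite | 1500 | - | 0.2 |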
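
The profiles above can be turned into Comcast invocations along the following lines. This is a sketch under stated assumptions: it assumes the values are milliseconds, kbit/s, and percent, matching Comcast's `--latency`, `--target-bw`, and `--packet-loss` flags. The `PROFILES` dictionary, the helper `comcast_argv`, and the default device `eth0` are illustrative choices, not part of Comcast itself.

```python
# (latency ms, bandwidth kbit/s or None when unspecified, packet loss %)
PROFILES = {
    "GPRS (good)": (500, 50, 2),
    "EDGE (good)": (300, 250, 1.5),
    "3G/HSDPA (good)": (250, 750, 1.5),
    "DIAL-UP (good)": (185, 40, 2),
    "DSL (poor)": (70, 2000, 2),
    "DSL (good)": (40, 8000, 0.5),
    "WIFI (good)": (40, 30000, 0.2),
    "Satellite": (1500, None, 0.2),
}


def comcast_argv(profile, device="eth0"):
    """Build a comcast argv for a named profile (device name is an assumption)."""
    latency, bandwidth, loss = PROFILES[profile]
    argv = ["comcast", f"--device={device}", f"--latency={latency}"]
    if bandwidth is not None:
        # The Satellite profile has no bandwidth value, so the flag is omitted.
        argv.append(f"--target-bw={bandwidth}")
    argv.append(f"--packet-loss={loss}%")
    return argv


gprs = comcast_argv("GPRS (good)")
satellite = comcast_argv("Satellite", device="wlan0")
```

Keeping the profiles in one table-like structure makes it easy to add new rows as more conditions are collected.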
