diff --git a/ipu/docs/performance.md b/ipu/docs/performance.md
index 67b9e0dda..1fcd10e3a 100644
--- a/ipu/docs/performance.md
+++ b/ipu/docs/performance.md
@@ -42,3 +42,58 @@ def update(i, opt_state, batch):
 will result in an IPU jitted function where only `batch` is transferred from host to device at every call, while `opt_state` remains in IPU SRAM (after being transferred at the first call). The training loop does not require any additional modification.
 
 Please refer to the [MNIST example](../examples/mnist_classifier.py) for a full example of buffer donation on the IPU.
+
+
+## Write a custom op
+
+One of the joys of IPU programming is that writing code at the tile level is often conceptually simpler than on GPU systems: each IPU tile behaves like a conventional processor and is programmed in C++. See [custom_primitive_test.py](../tests/ipu/primitive/custom_primitive_test.py) for an example.
+
+For further examples, see [demo_vertex.cpp](https://github.com/graphcore-research/tessellate-ipu/blob/main/examples/demo/demo_vertex.cpp) in the [TessellateIPU library](https://github.com/graphcore-research/tessellate-ipu).
+
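+As a rough illustration of what such tile-level C++ looks like, here is a minimal sketch of a Poplar vertex. The vertex name and fields are hypothetical rather than taken from the linked examples; only the overall structure follows the standard Poplar vertex API (a class deriving from `poplar::Vertex` with a `compute` method).
+
+```cpp
+// Minimal sketch of a Poplar vertex (hypothetical example). Each vertex
+// runs on a single IPU tile; this one computes out[i] = x[i] + scale * y[i].
+#include <poplar/Vertex.hpp>
+
+class ScaledAdd : public poplar::Vertex {
+public:
+  poplar::Input<poplar::Vector<float>> x;
+  poplar::Input<poplar::Vector<float>> y;
+  poplar::Output<poplar::Vector<float>> out;
+  float scale;
+
+  // Called when the compute set containing this vertex is executed.
+  bool compute() {
+    for (unsigned i = 0; i < x.size(); ++i) {
+      out[i] = x[i] + scale * y[i];
+    }
+    return true;  // signal successful completion
+  }
+};
+```
+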
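+Finally, returning to the buffer-donation pattern described at the top of this section: below is a minimal self-contained sketch, assuming the standard `jax.jit(..., donate_argnums=...)` API; the shapes and the toy update step are illustrative, not taken from the MNIST example.
+
+```python
+# Minimal buffer-donation sketch (illustrative, not the MNIST example).
+from functools import partial
+
+import jax
+import jax.numpy as jnp
+
+# Donating argument 1 (`opt_state`) lets XLA reuse its device buffer for the
+# returned state, so the state stays on device (IPU SRAM) across calls; only
+# `batch` is transferred from host at every step.
+@partial(jax.jit, donate_argnums=(1,))
+def update(i, opt_state, batch):
+    # Toy update step: a real training loop would apply a gradient step here.
+    return jax.tree_util.tree_map(lambda p: p - 0.01 * batch.mean(), opt_state)
+
+opt_state = {"w": jnp.zeros((512, 512))}
+for i in range(10):
+    batch = jnp.ones((32, 512))
+    opt_state = update(i, opt_state, batch)
+```
+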