OpenUnlearn is released alongside the work by Pawelczyk et al (2024b) as a first general-purpose, lightweight library that provides a suite of tools to systematically evaluate the quality of unlearning methods.
- Diverse areas of unlearning research: OpenUnlearn includes ready-to-use API interfaces for 8 state-of-the-art unlearning methods and 3 metrics to quantify their performance.
- Open-source initiative: OpenUnlearn is open source and easily extensible.
To install the core dependencies of OpenUnlearn, clone the OpenUnlearn repo into your local environment and install it with pip:

pip install -e .

OpenUnlearn is an open-source ecosystem comprising datasets, implementations of state-of-the-art unlearning methods, evaluation metrics, and documentation to promote transparency and collaboration around evaluations of unlearning methods. OpenUnlearn can readily be used to benchmark new unlearning methods as well as to incorporate them into the framework. By enabling systematic and efficient evaluation and benchmarking of existing and new unlearning methods, OpenUnlearn can inform and accelerate new research in the emerging field of unlearning data from trained ML models.
All unlearning methods included in OpenUnlearn are readily accessible through the Unlearner class: users only have to specify the method name in order to invoke the appropriate method and obtain a new model from which a specific set of points has been forgotten, as shown in the code snippet below. Users can easily incorporate their own custom unlearning methods into the OpenUnlearn framework by extending the Unlearner class and placing the code for their method in its get_updated_model function; a sketch of such an extension is given after the list of supported methods below.
# example using Gradient Ascent Unlearner
from openunlearn import Unlearner
unlearning_method = Unlearner(method='GA', model=model, dataset_tensor=inputs)
model_updated = unlearning_method.get_updated_model(train_loader, forget_loader, test_loader)

The currently supported methods are:
- Noisy Gradient Descent: Link to paper
- Gradient Descent: Link to paper
- Gradient Ascent: Link to paper
- NegGrad+: Link to paper
- SCRUB: Link to paper
- SSD: Link to paper
- EUk: Link to paper
- CFk: Link to paper
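The snippet below is a minimal sketch of how such a custom extension might look. Only the Unlearner base class and the get_updated_model entry point come from OpenUnlearn; the subclass name, the assumption that the wrapped model is stored as self.model, and the simple fine-tuning strategy are illustrative assumptions, not part of the library.

# sketch of a custom unlearning method (illustrative; see assumptions above)
import copy
import torch
from openunlearn import Unlearner

class MyUnlearner(Unlearner):
    def get_updated_model(self, train_loader, forget_loader, test_loader):
        # Toy strategy: fine-tune a copy of the model on train_loader and
        # ignore the forget set (a plain gradient-descent baseline; here
        # train_loader is assumed to hold the retained data).
        model = copy.deepcopy(self.model)  # assumes the base class stores the model as self.model
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
        loss_fn = torch.nn.CrossEntropyLoss()
        model.train()
        for inputs, targets in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
        return model

A subclass defined this way can then be instantiated and called in the same manner as the built-in methods shown above.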
Benchmarking an unlearning method using the included evaluation metrics is straightforward. Users can easily incorporate their own custom evaluation metrics into OpenUnlearn by filling a form and providing the GitHub link to their code as well as a summary of their metric. Note that the code should be in the form of a function which takes as input OpenUnlearn's model object together with the relevant data (e.g., forget and test points) and returns a numerical score.
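As a rough illustration of the expected shape of such a function, the sketch below defines a toy metric that compares the updated model's average loss on forget points with its average loss on test points; the function name, its signature, and the use of data loaders are illustrative assumptions rather than OpenUnlearn's required interface.

# sketch of a custom evaluation metric (illustrative; signature is an assumption)
import torch

def my_unlearning_metric(updated_model, forget_loader, test_loader):
    # Toy score: gap between mean loss on forget points and mean loss on
    # test points. A well-unlearned model should treat forget points like
    # unseen test points, so this gap should be close to zero.
    loss_fn = torch.nn.CrossEntropyLoss(reduction='sum')

    def mean_loss(loader):
        total, count = 0.0, 0
        with torch.no_grad():
            for inputs, targets in loader:
                total += loss_fn(updated_model(inputs), targets).item()
                count += targets.shape[0]
        return total / count

    return mean_loss(forget_loader) - mean_loss(test_loader)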
OpenUnlearn calculates the updated model's utility on unseen test points and also provides an estimate of the time taken to update the model with each method.
OpenUnlearn includes 3 metrics aimed at distinguishing whether the original model or the updated model was used for prediction: i) Threshold Membership Inference Attack (MIA), ii) LiRA-Forget, and iii) Gaussian Unlearning Score (GUS). All metrics measure whether points from the forget set can be distinguished from test points for a given model.
- Membership Inference Attack (MIA): Computes the standard loss-based membership inference attack between forget and test outputs using the method by Shokri et al (2017); a minimal sketch of this attack is shown after this list.
- LiRA-Forget (in progress): Uses shadow models as in Pawelczyk et al (2024a) to run an unlearning audit. Can be computationally expensive. The implementation is work in progress.
- Gaussian Unlearning Score (GUS): Uses Gaussian distributed samples as in Pawelczyk et al (2024b) to mount an unlearning audit without the need to train any shadow models.
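For intuition, the snippet below sketches the loss-based (threshold) attack in its simplest form: compute per-example losses on forget and test points and report how well a loss threshold separates the two groups. All names in the sketch are illustrative and do not correspond to OpenUnlearn's internal implementation.

# sketch of a loss-based membership inference attack (illustrative only)
import numpy as np
import torch
from sklearn.metrics import roc_auc_score

def loss_based_mia_auc(model, forget_loader, test_loader):
    # Lower loss suggests the point was in the training set, so the
    # negative loss serves as the membership score.
    loss_fn = torch.nn.CrossEntropyLoss(reduction='none')

    def per_example_losses(loader):
        losses = []
        with torch.no_grad():
            for inputs, targets in loader:
                losses.append(loss_fn(model(inputs), targets).numpy())
        return np.concatenate(losses)

    forget_losses = per_example_losses(forget_loader)
    test_losses = per_example_losses(test_loader)
    scores = np.concatenate([-forget_losses, -test_losses])
    labels = np.concatenate([np.ones_like(forget_losses), np.zeros_like(test_losses)])
    # AUC of 0.5: forget points look like test points (good unlearning);
    # AUC near 1.0: forget points are still recognizable as training members.
    return roc_auc_score(labels, scores)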
Unlearning Evaluation as a Hypothesis Testing Problem: This set of measures aims to understand the privacy risks of unlearning by casting the unlearning evaluation problem as a membership hypothesis test of the following form:
- $H_0$: The model $f$ was trained on $S_{\text{train}}$ without $x$ (perfect unlearning / $x$ is a test poison);
- $H_1$: The model $f$ was trained on $S_{\text{train}}$ with $x$ (imperfect unlearning / no unlearning).
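One common way to operationalize this test (sketched here as a general recipe rather than OpenUnlearn's exact procedure) is to compute a scalar membership score $s(x, f)$ for each candidate point and reject $H_0$ whenever $s(x, f) > \tau$. The threshold $\tau$ is chosen so that the false positive rate $\Pr[s(x, f) > \tau \mid H_0]$ stays below a target level $\alpha$; the attack's true positive rate at small $\alpha$, or the AUC over all thresholds, then quantifies how far the updated model is from perfect unlearning, with values near chance indicating that forget points are statistically indistinguishable from test points.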
If you find this code useful, please cite the corresponding works:
@article{pawelczyk2024machine,
title={{Machine Unlearning Fails to Remove Data Poisoning Attacks}},
author={Pawelczyk, Martin and Di, Jimmy Z and Lu, Yiwei and Kamath, Gautam and Sekhari, Ayush and Neel, Seth},
journal={arXiv preprint arXiv:2406.17216},
year={2024}
}
@inproceedings{pawelczyk2023context,
title={{In-Context Unlearning: Language Models as Few Shot Unlearners}},
author={Pawelczyk, Martin and Neel, Seth and Lakkaraju, Himabindu},
booktitle={International Conference on Machine Learning (ICML)},
year={2024}
}
