intro.qmd
We begin by generating the synthetic data for a simple binary classification problem. For illustrative purposes, we will use data that is linearly separable. The chart below shows the data $\mathcal{D}$ at time zero, before any implementation of recourse.
\caption{Dynamics in Algorithmic Recourse: (a) we have a simple linear classifier trained for binary classification where samples from the negative class ($y=0$) are marked in orange and samples of the positive class ($y=1$) are marked in blue; (b) the implementation of AR for a random subset of individuals leads to a noticeable domain shift; (c) as the classifier is retrained we observe a corresponding model shift; (d) as this process is repeated, the decision boundary moves away from the target class.}\label{fig:poc}
\end{figure}
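For concreteness, the kind of linearly separable data shown above could be generated as in the following minimal sketch, in which the cluster means, scales, and sample sizes are illustrative assumptions rather than the parameters actually used in our experiments:

```python
import numpy as np

rng = np.random.default_rng(2023)
n = 500  # samples per class (hypothetical choice)

# Two well-separated Gaussian clusters yield a linearly separable problem.
X_neg = rng.normal(loc=[-2.0, 2.0], scale=0.5, size=(n, 2))  # negative class, y = 0
X_pos = rng.normal(loc=[2.0, -2.0], scale=0.5, size=(n, 2))  # positive class, y = 1

X = np.vstack([X_neg, X_pos])
y = np.concatenate([np.zeros(n), np.ones(n)])
```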
We think that these types of endogenous dynamics may be problematic and deserve our attention. From a purely technical perspective, we note the following: firstly, model shifts may inadvertently change classification outcomes for individuals who never received and implemented recourse. Secondly, we observe in Figure \ref{fig:poc} that as the decision boundary moves in the direction of the non-target class, counterfactual paths become shorter. We think that in some practical applications, this can be expected to generate costs for involved stakeholders. To follow our argument, consider the following two examples:
\begin{example}[Consumer Credit]
\protect\hypertarget{exm:consumer}{}\label{exm:consumer}Suppose Figure \ref{fig:poc} relates to an automated decision-making system used by a retail bank to evaluate credit applicants with respect to their creditworthiness. Assume that the two features are meaningful in the sense that creditworthiness increases in the South-East direction. Then we can think of the outcome in panel (d) as representing a situation where the bank supplies credit to more borrowers (orange), but these borrowers are on average less creditworthy and more of them can be expected to default on their loan. This represents a cost to the retail bank.
\end{example}
\begin{example}[Student Admission]
\protect\hypertarget{exm:student}{}\label{exm:student}Suppose Figure \ref{fig:poc} relates to an automated decision-making system used by a university in its student admission process. Assume that the two features are meaningful in the sense that the likelihood of students completing their degree increases in the South-East direction. Then we can think of the outcome in panel (b) as representing a situation where more students are admitted to university (orange), but they are more likely to fail their degree than students who were admitted in previous years. The university admission committee catches on to this and suspends its efforts to offer Algorithmic Recourse. This represents an opportunity cost to future student applicants who may have derived utility from being offered recourse.
\end{example}
Both examples are exaggerated simplifications of potential real-world scenarios, but they serve to illustrate the point that recourse for a single individual may exert negative externalities on other individuals.
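To quantify such shifts, we rely on the Maximum Mean Discrepancy (MMD). As a reference point, a standard biased empirical estimator of the squared MMD can be written as follows (the MMD itself is the square root of this quantity; the exact variant used in the experiments may differ in minor details, e.g., biased versus unbiased estimation):

\[
MMD^2(X,\tilde{X}) = \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m} k(x_i,x_j) - \frac{2}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} k(x_i,\tilde{x}_j) + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} k(\tilde{x}_i,\tilde{x}_j)
\]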
where \(X=\{x_1,...,x_m\}\), \(\tilde{X}=\{\tilde{x}_1,...,\tilde{x}_n\}\) represent independent and identically distributed samples drawn from probability distributions \(\mathcal{X}\) and \(\mathcal{\tilde{X}}\), respectively \protect\hyperlink{ref-gretton2012kernel}{{[}25{]}}. MMD is a measure of the distance between the kernel mean embeddings of \(\mathcal{X}\) and \(\mathcal{\tilde{X}}\) in a Reproducing Kernel Hilbert Space, \(\mathcal{H}\) \protect\hyperlink{ref-berlinet2011reproducing}{{[}26{]}}. An important consideration is the choice of the kernel function \(k(\cdot,\cdot)\). In our implementation, we use a Gaussian kernel with a constant length-scale parameter of \(0.5\). As the Gaussian kernel captures all moments of distributions \(\mathcal{X}\) and \(\mathcal{\tilde{X}}\), we have that \(MMD(X,\tilde{X})=0\) if and only if \(X=\tilde{X}\). Conversely, larger values \(MMD(X,\tilde{X})>0\) indicate that it is more likely that \(\mathcal{X}\) and \(\mathcal{\tilde{X}}\) are different distributions. In our context, large values therefore indicate that a domain shift indeed seems to have occurred.
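A minimal sketch of this estimator follows, assuming the Gaussian kernel is parameterized as \(k(x, x^\prime) = \exp(-\lVert x - x^\prime \rVert^2 / (2\ell^2))\) with length-scale \(\ell = 0.5\) (the exact parameterization used in our implementation may differ):

```python
import numpy as np

def gaussian_kernel(A, B, ls=0.5):
    # Pairwise Gaussian kernel matrix: k(a, b) = exp(-||a - b||^2 / (2 * ls^2)).
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * ls**2))

def mmd2(X, X_tilde, ls=0.5):
    # Biased empirical estimate of the squared MMD between two samples.
    k_xx = gaussian_kernel(X, X, ls).mean()
    k_yy = gaussian_kernel(X_tilde, X_tilde, ls).mean()
    k_xy = gaussian_kernel(X, X_tilde, ls).mean()
    return k_xx + k_yy - 2.0 * k_xy
```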
To assess the statistical significance of the observed shifts under the null hypothesis that samples \(X\) and \(\tilde{X}\) were drawn from the same probability distribution, we follow \protect\hyperlink{ref-arcones1992bootstrap}{{[}27{]}}. To that end, we combine the two samples and generate a large number of permutations of \(X + \tilde{X}\). We then split the permuted data into two new samples \(X^\prime\) and \(\tilde{X}^\prime\) having the same size as the original samples. Under the null hypothesis, \(MMD(X^\prime,\tilde{X}^\prime)\) should be approximately equal to \(MMD(X,\tilde{X})\). The corresponding \(p\)-value can then be calculated as the share of permutations for which \(MMD(X^\prime,\tilde{X}^\prime)\) is at least as large as the observed \(MMD(X,\tilde{X})\).
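This permutation test can be sketched as follows, reusing the `mmd2` function from the previous snippet (the number of permutations is an illustrative choice):

```python
def mmd_permutation_test(X, X_tilde, n_perm=1000, ls=0.5, seed=None):
    # p-value for H0: X and X_tilde are drawn from the same distribution.
    rng = np.random.default_rng(seed)
    observed = mmd2(X, X_tilde, ls)
    pooled = np.vstack([X, X_tilde])
    m = X.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        X_prime, X_tilde_prime = pooled[idx[:m]], pooled[idx[m:]]
        exceed += mmd2(X_prime, X_tilde_prime, ls) >= observed
    # Share of permutations with an MMD at least as large as the observed one
    # (adding one to numerator and denominator yields a valid permutation p-value).
    return (exceed + 1) / (n_perm + 1)
```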
paper/sections/empirical_2.rmd

Below, we first present our main experimental findings regarding these questions.
We start this section with the key high-level observations. Across all datasets (synthetic and real), classifiers, and counterfactual generators, we observe most or all of the following dynamics to varying degrees:
- Statistically significant domain and model shifts as measured by MMD.
- A deterioration in out-of-sample model performance as measured by the F-Score evaluated on a test sample. In many cases, this drop in performance is substantial.
- Significant perturbations to the model parameters as well as an increase in the model's decisiveness.
- Disagreement between the original and retrained models, in some cases substantial (a sketch of this disagreement metric follows below).
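As an illustration of the last point, predictive disagreement can be measured as the share of inputs on which the two classifiers assign different labels. This is a minimal sketch assuming hard-label `predict` methods; the exact metric and model interface used in the experiments may differ:

```python
import numpy as np

def disagreement(model_orig, model_retrained, X):
    # Share of inputs on which the two classifiers assign different labels.
    # Assumes both models expose a hard-label `predict` method (hypothetical
    # interface; the paper's exact disagreement metric may differ).
    preds_a = np.asarray(model_orig.predict(X))
    preds_b = np.asarray(model_retrained.predict(X))
    return float(np.mean(preds_a != preds_b))
```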