FATE Machine Learning
This week…

Lecture will cover:
• Fairness
• Definition of Fairness
• Confounding
• Transparency and Explainability
• Shapley Values

The Concept of Fairness in Machine Learning

Now that AI systems are all over the place, people have started worrying about how to make systems Fair, Accountable, Transparent, and Explainable (FATE). These concepts are related and work together. When studying how to make a system fair, we need to take all of these factors into consideration.
It is extremely hard to make machine learning models fair. Furthermore, these concepts are sometimes contradictory!

What is Fairness in ML?

Demographic disparities are common in the world.
◦ Different neighborhoods have different prices.
◦ Different cultures have different expectations / needs.
◦ Different people prefer different things.
Machine learning models will take this information and build a model that hard-codes these disparities. The question then becomes: are these encoded differences justified, or are they adversely discriminatory? This is the fairness problem.

Fairness is a Social Construct

One important factor to understand is that fairness is built from a social agreement.
◦ What's fair is an ethics (philosophy) and society (sociology) problem.
◦ Machine learning models are empirical, or evidence-based. But if the original data is (adversely) biased, so will your models be!
Fairness is then a complex problem with multiple definitions, and currently there is no agreement on what exactly is a fair solution.
◦ Depends on the problem.
◦ Depends on the data.
◦ Depends on the society.
◦ Depends on the legal framework.
This is bound to evolve! If you work on ML, keep a close eye on this.

Confounding

Fairness Through Unawareness

Before serious research was put into fairness, the preferred method for regulators used to be fairness through unawareness.
In simple words, fairness through unawareness is… not using the variable in the model.
◦ Remove the characteristics protected by the charter of human rights.
◦ It varies by jurisdiction!
◦ In Canada: citizenship, race, place of origin, ethnic origin, colour, ancestry, disability, age, creed, sex/pregnancy, family status, marital status, sexual orientation, gender identity, gender expression, receipt of public assistance (in housing) and record of offences (in employment) (Canadian Human Rights Act).

The Problem: Confounding

The problem with Fairness Through Unawareness is that the removed characteristic can be a confounding factor of the variables that remain in the model. This means we are still discriminating by that characteristic.

Confounding: Example

Confounding occurs when two variables appear to be related, but only because they are both related to another (unseen) variable.
Consider a model with an output, an input variable, and a confounding factor that is related to both: the input and the output then look related even if the input has no direct effect on the output.

Formally Defining Fairness

Researchers have tried to formally define fairness for many years. There is still no accepted definition, so I will use the one from the FairML book (Barocas et al., 2021). This may evolve in the future.
We consider:
◦ A set of attributes $A$ that are sensitive (think of any characteristic protected by the Canadian Human Rights Act).
◦ A target variable $Y$, which represents what we want to predict.
◦ A set of predictors $X$, which includes the variables that we use to create a model (score) $R = E[Y \mid X]$.
For example: a credit score uses a logistic regression on sociodemographic variables $X$, where the target $Y$ is a binary variable representing whether the person defaulted (1) or not (0).

Independence

A model shows Independence w.r.t. the set of attributes $A$ if:
$R \perp A$
where $\perp$ is the symbol representing statistical independence; that is, the scores produced by the model are not affected by the value of the set of attributes $A$.
For the binary case:
$P(R = 1 \mid A = a) = P(R = 1 \mid A = a') \quad \forall a, a' \in A$
Independence is also referred to in the literature as demographic parity, statistical parity, group fairness, disparate impact and several other names. It is very popular in the literature!

Separation

What happens if the target variable $Y$ is known to be affected by (correlated with or endogenous to) the attributes $A$? In this case, it may be desirable to allow correlation between $R$ and $A$, but only to the extent that we observe it in $Y$. We call this criterion separation. Formally:
$R \perp A \mid Y$
For the binary case:
$P(R = 1 \mid Y = 1, A = a) = P(R = 1 \mid Y = 1, A = a') \quad \forall a, a' \in A$
$P(R = 1 \mid Y = 0, A = a) = P(R = 1 \mid Y = 0, A = a') \quad \forall a, a' \in A$
That is, the true positive rate and the false positive rate are the same for every group.

Sufficiency: Calibration

The third definition we will study is the concept of sufficiency. In order to understand it better, we will first define the concept of calibration of a model.
We say a model is calibrated if every prediction in the support of $R$ is equal to the probability that the underlying event occurs. So, for the binary case:
$P(Y = 1 \mid R = r) = r$
This means the model is actually giving the probabilities observed in the data. If the model says 25%, then the set of all cases with a prediction of 25% has a 25% positive rate ☺
Calibration is normally a post-processing tool, where we adjust the probabilities to achieve it (see later).

Sufficiency: Calibration by group

Now let's add the sensitive feature. We say a model is calibrated by group if (again for the binary case; all of these can be extended to multiple classes/regression):
$P(Y = 1 \mid R = r, A = a) = r \quad \forall r \in \mathrm{supp}(R), a \in A$
This means that the positive rates are maintained when segmenting by score and by the sensitive attribute.
With this, we can finally define sufficiency. A model satisfies sufficiency if
$Y \perp A \mid R$
Its relation to calibration by group is that, for any model $R$ that satisfies sufficiency, there is a function $\ell: [0, 1] \to [0, 1]$ such that $\ell(R)$ satisfies calibration by group.
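The three criteria can be checked empirically by comparing a few rates across groups. The sketch below is only an illustration under stated assumptions: the function name `group_rates`, the toy arrays and the group labels "a" and "b" are all hypothetical, the model output is a hard 0/1 decision rather than a score, and every group is assumed to contain both outcomes and at least one positive decision. For a binary decision, equal positive rates correspond to independence, equal true and false positive rates to separation, and equal precision (`ppv`) is the part of sufficiency that concerns positive decisions.

```python
import numpy as np

def group_rates(y_true, y_pred, group):
    """Per-group rates for binary targets (Y) and binary decisions (R)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)
    rates = {}
    for g in sorted(set(group.tolist())):
        in_group = group == g
        yt, yp = y_true[in_group], y_pred[in_group]
        rates[g] = {
            "positive_rate": float(yp.mean()),    # P(R=1 | A=g): independence
            "tpr": float(yp[yt == 1].mean()),     # P(R=1 | Y=1, A=g): separation
            "fpr": float(yp[yt == 0].mean()),     # P(R=1 | Y=0, A=g): separation
            "ppv": float(yt[yp == 1].mean()),     # P(Y=1 | R=1, A=g): sufficiency
        }
    return rates

# Hypothetical toy data: four cases in group "a", four in group "b".
y_true = [1, 1, 0, 0, 1, 1, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(y_true, y_pred, group)
for g, r in rates.items():
    print(g, r)
```

In this toy data both groups receive a positive decision half of the time, so independence holds, yet the true and false positive rates differ between the groups, so separation does not.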
The Problem with Fairness

Now that we have a formal definition of fairness, we can get to the biggest problem of all:
IT IS IMPOSSIBLE TO SATISFY ALL THREE CRITERIA AT ONCE
This is the biggest issue in fairness. Trade-offs must be made in order to achieve whatever the modeller considers fair.
◦ Independence and sufficiency are mutually exclusive (whenever $A$ and $Y$ are not independent).
◦ If $Y$ is binary, independence and separation are also mutually exclusive. If the target is not binary, then you can obtain a model that achieves both.
◦ If $A$ is not independent of $Y$, and $R$ is binary with a non-zero false positive rate, then separation and sufficiency are mutually exclusive.
Also: if unobserved confounding factors exist, these criteria may not even be measurable!
◦ The importance of measuring what you want to control for. Reputational risk?

Treatment of Models

How can we treat models to achieve fairness?

Confounding and unfairness in a model are serious issues that are often neglected when studying machine learning systems. They are, however, much better understood today. I will venture that treating them will be a legal requirement in the near future.
In general, there are several ways to treat models:
◦ At pre-processing.
◦ During model construction.
◦ At post-processing.

At Pre-processing

This requires studying the variables before constructing the model and modifying them so that they are uncorrelated with $A$.
It is very hard to do with a large number of predictors! Constrained and kernel PCA methods can be used to create these reductions.
You may lose information!

During Model Construction

A second option is to directly include the fairness criteria in the model. This assumes the fairness criteria are:
◦ Immutable.
◦ Known in advance.
This may not be the case!
It also requires modifying the cost function.

After Model Construction

One way to solve the problem is to directly add the set of attributes $A$ to the model and interact its values with the other variables.
◦ Generate a model with the confounding factor and all other variables.
◦ Estimate the prediction of the model for all possible values of the confounding factor.
◦ Return the average of all these predictions and use that.
This leads to an estimator that shows independence. Think about whether this is enough for you!
Another alternative is to choose cut-off points to achieve separation. Assuming the score is already built, choose different cut-off points for different values of $A$ so that the criterion is achieved.
◦ Use the ROC curve for this! (If $Y$ is binary.)
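The three averaging steps above can be sketched as follows. This is a minimal illustration under assumptions that are not in the lecture: the data are synthetic, the sensitive attribute is binary, the model is an ordinary least-squares fit with an interaction term, the average over values of $A$ is unweighted, and the names `design` and `averaged_score` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
a = rng.integers(0, 2, size=n)                 # hypothetical binary sensitive attribute A
x = rng.normal(loc=a, scale=1.0, size=n)       # predictor that is correlated with A
y = 1.0 * x + 0.5 * a + rng.normal(scale=0.1, size=n)  # synthetic target

def design(x, a):
    """Design matrix with columns: intercept, x, A, and the x*A interaction."""
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    return np.column_stack([np.ones_like(x), x, a, x * a])

# Step 1: fit a model that includes A and its interaction with x.
beta, *_ = np.linalg.lstsq(design(x, a), y, rcond=None)

def averaged_score(x_new, a_values=(0, 1)):
    """Steps 2 and 3: predict under every value of A, then average the predictions."""
    x_new = np.asarray(x_new, dtype=float)
    preds = [design(x_new, np.full_like(x_new, v)) @ beta for v in a_values]
    return np.mean(preds, axis=0)

scores = averaged_score([0.0, 1.0])
print(scores)
```

The resulting score no longer takes a person's own value of $A$ as an input. Whether that is enough depends on how strongly the remaining predictors are themselves related to $A$, which is the point of the warning above.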
The core principle of the EU Artificial Intelligence Act (AIA) is that AI must be trustworthy. This means it must be
◦ Lawful,
◦ Ethical, and
◦ Technically robust.
Furthermore, the AIA divides systems into different levels of risk.

EU AIA: Levels of Risk

The AIA divides ML applications into forbidden ones (e.g. real-time biometrics, social scoring algorithms, manipulative systems), high-risk ones and low-risk ones.
High-risk applications include:
◦ Biometric identification and categorization of natural persons.
◦ Management and operation of critical infrastructure.
◦ Education and vocational training.
◦ Employment and worker management.
◦ Access to essential services: access to private and public services, including financial services!
◦ Law enforcement / border control.
◦ Administration of justice.
Low risk is anything that does not fit above.
If an application is high risk, then extended penalties apply for misuse of AI!
◦ Penalties are huge: 6% of annual turnover or up to 30 million euros.

capAI: What it covers

capAI: Process

Transparency and Explainability

The final concept we will cover is the problem of Transparency and Explainability. The OECD refers to this concept as follows:
AI Actors should commit to transparency and responsible disclosure regarding AI systems. To this end, they should provide meaningful information, appropriate to the context, and consistent with the state of art:
◦ to foster a general understanding of AI systems,
◦ to make stakeholders aware of their interactions with AI systems, including in the workplace,
◦ to enable those affected by an AI system to understand the outcome, and,
◦ to enable those adversely affected by an AI system to challenge its outcome based on plain and easy-to-understand information on the factors, and the logic that served as the basis for the prediction, recommendation or decision.

Transparency: AIA requirement

The AIA requires, for every high-risk model in use (and also suggests for low-risk ones), submission to an EU-wide database of models, currently under construction. The organization must submit the following:
◦ Company information.
◦ Status of the AI model.
◦ Description of the AI model's purpose.
◦ Where the system is being used.
◦ Electronic instructions for use of the model.
◦ Optionally, an external scorecard with details regarding model use.

What about explainability?

Explainability refers to the ability to interpret the model's outputs and understand how the model arrives at its predictions.
We distinguish between black box and white box models.
• A white box model gives the explanation of its patterns directly in the outputs of the model.
• Decision trees, GLMs in general, etc.
• A black box model does not.
• Neural networks, XGBoost, Random Forest, etc.
In general, non-linear models with complex patterns will normally be black box.
• Can we make them more explainable?
• We'll study a few ways specifically for tree-based ensembles.

Variable Importance Plots

One way of studying the impact of specific inputs in tree-based ensembles is via variable importance plots.
These plots show the statistical impact of the variables in the model (as measured by the Gini index).
These plots, however, do not provide any sort of explanation in terms of individual cases.
◦ As the models are non-linear, different cases can be affected differently.
There is one more alternative to this: Shapley Values.

The Shapley Values

The Shapley Values are a well-known measure in financial modelling. They use a game-theoretic approach to provide explainability.
Assume each variable $x_j \in X$ is a player in a game. A model $f(x)$, $x \in \mathbb{R}^p$, generates a prediction (the payout). How can we distribute the payout fairly across all features $x_j$?
The Shapley Value of a variable is its marginal contribution to a subset of the other variables, weighted according to the number of variables in that subset, and summed so that all possible combinations of variables are considered.
The problem: calculating this is NP-hard!

TreeSHAP

A few years ago, Lundberg et al. (2019) realized that for tree-based methods it is much easier to calculate these contributions.
◦ As tree-based models calculate subsets of variables directly, we can calculate the Shapley Values over tree cuts.
◦ This is MUCH faster!
Polynomial instead ($O(n^3)$ with the number of examples).
It also maintains, generally, the properties of Shapley Values.
◦ Local additivity: The Shapley Value of a subset of values is the sum of the values of each member of the subset.
◦ Consistency/monotonicity: The importance of a set of values is larger than the importance of any smaller subset contained in it.
◦ Missingness: If an attribute's importance is zero for all subsets, its Shapley Value will be zero.
Let's use them in the lab!

Takeaways

Fairness, Accountability, Explainability and Transparency is a huge world that is just being developed.
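
To make the Shapley Value definition from this lecture concrete, here is a brute-force sketch for a tiny model. It uses the standard weighting from cooperative game theory, $\phi_j = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|!\,(p - |S| - 1)!}{p!}\,[v(S \cup \{j\}) - v(S)]$, and enumerates every subset, so it is only feasible for a handful of features. It is not TreeSHAP. The toy model `f`, the zero baseline, and the choice to define $v(S)$ by replacing absent features with baseline values are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley Values by enumerating all subsets (exponential cost)."""
    p = len(x)

    def value(subset):
        # Assumed value function: features outside the subset take the baseline value.
        z = [x[i] if i in subset else baseline[i] for i in range(p)]
        return f(z)

    phi = []
    for j in range(p):
        others = [i for i in range(p) if i != j]
        total = 0.0
        for k in range(p):  # k = size of the subset S that excludes feature j
            weight = factorial(k) * factorial(p - k - 1) / factorial(p)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {j}) - value(s))
        phi.append(total)
    return phi

def f(z):
    """Hypothetical toy model with one additive term and one interaction."""
    return 2 * z[0] + z[1] * z[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(f, x, baseline)
print(phi)
```

The values add up to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is split evenly between the two features that form it.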