I categorize fairness in machine learning into two forms: 1) static; 2) dynamic. Static fairness concerns prediction and classification. Dynamic fairness concerns decision-making policies and the feedback/impact of those decisions. The difference is whether the algorithm accounts for the impact or feedback of its decisions.
In this post, I mainly introduce an oral paper from ICML 2019, “Learning Optimal Fair Policies” (I will not cover the technical details, i.e., “Section 4 Estimation of optimal policies in the fair world”, but only the main idea and parts of the important sections).
Imagine the ingredients: 1) a classic MDP setting, 2) counterfactual inference, and 3) path-specific effects (for fairness). With those, you already have a rough idea of the paper. An interesting aspect of the setting is that the policy-maker cannot manipulate the outcome distribution/reward directly: if you had the power to intervene perfectly on the outcome itself, why would you take any other action? It is also interesting to see how the counterfactual quantities are evaluated.
From the MDP perspective, we may forget that the data we use contain bias. Decisions based on biased data can perpetuate that bias. In other words, if we make decisions through prejudiced glasses, everything may stay biased forever (which is not always true; see the “social equality” paper, since the outcome depends on the dynamics/transition probabilities).
However, once the user points out the potentially biased pathways, the bias can be corrected. In the path-specific-effect formulation of fairness, once you label the biased pathways, the method outputs a fair-world distribution with the property that the effect along those paths is guaranteed to lie within a given range. In the decision-making/MDP setting, every distribution you work with is projected onto the fair-world distribution (a rough sketch of this projection is given below). The question is then: in this way, can you guarantee that the outcome distribution stays consistent with the fair-world distribution? The answer is given in this paper.
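To make the projection idea concrete, here is a hedged sketch in my own notation (not the paper's exact formulation): the fair-world distribution can be thought of as the distribution closest to the observed one, e.g., in KL divergence, subject to bounding the path-specific effect along the flagged pathways.

% Hedged sketch; the symbols below are my shorthand, not the paper's.
% p(V): observed joint distribution over the variables V in the causal graph
% p*(V): the "fair world" distribution
% PSE_g(q): path-specific effect along the user-flagged (biased) pathways g, under q
% epsilon: user-specified tolerance for that effect
p^{*}(V) \;=\; \arg\min_{q}\; D_{\mathrm{KL}}\big(q(V)\,\|\,p(V)\big)
\qquad \text{subject to} \qquad
-\epsilon \;\le\; \mathrm{PSE}_{g}(q) \;\le\; \epsilon .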
In the sequential decision setting, there are multiple complications. In particular, we aim to learn high-quality policies while simultaneously making sure that the joint distribution induced by the policy satisfies our fairness criteria, potentially involving constraints on multiple causal pathways. This problem must be solved in settings where distributions of some variables, such as outcomes, are not under the policy-maker's control. Finally, we must show that if the learned policy is adapted to new instances (drawn from the original observed distribution) in the right way, then these new instances combined with the learned policy, constrained variables, and variables outside our control, together form a joint distribution where our fairness criteria remain satisfied.
Learning Optimal Fair Policies
“We aim to learn high-quality policies”: This is the typical RL/MDP task, i.e., given the basic elements of an MDP, <state, action, transition matrix, reward, discount factor>, provide an optimal policy that maximizes the expected discounted reward. A minimal sketch of this standard task is shown below.
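As a reminder of that standard objective, here is textbook tabular value iteration; this is not the paper's algorithm, and the array names and shapes are illustrative assumptions.

import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-6):
    # Hedged, generic sketch: not the paper's method.
    # P[s, a, s']: transition probabilities, shape (S, A, S)
    # R[s, a]:     expected immediate reward, shape (S, A)
    # gamma:       discount factor
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        # Q(s, a) = R(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s')
        Q = R + gamma * np.einsum("sat,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    # Greedy policy: pi(s) = argmax_a Q(s, a)
    return Q.argmax(axis=1)

The fair-policy problem keeps this kind of objective but adds the fairness constraints sketched earlier.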
“while simultaneously making sure that the joint distribution induced by the policy satisfies our fairness criteria“: The fairness criteria are defined on the joint distribution of all the variables in the given causal graph. Note that there are two joint distributions here: 1) the observed distribution, which may contain bias; 2) the corrected joint distribution, called the fair-world distribution. The fair world is specified by the user through the causal graph (by flagging the biased pathways). Hence, satisfying the fairness criteria means the induced joint distribution is consistent with the fair world.
“involving constraints on multiple causal pathways“: Following the paper that introduced path-specific fairness, the constraints are placed on the biased causal pathways. Once those pathways are specified, the fair world and its joint distribution are determined, and each flagged pathway contributes its own constraint, as sketched below.
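Extending the earlier sketch to multiple flagged pathway sets g_1, ..., g_k (again, my notation, not the paper's), the fair-world projection simply carries one bound per pathway:

% Hedged extension of the earlier sketch; one constraint per flagged pathway set.
-\epsilon_{i} \;\le\; \mathrm{PSE}_{g_{i}}(q) \;\le\; \epsilon_{i},
\qquad i = 1, \dots, k .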
“Finally, we must show that if the learned policy is adapted to new instances (drawn from the original observed distribution) in the right way, then these new instances combined with the learned policy, constrained variables, and variables outside our control, together form a joint distribution where our fairness criteria remain satisfied.”: We can think of the induced distribution as depending on the learned policy and on the distribution the new instances are drawn from; the task is to guarantee that this induced distribution is also consistent with the fair world, even when the new instances come from the observed, biased distribution. A rough sketch of the overall constrained objective is given below.
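Putting the pieces together, my rough (non-authoritative) reading of the overall problem is a constrained policy optimization of the following form, in my own notation:

% Hedged sketch in my notation; Y_t denotes the outcome/reward at step t and
% p*_pi is the fair-world joint distribution induced by following policy pi.
\pi^{*} \;=\; \arg\max_{\pi}\; \mathbb{E}_{p^{*}_{\pi}}\!\Big[\textstyle\sum_{t} \gamma^{t} Y_{t}\Big]
\qquad \text{subject to} \qquad
-\epsilon_{i} \;\le\; \mathrm{PSE}_{g_{i}}\!\big(p^{*}_{\pi}\big) \;\le\; \epsilon_{i}, \quad \forall i .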
Finally, congratulations to the authors of the paper, and thank you for contributing great work on fairness in a dynamic setting.