The Problem

Judges constantly decide whether defendants should be released or detained while awaiting trial.

How fair are these rulings?

Human beings are easily biased, and studies have suggested that external factors (like a lunch break or a tough football loss) can sway judges' decisions. In the age of big data, it’s tempting to imagine that letting a computer make rulings might counteract our own biases. In fact, “risk scores” generated by algorithms are already used nationwide [1][2]. When used as black boxes, however, algorithms are no better than humans, and may be much worse, in the biases they propagate. A recent study of defendants in Broward County, Florida showed that Black defendants are far more likely to be assigned a high-risk score [3].

The Users

Data contains valuable information, but we need to understand how to interpret it, use it, and recognize its consequences. We believe that taking the time to understand the effects of using these risk scores with different thresholds will allow judges, lawyers and policy-makers to use data-driven models to make less biased decisions in the criminal justice system.

Current Models

The proprietary COMPAS risk assessment score is widely used in court, but evidence shows that this score is suboptimal for minorities and women. There is also currently no system for deciding how to act on a score; it is simply presented to the judge as a number.


[Figure: Variables to consider when calculating a risk score]

[Figure: COMPAS recidivism score]

[Figure: COMPAS violent recidivism score]

What is Fair?

“Fair” is a word we often throw around, but deciding which ruling is the fairest involves tricky tradeoffs. We consider three types of fairness and compare how models can be interpreted under each framework.

Equal Thresholds: Given an algorithmically generated risk score, any two people with the same risk score receive the same ruling. For example, we could decide that any defendant, regardless of race, gender, or other factors, will be detained if their risk score is above 0.6.

Equal Detention Rates: Given two populations (e.g. male and female, or Black and white), we want to detain the same fraction of people from each population. Unless the two populations have identical risk score distributions, this means using different thresholds for different populations.

Equal False Positive Rates: Given two populations, we want to choose a threshold for each population such that both have the same false positive rate (FPR = the fraction of people who did not reoffend but were detained anyway).
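
To make the three definitions concrete, here is a minimal Python sketch that applies each one to two synthetic populations. Everything in it is an assumption for illustration: the groups "A" and "B", their Beta-distributed scores, the simulated reoffense outcomes, the 0.3 target detention rate and the 0.2 target FPR are all hypothetical and are not taken from COMPAS or the Broward County data. Only the 0.6 shared threshold comes from the example above.

```python
import random


def detention_rate(scores, threshold):
    """Fraction of a group detained when every score above threshold is detained."""
    return sum(s > threshold for s in scores) / len(scores)


def non_reoffender_scores(scores, reoffended):
    """Scores of the people who did not reoffend."""
    return [s for s, r in zip(scores, reoffended) if not r]


def false_positive_rate(scores, reoffended, threshold):
    """Among people who did NOT reoffend, the fraction who were detained."""
    return detention_rate(non_reoffender_scores(scores, reoffended), threshold)


def threshold_for_rate(scores, target_rate):
    """Threshold that leaves about target_rate of the given scores strictly above it."""
    ordered = sorted(scores)
    n_above = round(target_rate * len(ordered))
    if n_above >= len(ordered):
        return float("-inf")  # everyone ends up above the threshold
    return ordered[len(ordered) - n_above - 1]


def make_group(rng, alpha, beta, n):
    """Hypothetical population: Beta-distributed scores, outcomes drawn from the score."""
    scores = [rng.betavariate(alpha, beta) for _ in range(n)]
    reoffended = [rng.random() < s for s in scores]
    return scores, reoffended


rng = random.Random(0)  # fixed seed so the synthetic data is reproducible
scores_a, reoffended_a = make_group(rng, 2, 5, 2000)  # group A: scores skew low
scores_b, reoffended_b = make_group(rng, 3, 4, 2000)  # group B: scores skew higher

# 1. Equal thresholds: one cutoff for everyone (0.6, as in the text).
shared_threshold = 0.6
rates_shared = (detention_rate(scores_a, shared_threshold),
                detention_rate(scores_b, shared_threshold))

# 2. Equal detention rates: a per-group cutoff hitting the same (assumed) rate.
target_rate = 0.3
t_rate_a = threshold_for_rate(scores_a, target_rate)
t_rate_b = threshold_for_rate(scores_b, target_rate)
rates_equalized = (detention_rate(scores_a, t_rate_a),
                   detention_rate(scores_b, t_rate_b))

# 3. Equal false positive rates: a per-group cutoff chosen on non-reoffenders only.
target_fpr = 0.2
t_fpr_a = threshold_for_rate(non_reoffender_scores(scores_a, reoffended_a), target_fpr)
t_fpr_b = threshold_for_rate(non_reoffender_scores(scores_b, reoffended_b), target_fpr)
fprs_equalized = (false_positive_rate(scores_a, reoffended_a, t_fpr_a),
                  false_positive_rate(scores_b, reoffended_b, t_fpr_b))

print("Equal thresholds, detention rates (A, B):", rates_shared)
print("Equal detention rates, thresholds (A, B):", (t_rate_a, t_rate_b))
print("Equal FPR, thresholds (A, B):", (t_fpr_a, t_fpr_b))
print("Equal FPR, achieved FPRs (A, B):", fprs_equalized)
```

In this synthetic setup group B's scores skew higher, so a shared threshold detains a larger share of group B, while equalizing detention rates requires a higher threshold for group B than for group A. Note that choosing an FPR-equalizing threshold requires knowing who actually reoffended, so it can only be fitted on historical outcomes.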