5.3. ML Gone Wrong

But just because you can, doesn’t mean you should.

Note

Notice the company names below! These are the crème de la crème of tech firms!

ML/AI methods replicate patterns in the data by design
- If you train a model on data that reflects human biases, the model can easily become biased. This has led to debates about how to use ML (example: ML algos fail to set cash bail without bias)
- Amazon’s engineers used ML to evaluate applicants, but the model learned from past hiring data that males were automatically better
- Criminal sentencing based on “risk predictions” overweights race
- Online advertising - Google is more likely to serve up arrest records in searches for names assigned “primarily to black babies”
- Medical treatments vary by race
Humans are strategic: they will exploit the incentives that algos create, and they will exploit the algos themselves
- Facebook’s targeted ad categories initially allowed hate groups to form
- Microsoft’s chat bot was hijacked by 4chan users to teach the bot hate speech
- Uber must consider how changes to its dispatch algorithm will alter driver behavior
ML/AI tools are not always the right tool
- Zillow’s pricing model was best-in-class, but the company still lost nearly $400 million in a single quarter!
Data leakage is common and can lead to false “discoveries” of impossible performance gains
Some problems are simply hard
- Predicting stock returns is hard! The best predictive R² for individual stocks in this paper (open access here) is just 1.80% per month.
- IBM’s Watson tried to predict cancer. How’d it go? According to internal documents: “This product is a piece of sh–.”
- Google Flu Trends consistently over-predicted flu prevalence

