Case Study: How Amazon's AI Recruiting Tool “Learnt” Gender Bias

TL;DR

Amazon started building an AI resume-screening tool in 2014, training it on a decade of past applicants. Because most of those resumes came from men, the model taught itself that male candidates were preferable, downgrading resumes that mentioned "women's" and favoring male-coded verbs. By 2015 the bias was clear, fixes did not stick, and Amazon scrapped it. The lesson: biased training data produces biased hiring, so audit your data and keep humans in the loop.

“You’re hired!” When Amazon began building its AI recruiting tool back in 2014, the company hoped it would help its HR team revolutionize hiring and reach decisions more efficiently. Sadly, that wasn’t meant to be. The tool quickly developed a clear gender bias that pushed female candidates down the rankings, all because the resumes used to train the model came overwhelmingly from men. With AI tools entering every facet of professional life, this is a major problem for HR departments, which are often expected to lead the charge against bias. A Gartner, Inc. survey of 179 HR leaders, conducted on January 31, 2024, found that 38% of HR leaders are piloting, planning to implement, or have already implemented generative AI (GenAI), up from 19% in June 2023.

With adoption growing this quickly, it’s vital that HR professionals understand what led to Amazon’s breakdown so they can avoid repeating its mistakes.

Key Takeaways:

AI learns only from what we feed it, flaws included.
Algorithmic transparency is crucial to prevent and detect AI biases.
AI output should be closely evaluated before it goes live.

How Amazon used AI to find top talent (or not)

Amazon created the tool to automate resume screening and efficiently identify the best talent across the globe. It trained the tool on resumes submitted to the company over a ten-year period, with a focus on those of successful candidates. According to Reuters, the team created approximately 500 computer models, each focused on specific job functions and locations, and taught them to recognize around 50,000 terms that appeared in past candidates’ resumes. The tool used these terms to rate candidates on a scale of one to five stars, similar to how products are rated on Amazon. However, the shine swiftly faded when the company realized that the tool had picked up the weaknesses of manual screening along with its strengths.

How did Amazon’s AI recruiting tool fail?

Reuters was the first to report on the failure of Amazon’s AI recruiting tool. By 2015, it had become evident that the AI was not rating candidates in a gender-neutral manner.

Resumes with the word “women’s” (as in “women’s chess club captain”) were downgraded. The AI had effectively taught itself that male candidates were preferable, reflecting the male-dominated data it was trained on.

The technology also favored candidates who used verbs such as “executed” and “captured”, which were more common on male engineers’ resumes. Reuters also reported that problems with the underlying data led the tool to recommend unqualified candidates for a range of roles.
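To see how a model can end up penalizing a word like “women’s” without anyone writing that rule, here is a minimal sketch. It is not Amazon’s method: the toy resumes, the term_log_odds function and the smoothed log-odds weighting are all assumptions chosen for illustration. It simply counts how often each term appears on hired versus rejected resumes in a skewed history.

```python
import math
from collections import Counter

# Hypothetical toy history: (resume text, was the candidate hired?).
# Past hires skew male, so "women's" shows up mostly on rejected resumes.
HISTORY = [
    ("executed migration captured requirements", True),
    ("executed launch led team", True),
    ("captured market led team", True),
    ("led team built service", True),
    ("women's chess club captain built service", False),
    ("women's coding society led team", False),
    ("built service wrote tests", False),
    ("women's chess club captain executed launch", True),
]


def term_log_odds(history, smoothing=1.0):
    """Log-odds of each term appearing on hired vs. rejected resumes."""
    hired, rejected = Counter(), Counter()
    n_hired = n_rejected = 0
    for text, was_hired in history:
        terms = set(text.split())  # count each term once per resume
        if was_hired:
            hired.update(terms)
            n_hired += 1
        else:
            rejected.update(terms)
            n_rejected += 1
    weights = {}
    for term in set(hired) | set(rejected):
        # Additive smoothing keeps the ratio finite for unseen terms.
        p_hired = (hired[term] + smoothing) / (n_hired + 2 * smoothing)
        p_rejected = (rejected[term] + smoothing) / (n_rejected + 2 * smoothing)
        weights[term] = math.log(p_hired / p_rejected)
    return weights


WEIGHTS = term_log_odds(HISTORY)
for term in ("women's", "executed", "captured"):
    print(term, round(WEIGHTS[term], 2))
```

In this toy history, “women’s” gets a negative weight while “executed” and “captured” get positive ones, purely because of who was hired in the past. No line of the code mentions gender.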

Oxford University researcher Dr. Sandra Wachter says:

Amazon attempted to adjust the algorithms to be neutral but ultimately decided that the tool could not be reliably unbiased and scrapped the project.

Amazon’s response

Amazon stated that the tool “was never used by Amazon recruiters to evaluate candidates.” However, it did not deny that its recruiters looked at the recommendations the tool generated. According to Reuters, the company now uses a “watered down version” for basic tasks.

Amazon explained that women and other minority groups were not adequately encouraged in STEM, leading to fewer women applying for such jobs. Data shows that only 27% of STEM graduates are women.

Interestingly, in the same year it began building its AI recruitment tool, Amazon released its first-ever workplace-diversity figures, which showed that 63% of its employees were men.

What are some key learnings from Amazon’s tool?

Training data is everything: Since AI tools are trained on specific datasets, they can pick up human biases around gender, race, nationality, and more that are present in that data. If Amazon’s training data had included a balanced ratio of male and female profiles, this bias could have been avoided. It is prudent to run independent checks on your training data.
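One way to run such a check is sketched below. It assumes your historical records carry a group label and a hiring outcome; the field names, the audit_by_group and impact_ratio helpers, and the sample records are hypothetical. The 0.8 threshold mentioned afterwards is the “four-fifths” rule of thumb from US employment guidance, which the Reuters report does not discuss.

```python
from collections import Counter


def audit_by_group(records, group_key="gender", label_key="hired"):
    """Summarize each group's share of the data and its selection rate."""
    totals = Counter(r[group_key] for r in records)
    positives = Counter(r[group_key] for r in records if r[label_key])
    n = len(records)
    return {
        group: {
            "share": totals[group] / n,
            "selection_rate": positives[group] / totals[group],
        }
        for group in totals
    }


def impact_ratio(report):
    """Lowest selection rate divided by the highest (1.0 means parity)."""
    rates = [stats["selection_rate"] for stats in report.values()]
    return min(rates) / max(rates) if max(rates) > 0 else 1.0


# Hypothetical records for illustration only, not Amazon's data.
RECORDS = (
    [{"gender": "M", "hired": True}] * 30
    + [{"gender": "M", "hired": False}] * 50
    + [{"gender": "F", "hired": True}] * 4
    + [{"gender": "F", "hired": False}] * 16
)

REPORT = audit_by_group(RECORDS)
RATIO = impact_ratio(REPORT)
print(REPORT)
print(round(RATIO, 2))
```

Here women make up 20% of the records and are selected at roughly half the rate of men, so the ratio falls well below 0.8. A result like that is a signal to investigate the data before training on it, not proof of discrimination on its own.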

Account for macro factors: The Amazon tool failed to account for changing trends and policies in the recruitment industry, such as the increasing emphasis on diversity and inclusion. The world has changed since the 1990s, and the data should reflect that. Focusing on specific words can also skew results, as seen with verbs like “executed” and “captured”.

Human intervention: Heavy reliance on automation can lead to a poor recruitment experience that candidates find frustrating and unengaging. Automated screening is also limited in its ability to assess non-measurable skills and qualities, which are often critical in professional settings. This is where human intervention is key. Creating a system that balances automation with human judgment is vital.
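As a rough sketch of what that balance could look like in practice, the routing rule below lets a model score prioritize resumes but never reject one on its own. The Decision class, the route_candidate function, the route names and the 4.0 threshold are all hypothetical, chosen only to illustrate the idea on a one-to-five scale.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    candidate_id: str
    score: float
    route: str  # "recruiter_shortlist" or "human_review"


def route_candidate(candidate_id, score, shortlist_at=4.0):
    """Route a 1-5 model score; no candidate is rejected without a person."""
    if not 1.0 <= score <= 5.0:
        raise ValueError("score must be between 1 and 5")
    # High scores are fast-tracked to a recruiter; everything else still
    # goes to a human reviewer instead of being auto-rejected.
    route = "recruiter_shortlist" if score >= shortlist_at else "human_review"
    return Decision(candidate_id, score, route)


print(route_candidate("cand-001", 4.5))
print(route_candidate("cand-002", 2.0))
```

The design choice is that both routes end with a person. The score only changes the order in which resumes get human attention.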

Algorithmic transparency: Transparent algorithms are essential for identifying and correcting biases. Companies might be hesitant to share their data or codebase with the larger community, but doing so can help identify problems before it’s too late.

While Amazon’s AI failure serves as a cautionary tale, companies across the globe have already started integrating AI into their HR tools. You’ll find a bevy of enterprise tools that are changing how we approach HR. You just need to find the right one for you. Don’t agree with AI in HR? Share your thoughts with us!

Sources

Reuters, Jeffrey Dastin, “Amazon scraps secret AI recruiting tool that showed bias against women” (Oct 2018)

Frequently Asked Questions

What actually went wrong with Amazon's AI recruiting tool?

Amazon trained the tool on resumes submitted over a ten-year period, and because tech is male-dominated, most of those resumes came from men. The model learned that male candidates were preferable and started penalizing resumes that included the word "women's" while favoring male-coded verbs like "executed" and "captured." The bias came from the data, not from any rule someone wrote on purpose.

Did Amazon ever use this tool to actually reject candidates?

Amazon says the tool "was never used by Amazon recruiters to evaluate candidates," but it did not deny that recruiters looked at the AI's recommendations. The project was scrapped after Amazon concluded it could not reliably make the algorithm gender-neutral. Reuters reported the company later moved to a watered-down version for basic tasks.

Why did the AI downgrade resumes that mentioned women?

The model was rating candidates one to five stars based on patterns in past successful hires, and those hires skewed heavily male. So it treated signals correlated with women, such as "women's chess club captain" or all-women colleges, as negatives. It was not coded to discriminate; it simply mirrored the imbalance baked into its training data.

How common is AI in hiring and HR now?

It is rising fast. A Gartner survey of 179 HR leaders on January 31, 2024 found 38% were piloting, planning, or had already implemented generative AI, up from 19% in June 2023. Recruiting was one of the top use cases HR leaders said they were prioritizing, which is exactly why the Amazon failure still matters.

How do you stop an AI hiring tool from repeating Amazon's mistake?

Start with the training data: if it reflects a skewed workforce, the model will reproduce that skew, so audit it with independent checks before you trust the output. Keep humans in the loop to judge non-measurable qualities the model cannot, and push for algorithmic transparency so biases can be caught and corrected early rather than after they go live.
