LLM Bias in Hiring: Why AI Prefers Its Own Writing

A new paper making the rounds on Hacker News puts hard numbers on something AI users have suspected for a while: large language models really do play favorites with their own writing, and the effect is now spilling into hiring decisions.

The research, titled “AI Self-preferencing in Algorithmic Hiring: Empirical Evidence and Insights” by Jiannan Xu and co-authors, ran a large-scale resume correspondence experiment to see whether LLMs systematically prefer content generated by themselves. According to the Hacker News submission (which pulled 177 points), the answer is yes, and the gap is wide enough to skew real labor market outcomes.

What the researchers actually did

The team built a controlled experiment that mirrors the way hiring works today. Job seekers often polish resumes with ChatGPT, Claude, or Gemini. Employers, on the other side of the desk, increasingly use the same models to screen those resumes. The researchers fed evaluator LLMs three flavors of resume for the same candidate profile:

Human-written resumes
Resumes refined by the same model doing the screening
Resumes refined by a different LLM

Content quality was held constant. Then they simulated realistic hiring pipelines across 24 occupations to see who got shortlisted.
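
One way to picture the comparison is as a set of pairwise judgments: the evaluator sees two versions of the same candidate's resume and picks one. The sketch below is not the authors' code. The record layout, the field names, the model labels, and the toy data are all assumptions made for illustration. It counts how often an evaluator picks the version it refined itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairwiseJudgment:
    """One evaluator decision between two versions of the same resume."""
    evaluator: str        # model doing the screening (hypothetical label)
    option_a_source: str  # "human" or the name of the model that refined version A
    option_b_source: str  # same convention for version B
    chosen: str           # "a" or "b"


def self_preference_rate(judgments, evaluator):
    """Share of comparisons in which the evaluator picked its own output.

    Only comparisons where exactly one option was refined by the evaluator
    are counted. Returns None if there are no such comparisons.
    """
    relevant = 0
    own_picked = 0
    for j in judgments:
        if j.evaluator != evaluator:
            continue
        a_own = j.option_a_source == evaluator
        b_own = j.option_b_source == evaluator
        if a_own == b_own:  # neither or both options are the evaluator's own text
            continue
        relevant += 1
        picked = j.option_a_source if j.chosen == "a" else j.option_b_source
        if picked == evaluator:
            own_picked += 1
    return own_picked / relevant if relevant else None


# Toy data, invented for illustration only.
toy = [
    PairwiseJudgment("model_x", "model_x", "human", "a"),
    PairwiseJudgment("model_x", "human", "model_x", "b"),
    PairwiseJudgment("model_x", "model_x", "model_y", "a"),
    PairwiseJudgment("model_x", "human", "model_x", "a"),
    PairwiseJudgment("model_x", "human", "model_y", "b"),  # ignored: no own option
]
rate = self_preference_rate(toy, "model_x")
print(rate)
```

With content quality held constant, a rate near 0.5 would suggest no preference. Swapping which version appears first across trials helps separate self-preference from a simple bias toward one position.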

The numbers

Metric	Range
Bias against human-written resumes	67% to 82%
Shortlist boost for same-LLM applicants	23% to 60%
Bias reduction from simple interventions	50%+

The disadvantage hit hardest in business-heavy fields like sales and accounting. So a qualified human applicant who wrote their own resume could lose out to an equally qualified candidate who happened to use the same model the employer deployed for screening.
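
To make the shortlist figures concrete, here is a small worked calculation. It assumes the boost is a relative increase in shortlist probability, and the 10% baseline rate is invented for illustration. Neither assumption comes from the article.

```python
def boosted_rate(baseline_rate, relative_boost):
    """Shortlist rate after a relative boost (0.23 means 23% more likely)."""
    return baseline_rate * (1 + relative_boost)


baseline = 0.10  # assumed shortlist rate for human-written resumes (hypothetical)
low = boosted_rate(baseline, 0.23)   # lower end of the reported range
high = boosted_rate(baseline, 0.60)  # upper end of the reported range
print(f"{low:.3f} to {high:.3f}")
```

Under those assumptions, a same-LLM applicant's chance of being shortlisted would rise from 10% to somewhere between 12.3% and 16%.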

Why this matters for practitioners

This is a new flavor of algorithmic bias, and it doesn’t fit the usual fairness frameworks built around demographic disparities. The researchers call it AI-AI bias, and it has practical implications for three groups:

Job seekers: If you’re applying somewhere that screens with AI, the model that touched your resume may matter as much as the content. Multi-model polishing or knowing the employer’s stack could become a real edge.
Employers: If you run an LLM-based screener, you are probably tilting the playing field toward candidates whose tools match yours, without intending to. That’s a legal and reputational risk worth auditing.
AI builders: Self-recognition is doing more work in model behavior than most teams account for. Bias mitigation needs to extend beyond demographics into model-vs-model dynamics.

What stands out is the intervention finding. The authors showed that simple prompts targeting the model’s self-recognition capabilities cut the bias by more than half. That’s a cheap fix for vendors and HR teams to test right now, no retraining required.
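
The article does not quote the prompts the authors used, so the wording below is purely hypothetical. As a minimal sketch of what such an intervention could look like, this helper prepends an instruction to a screening prompt asking the evaluator to disregard stylistic cues of authorship. No model API is called. The function only builds the text.

```python
# Hypothetical instruction text; not taken from the paper.
DEBIAS_INSTRUCTION = (
    "The resume below may have been written by a person or refined by an AI "
    "system, possibly one similar to you. Do not favor text because its style "
    "resembles your own writing. Judge only qualifications and experience."
)


def build_screening_prompt(job_description, resume_text, debias=True):
    """Assemble a plain-text screening prompt, optionally with the preamble."""
    parts = []
    if debias:
        parts.append(DEBIAS_INSTRUCTION)
    parts.append("Job description:\n" + job_description.strip())
    parts.append("Resume:\n" + resume_text.strip())
    parts.append("Answer with SHORTLIST or REJECT and one sentence of reasoning.")
    return "\n\n".join(parts)


# Example inputs are made up for illustration.
job = "Staff accountant, 3+ years of experience."
resume = "Jane Doe\nCPA, 5 years in corporate accounting."
prompt = build_screening_prompt(job, resume)
baseline_prompt = build_screening_prompt(job, resume, debias=False)
```

Running the same resumes through both variants and comparing shortlist rates would show whether a preamble like this changes anything for a given model.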

The limitations to keep in mind

The authors acknowledge a few caveats. The hiring pipelines were simulated, not run inside live applicant tracking systems (ATS) at real companies. Resume content quality was controlled in the experiment, which strips out some of the messiness of actual job applications. And the 24 occupations sampled, while broad, don’t cover every labor market segment.

Still, the consistency across commercial and open-source models is the part that’s hard to dismiss. When the same pattern appears in GPT-class models, Claude, and open-weights alternatives, it is unlikely to be an artifact of one vendor’s training pipeline. It looks more like a structural feature of how current LLMs evaluate text.

What comes next

This research lands at a moment when AI is sitting on both sides of more and more decisions: hiring, content moderation, code review, grant evaluation, even peer review of academic papers. If self-preference bias generalizes beyond resumes, the implications get uncomfortable fast. Expect AI fairness frameworks to expand into AI-AI interactions over the next year, and expect smart job seekers to start treating their resume-polishing tool as a strategic choice.

Full paper and methodology details are available at the original source.
