Your PR History Is a Training Dataset

I love the idea that a team's code review history is a dataset. Hundreds of PRs, thousands of comments, years of accumulated judgment about what matters and what doesn't. The problem is that all of that knowledge lives scattered across PR comment threads, largely invisible to anyone who wasn't in the conversation.

AI tools can already do code review, but their feedback is generic. They don't know what your team cares about. They don't know your architecture, your deployment quirks, or the bugs you've already been burned by.

Building a Team-Aware Reviewer

After getting some particularly valuable PR feedback from teammates, I wanted to see if I could capture what our team actually values when reviewing our codebase, not what a generalized model thinks good code looks like.

It started small. I pointed Claude at a few PRs with strong review feedback and asked it to analyze the patterns: what did reviewers care about, and why? From that analysis, it generated a skill that predicts likely reviewer feedback on a diff before submission. I tested it for a week and the results were qualitatively different from generic AI review. Instead of boilerplate suggestions, Claude was catching the same things my teammates would flag: logic in the wrong architectural layer, missing edge cases the team has been burned by before, naming inconsistencies that break established conventions.

It wasn't perfect. But it was useful enough that I started running it on every PR before requesting review.

From there I widened the scope. Claude analyzed ~500 merged PRs and extracted a taxonomy of 17 review categories, ranked by how frequently each appears in our team's actual feedback. The ranking reflects what the team actually catches, not what we think we should care about.

Closing the Loop

Static checklists go stale. So I built a second skill that tells Claude how to rebuild the first one. It fetches recent merged PRs, extracts substantive review comments, categorizes them against the existing taxonomy, identifies new patterns, flags stale ones, and proposes updates. The patterns are team knowledge, not individual voices (reviewer attribution is always stripped).
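The refresh pass above can be sketched as: drop reviewer attribution, then split fresh comments into ones the existing taxonomy already covers and new-pattern candidates. The field names and the keyword matcher are assumptions for illustration; in the real skill, Claude does the matching:

```python
# Hypothetical existing taxonomy (pattern -> indicative phrases).
KNOWN_PATTERNS = {
    "layering": ["service layer", "belongs in"],
    "naming": ["rename", "convention"],
}

def refresh(raw_comments: list[dict]) -> dict:
    """Split recent review comments into known patterns vs. candidates,
    keeping only comment bodies so reviewer attribution is stripped."""
    known, candidates = [], []
    for c in raw_comments:
        body = c["body"]  # the "author" field is deliberately dropped here
        matched = any(
            kw in body.lower()
            for kws in KNOWN_PATTERNS.values()
            for kw in kws
        )
        (known if matched else candidates).append(body)
    return {"known": known, "new_pattern_candidates": candidates}

sample = [
    {"author": "alice", "body": "This belongs in the service layer."},
    {"author": "bob", "body": "We should debounce this event handler."},
]
print(refresh(sample))
```

The second sample comment matches nothing in the taxonomy, so it surfaces as a candidate for a new pattern; either way, no reviewer name survives into the output.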

Together, the two skills form a feedback loop: real reviews train the prediction skill, predictions catch issues before review, and periodic refreshes keep patterns current as the codebase evolves.

Replicating This on Any Team

The approach generalizes to any team with a review culture. Point Claude at your team's merged PRs, ask it to analyze the review patterns, and have it generate a prediction skill. Build a refresh skill so the patterns stay current. The skills are just markdown files in your repo, picked up automatically by your AI tool of choice. No infrastructure, no deployment, no maintenance burden.
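A skill file can be as simple as a markdown document the tool loads as context. The structure and patterns below are illustrative, not a required format:

```markdown
# Skill: predict-review-feedback

When asked to pre-review a diff, check it against these team patterns,
ranked by how often they appear in our merged-PR review history:

1. Architectural layering — business logic does not belong in controllers.
2. Known edge cases — empty inputs, timeouts, retry behavior.
3. Naming conventions — names must match established team conventions.

For each likely reviewer comment, cite the pattern and the offending lines.
```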

The raw material is already sitting in your PR history.