AI-generated pull requests: overwhelming, hard to review carefully

The current stream of AI-generated (or AI-aided) pull requests is a bit overwhelming to me. It is hard for me to review them carefully.

In general, I try to avoid reviewing any pull request that proposes changing thousands of lines of code, unless the topic of the PR is of special interest to me. Nor would I normally review this many pull requests per day. AI, however, can produce an enormous number of pull requests in no time, each changing thousands of lines of code.

Quoting The Register:

The burden of AI-generated code contributions – known as pull requests among developers using the Git version control system – has become a major problem for open source maintainers. Evaluating lengthy, high-volume, often low-quality submissions from AI bots takes time that maintainers, often volunteers, would rather spend on other tasks.

How do you feel?

1 Like

Prompting the AI to make the changes is one part of the work; reviewing those changes is the other. If the AI produced a small number of changes, then diagnosing the problem and prompting the AI was the larger part of the task, and reviewing the smaller. If the number of changes is large, reviewing becomes the dominant part of the work. This is common with refactoring-type changes.

I think that a contributor should review their AI’s output before proposing it as a PR. Otherwise it mostly amounts to “I would like this sort of change made; please check whether it is correct”. To take a quote from this article: “AI-generated code requires more careful review than human-written code. Every line is suspect.”

1 Like

I mostly agree. My experimentation with AI code generation this weekend was largely a failure (from a human-effort perspective).

I wanted to see how well AI could assist with converting argument-free ctests to Google tests, and to document the approach. I wanted to test what all the news outlets are stating as fact, and I was hoping to benefit ITK in the process.

My efforts identified 846 ctests that should/could be converted to Google tests. I chose to convert only the common directory (about 36 tests), and it is clear that managing this relatively simple mechanical task is too burdensome.

I was hoping for an initial conversion that was essentially “no worse” than what was there, but with the benefit of being GTests instead of ctests (which are much easier to work with in the IDE). I was impressed by how well the conversions went; they introduced no new failures and added new tests. I think, on the whole, the results were slightly better after conversion than before.

HOWEVER:

  1. Conversion puts eyes on the code, and the community then often wants long-standing shortcomings in the original code fixed at the same time, while also insisting that only one kind of change be made per PR.

  2. The volume of PRs overburdens the already extensive CI testing infrastructure (a computational limitation) and clogs up other people’s efforts.

  3. Balancing how much to bundle into a single PR is really hard to do in a semi-automated way. Keeping each change in a separate PR creates merge conflict hell, leading back to #2 above.

  4. Competing interests: make minimal changes, but include all fixes to all identified shortcomings at the same time.

==============================

My observations as I train the next generation of developers.

This failed experiment has given me a lot to think about. While AI is really good at handling much of the grunt work for housekeeping tasks, it leaves a great deal of work for the experts with comprehensive knowledge of the system, who must review and address the results.

Perhaps for a project that can live with rapidly evolving changes and tolerate small regressions buried in large batches of improvements, a bulk AI conversion is worth considering. For a project with very strict requirements for near-perfect commits, addressing the resulting PRs will carry a heavy burden. A bigger concern is likely the dissatisfaction of the small number of active developers who serve as gatekeepers for each PR, and who become overburdened trying to review every one of them to the highest standard.

3 Likes

I empathize with Niels’ expression of the critical need to avoid maintainer overwhelm.

And while AI can reduce maintainer burden (@blowekamp 's new third-party update skills are a good example of this!), we do have to be careful how it is approached. As Hans noted, the experienced and thoughtful input of developers who have perspective on the historical reasons for things, project goals, and architectural designs, along with the need to train the next generation of developers, are critical.

As we now have arrived in the age of “agentic engineering,” processes that support high quality development are as important as ever, e.g. fast, effective, and thorough CI testing.

Our AGENTS.md is a good start at helping the agents follow our coding style for consistency, ensuring our test coverage, etc. We should continually improve it so the agents get closer to the results that we want on the initial pass.

We can also help reduce review burden with more AI agents :-). The summaries and first-pass analyses from review agents, while not perfect (and we should not expect them to be), are very helpful for providing an overview and a first pass at identifying issues. We could auto-enable GitHub Copilot review on PRs. What are folks’ thoughts on this?

1 Like

It seems like it has been a learning experience for you and the community. In that way it is a success.

There have been a couple of cases where the agent’s review had some good comments that I was able to address, or have AI fetch and address. However, on the initial PR for the CMake Module Interface work, it was not helpful. It left a good number of small, detailed comments, but it was not helpful with the CMake architectural review. Furthermore, I wasn’t able to dismiss its misconceptions, and they resurfaced in the next round of reviews. I didn’t think it was a good use of time to respond to the AI in this situation.

I think AI reviews can likely handle the details and style best practices more easily than higher-level design issues. The latter are the reason I sometimes open a PR early: to see whether there is agreement that it is a good thing to do, and that the approach is reasonable.

1 Like

@hjmjohnson It would be good if you could take a look at AGENTS.md and update it, while all this is still fresh in your head.

I have also found that the comments are sometimes but not always helpful, and I have observed their quality to improve significantly over time.

According to GitHub, they have been improving the review agent so it will only bring up new comments in subsequent reviews.

1 Like

Thanks for bringing up the topic of Copilot reviews!

I see that AI-generated reviews can be helpful, just like compiler warnings can help us prevent mistakes. But I don’t think we should process an AI review the same way we would process a human review. The human review process is not just technical; it’s also a social interaction. Obviously a human review should always be treated in a friendly and polite way, and of course, a human reviewer may be unhappy when their comment is ignored. What about a Copilot review? Do we always need to reply to its suggestions in a friendly way? Do we need to “defend” our proposed change against AI-generated criticism?

I feel that we shouldn’t put too much weight on an AI generated review. We shouldn’t feel bound to address all of its comments.

Of course, an AI generated comment will become more relevant when it is supported by a human reviewer, by a “like” or a follow-up comment.

I’m not sure about auto-enabling GitHub Copilot review on PRs. It takes away the human interaction of requesting a Copilot review on a PR. Is that a good thing or not? When I try to address a Copilot review, it sometimes makes me wonder who I am doing it for. I don’t want to have the feeling that I’m just trying to please a robot :person_shrugging: When someone has actively requested a Copilot review on my PR, it’s clearer to me that addressing the review may also please a human being :smiley:

Of course, there are also environmental costs to the use of AI. AI is known to take lots of energy. I don’t know how that compares to our regular CI, for example. Just something to keep in mind. :innocent:

2 Likes

Absolutely loving this thread. Thanks for kicking off such a thoughtful discussion :heart:

I very much agree that the human side of our collaboration is paramount, and deserves explicit protection and celebration. Code review and issues are a big part of how we build trust, mentorship, and shared ownership in ITK, and I’d really like to keep those interactions social, relational, and human-to-human, with AI as background tooling rather than a “participant” in the conversation. :slightly_smiling_face:

I’m also very guilty myself of anthropomorphizing AI agents. :man_raising_hand: It’s so easy to slip into that because the “API” is natural language, but the reviews these systems give are still rule-driven computational models, reflecting patterns from their training data and our existing codebases, not intentional human judgement. We do need to guard against treating them as people; they don’t need us to be “friendly” in the social sense (though there’s no reason to be rude either!), and we shouldn’t let politeness to a tool dilute the clarity of our technical decisions.

For that reason, I think it’s helpful to mentally file AI review agents alongside linters, compilers, and integration tests. They’re extremely useful, often catch real issues, and can make us more productive and improve code quality. But, just like any other tool, their feedback will never be 100% correct, and occasional false positives or misunderstandings are expected, not a crisis. We should treat AI comments as strong hints or hypotheses to evaluate, not as authoritative verdicts.

We should also stay mindful of the energy cost of all this. Running heavy models on every commit has a real environmental and financial footprint, so designing our workflows to get high value from AI per unit of compute (e.g., scoping when and how we trigger reviews) feels important.

On the positive side, the quality and efficiency of AI code review is moving quickly. Claude Code just today launched a dedicated multi‑agent Code Review system that automatically reviews each PR and leaves inline comments where it finds likely issues, modeled on Anthropic’s internal workflows for nearly every PR. GitHub Copilot’s code review features are also maturing, with Copilot acting as a reviewer that can leave comments and even help implement changes via follow‑up actions. And there’s a broader ecosystem that’s been evolving for years.

That diversity of input is valuable, including among AI tools themselves. I can imagine a future where, for well‑understood parts of the ITK codebase and workflows that the team is comfortable with, it might be reasonable to auto‑enable AI reviews by default because we trust both the tools and our patterns for interpreting them. But even if/when we get there, I’d still want human reviewers at the center. AI provides instrumentation and safety rails. Humans do the actual collaboration and make the final call.

3 Likes