
PR Agent

Jakub Florkowski edited this page May 22, 2026 · 12 revisions

The PR Agent validates fixes through automated testing, explores alternatives using multiple AI models, and synthesizes everything into actionable recommendations.

Design Philosophy

The Core Question

"How would AI solve this problem with ZERO human influence?"

That is the fundamental question behind try-fix and this entire workflow. Each try-fix attempt is the AI solving an issue from scratch, with no distractions and no anchoring to an existing PR's approach. We found that when an AI is shown a human fix and asked to "come up with alternatives," it tends to simply validate the PR ("looks fine to me") instead of exploring. Generating fixes in isolation prevents this and produces genuinely independent approaches.

PR Review = Issue Fixing

A PR review and an issue fix are conceptually the same workflow. The only difference is whether a human attempt already exists: with a PR it does, with a bare issue it does not. In both cases the AI's job is identical: understand the bug, generate independent fixes, compare all candidates (including the human attempt if one exists), and select the best one.

Two Distinct Roles: Generation vs. Hardening

The workflow deliberately separates two AI capabilities:

  1. Smart generation (Try-Fix) — AI uses reference material, code-review findings, and codebase knowledge to generate the best fix it can. The goal is smarter context for better generation, not expert review of the output. Each model works in isolation so we can measure AI fix quality independently.

  2. Expert hardening (Report) — After the best fix is selected, the expert reviewer evaluates it for hardening needs: Are the tests thorough? Is the code robust? Are edge cases covered? This is a separate, later step that identifies what the winning fix might still be missing — not a gate on try-fix itself.

The Feedback Loop

If expert hardening reveals improvements that the winning fix missed, those insights feed back into improving the try-fix process: Why didn't the AI generate this better approach? What context or knowledge was missing? This creates a virtuous cycle where try-fix gets smarter over time, progressively reducing the need for human-authored fixes.

Strategic Direction

The long-term goal is that AI-generated fixes become the norm. By having AI solve problems independently and measuring its quality in isolation, we can systematically identify gaps, improve the generation process, and build toward a future where human fixes are the exception rather than the rule.


%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1e1e2e', 'primaryTextColor': '#cdd6f4', 'primaryBorderColor': '#45475a', 'lineColor': '#6c7086', 'secondaryColor': '#313244', 'tertiaryColor': '#181825'}}}%%
flowchart LR
    subgraph gate ["🧪 GATE"]
        direction TB
        G1[Detect tests in PR]
        G2[Verify tests fail without fix]
        G3[Verify tests pass with fix]
        G1 --> G2 --> G3
    end
    
    subgraph review ["🤖 PR REVIEW"]
        direction TB
        R1[Pre-Flight: gather context]
        R2[Try-Fix: 4 models sequentially]
        R3[Report: write recommendation]
        R1 --> R2 --> R3
    end
    
    subgraph post ["📊 POST"]
        direction TB
        P1[Post AI summary comment]
        P2[Apply agent labels]
        P1 --> P2
    end
    
    gate --> review --> post
    
    style gate fill:#1e1e2e,stroke:#89b4fa,stroke-width:2px,color:#cdd6f4
    style review fill:#1e1e2e,stroke:#cba6f7,stroke-width:2px,color:#cdd6f4
    style post fill:#1e1e2e,stroke:#a6e3a1,stroke-width:2px,color:#cdd6f4

Every fix is tested. The agent doesn't theorize—it implements each approach, runs tests, and reports what works.


Quick Start

Recommended: Use Plan Mode First

For the best results, start in plan mode to create and review a detailed plan before execution:

  1. Enter plan mode: Press Shift+Tab or use /plan
  2. Request a review plan:
    /plan review PR #12345 - create a detailed plan for the review
    
  3. Review the plan: Copilot will create a structured plan. Review the steps and make adjustments.
  4. Exit plan mode: Press Shift+Tab to switch back to execution mode
  5. Execute the plan:
    proceed with the plan
    

Direct Invocation (Alternative)

copilot

# Ask it to review a PR
please review PR #12345

Trigger Phrases

| Phrase | Description |
| --- | --- |
| "Review PR #XXXXX" | Review an existing PR with independent analysis |
| "Work on PR #XXXXX" | Investigate and implement a fix |
| "Fix issue #XXXXX" | Works whether or not a PR exists |

How It Works

The pipeline is orchestrated by Review-PR.ps1:

.\Review-PR.ps1 -PRNumber 33687
.\Review-PR.ps1 -PRNumber 33687 -Platform ios

Step 0: Branch Setup

Creates a review branch from main and squash-merges the PR onto it. If there are merge conflicts, posts a comment on the PR and exits.

Step 1: Gate

Runs verify-tests-fail.ps1 directly (no Copilot agent — pure script):

  1. Detects tests in the PR diff via Detect-TestsInDiff.ps1
  2. Verifies tests fail without the fix (baseline)
  3. Verifies tests pass with the fix applied

Results:

  • PASSED: tests catch the bug ✅
  • SKIPPED: no tests were detected in the PR (the gate recommends asking @copilot to write tests for this PR)
  • FAILED: tests didn't behave as expected ❌

The gate result is posted as a PR comment and passed as context to Step 2.
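The mapping from the three gate checks to a result can be sketched as follows. This is a minimal illustration, not the logic of verify-tests-fail.ps1 itself: the `GateResult` enum and the `classify_gate` function are hypothetical names, and the assumption is that any outcome other than "fails without the fix, passes with it" counts as FAILED.

```python
from enum import Enum
from typing import Optional


class GateResult(Enum):
    PASSED = "PASSED"
    SKIPPED = "SKIPPED"
    FAILED = "FAILED"


def classify_gate(tests_detected: bool,
                  fails_without_fix: Optional[bool],
                  passes_with_fix: Optional[bool]) -> GateResult:
    """Hypothetical helper: turn the three gate checks into one result."""
    # No tests in the PR diff: nothing to verify, so the gate is skipped.
    if not tests_detected:
        return GateResult.SKIPPED
    # Tests "catch the bug" only if they fail on the baseline
    # and pass once the fix is applied.
    if fails_without_fix and passes_with_fix:
        return GateResult.PASSED
    # Assumption: every other combination is reported as FAILED.
    return GateResult.FAILED
```

For example, tests that already pass without the fix do not demonstrate the bug, so `classify_gate(True, False, True)` yields `GateResult.FAILED` in this sketch.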

Step 2: PR Review

Invokes Copilot CLI with the prompt "Use a skill to review PR #XXXXX", which triggers the pr-review skill. This runs three phases:

Pre-Flight — Reads the linked issue, PR description, and comments. Classifies changed files. No code analysis — just context gathering. Output: pre-flight/content.md

Try-Fix (mandatory) — Four models explore independent fix ideas sequentially, each working in a bubble with zero influence from the PR's approach:

| Order | Model |
| --- | --- |
| 1 | Claude Opus 4.6 |
| 2 | Claude Opus 4.7 |
| 3 | GPT-5.3-Codex |
| 4 | GPT-5.5 |

Each model generates an independent fix (the question is always "how would you solve this from scratch?"), implements it, and runs tests. Models receive code-review hints and reference material as context. They look at the PR's fix only to confirm that their own approach is genuinely different, not to anchor on it. Between attempts, the baseline is restored via EstablishBrokenBaseline.ps1 -Restore.
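The sequential loop described above might look roughly like this. It is a sketch under stated assumptions: `Attempt`, `run_try_fix`, and the two callbacks are hypothetical names, with `attempt_fix` standing in for "implement the fix and run tests" and `restore_baseline` standing in for the call to EstablishBrokenBaseline.ps1 -Restore. For simplicity the sketch also restores after the last attempt.

```python
from dataclasses import dataclass
from typing import Callable, List

# Model order taken from the table above.
MODELS = ["Claude Opus 4.6", "Claude Opus 4.7", "GPT-5.3-Codex", "GPT-5.5"]


@dataclass
class Attempt:
    order: int
    model: str
    passed: bool


def run_try_fix(models: List[str],
                attempt_fix: Callable[[str], bool],
                restore_baseline: Callable[[], None]) -> List[Attempt]:
    """Hypothetical driver: one isolated attempt per model, in order."""
    attempts = []
    for order, model in enumerate(models, start=1):
        # The callback implements the model's fix and reports the test result.
        passed = attempt_fix(model)
        attempts.append(Attempt(order, model, passed))
        # Reset to the broken baseline so the next model starts clean.
        restore_baseline()
    return attempts
```

Passing the two steps in as callbacks keeps the loop itself free of any tooling details, which makes the ordering and the restore step easy to check in isolation.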

After all 4 attempts, cross-pollination rounds let each model see all attempt summaries and propose any new ideas. Rounds repeat until every model responds "NO NEW IDEAS", up to a maximum of 3 rounds.
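The stopping rule for these rounds can be sketched as a bounded loop. The names `cross_pollinate` and `propose` are hypothetical, and the sketch assumes that new ideas from one round are appended to the summaries shown in the next round; the source does not say how new ideas are carried forward.

```python
from typing import Callable, Dict, List

MAX_ROUNDS = 3
NO_NEW_IDEAS = "NO NEW IDEAS"


def cross_pollinate(models: List[str],
                    summaries: List[str],
                    propose: Callable[[str, List[str]], str]
                    ) -> List[Dict[str, str]]:
    """Hypothetical loop: ask every model for ideas until none has any."""
    rounds = []
    for _ in range(MAX_ROUNDS):
        # Each model sees all summaries and replies with an idea or the sentinel.
        replies = {model: propose(model, summaries) for model in models}
        rounds.append(replies)
        new_ideas = [reply for reply in replies.values()
                     if reply.strip().upper() != NO_NEW_IDEAS]
        if not new_ideas:
            break  # every model said "NO NEW IDEAS"
        # Assumption: new ideas become part of the shared context.
        summaries = summaries + new_ideas
    return rounds
```

The hard cap means the loop ends after 3 rounds even if some model keeps proposing ideas.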

The best passing fix is selected by comparing simplicity, robustness, and codebase consistency. Output: try-fix/content.md
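One way to picture this comparison is as a filter followed by a ranking. The sketch below is illustrative only: the `Candidate` fields, the 1 to 5 scores, and the equal weighting are all assumptions, since the source names the three criteria but not how they are scored or combined.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str            # e.g. "PR fix" or "attempt-2" (hypothetical labels)
    passed: bool         # did the tests pass with this fix?
    simplicity: int      # hypothetical 1-5 score
    robustness: int      # hypothetical 1-5 score
    consistency: int     # hypothetical 1-5 score (codebase consistency)


def select_best(candidates: List[Candidate]) -> Optional[Candidate]:
    """Hypothetical selection: best-scoring candidate among passing fixes."""
    passing = [c for c in candidates if c.passed]
    if not passing:
        return None  # no fix passed, so there is nothing to select
    # Assumption: the three criteria are weighted equally.
    return max(passing,
               key=lambda c: c.simplicity + c.robustness + c.consistency)
```

The human attempt, when a PR exists, would simply be one more `Candidate` in the list, so it competes on the same terms as the AI attempts.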

Report (Expert Hardening) — After the best fix is selected, the expert reviewer evaluates it for hardening needs: are tests thorough, is the code robust, are edge cases covered? If improvements are identified, they're documented as feedback to improve future try-fix generation. Writes the final recommendation (APPROVE or REQUEST CHANGES). Output: report/content.md

Step 3: Post AI Summary

Runs post-ai-summary-comment.ps1 to post the review as a PR comment combining gate result, try-fix comparison, and recommendation.

Step 4: Apply Labels

Runs Update-AgentLabels.ps1 to parse the phase output files and apply labels:

| Label | Meaning |
| --- | --- |
| s/agent-reviewed | PR was reviewed (always applied) |
| s/agent-approved | Agent recommends approval |
| s/agent-changes-requested | Agent recommends changes |
| s/agent-review-incomplete | Agent couldn't complete all phases |
| s/agent-gate-passed | Tests catch the bug |
| s/agent-gate-failed | Could not verify tests catch the bug |
| s/agent-fix-win | Agent found a better fix |
| s/agent-fix-pr-picked | PR's fix was best |
| s/agent-fix-implemented | (Manual) Author adopted agent's suggestion |
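The label rules above can be sketched as a small function. This is not how Update-AgentLabels.ps1 is implemented: `labels_for` and its string arguments are hypothetical, and the sketch assumes that a missing recommendation means the review was incomplete and that a SKIPPED gate gets no gate label.

```python
from typing import List, Optional


def labels_for(gate: str,
               recommendation: Optional[str],
               winner: Optional[str]) -> List[str]:
    """Hypothetical mapping from phase outcomes to agent labels."""
    labels = ["s/agent-reviewed"]  # always applied

    if recommendation == "APPROVE":
        labels.append("s/agent-approved")
    elif recommendation == "REQUEST CHANGES":
        labels.append("s/agent-changes-requested")
    else:
        # Assumption: no recommendation means a phase did not complete.
        labels.append("s/agent-review-incomplete")

    if gate == "PASSED":
        labels.append("s/agent-gate-passed")
    elif gate == "FAILED":
        labels.append("s/agent-gate-failed")
    # Assumption: a SKIPPED gate adds neither gate label.

    if winner == "agent":
        labels.append("s/agent-fix-win")
    elif winner == "pr":
        labels.append("s/agent-fix-pr-picked")
    return labels
```

s/agent-fix-implemented is left out on purpose, since the table marks it as a manual label.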

Output Structure

All phase output is written to CustomAgentLogsTmp/PRState/{PRNumber}/PRAgent/:

gate/content.md           ← Gate result
pre-flight/content.md     ← PR context and file classification
try-fix/content.md        ← Fix comparison table
  attempt-{N}/            ← Per-model attempt details
report/content.md         ← Final recommendation
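For tooling that needs to locate these files, the layout above can be expressed as a small path builder. The function name `phase_paths` is hypothetical, and placing the attempt-{N} directories under try-fix/ is an assumption read from the indentation in the listing.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Union

PHASES = ("gate", "pre-flight", "try-fix", "report")


def phase_paths(pr_number: int, attempts: int = 4
                ) -> Dict[str, Union[PurePosixPath, List[PurePosixPath]]]:
    """Hypothetical helper: build the output paths for one PR."""
    root = (PurePosixPath("CustomAgentLogsTmp/PRState")
            / str(pr_number) / "PRAgent")
    # One content.md per phase.
    paths: Dict[str, Union[PurePosixPath, List[PurePosixPath]]] = {
        phase: root / phase / "content.md" for phase in PHASES
    }
    # Assumption: per-model attempt directories live under try-fix/.
    paths["attempts"] = [root / "try-fix" / f"attempt-{n}"
                         for n in range(1, attempts + 1)]
    return paths
```

`PurePosixPath` is used so the sketch only builds path strings and never touches the filesystem.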

When NOT to Use

| Task | Use Instead |
| --- | --- |
| Just run tests manually | Sandbox Agent |
| Only write tests | Write Tests Agent |
| Extract lessons from a completed PR | Learn From PR Agent |
