Plan Critic

The Plan Critic is a second LLM pass that validates generated plans before they are executed. It catches logical errors, impossible tasks, and unsafe sequences, so execution cycles are not wasted on plans that cannot succeed.

Architecture

User objective
     │
     ▼
PlanGenerator (4,000 token budget)
     │
     ▼ TaskGraph (draft)
     │
     ▼
PlanCritic.validate() (1,200 token budget)
     │
     ├── PASS → Execute plan
     └── FAIL → Regenerate with critique feedback

What the Critic Checks

The PlanCritic (src/goal_orchestrator/plan_validator.py) evaluates:

Completeness: Does the plan actually achieve the stated objective?
Feasibility: Are all tasks achievable with available skills?
Dependency correctness: Do task dependencies make logical sense?
Missing error handling: Are there obvious failure modes not addressed?
Skill availability: Are referenced skills registered and enabled?
Circular dependencies: Is the task DAG free of loops?
Resource safety: Does the plan respect budget constraints?
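Some of these checks are purely structural. The sketch below illustrates what the dependency, skill-availability, and circular-dependency checks mean in practice. It is not the critic's implementation (the critic is an LLM pass), and the plan shape, the function name `structural_issues`, and the `enabled_skills` parameter are assumptions made for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical plan shape: task id -> {"skill": ..., "depends_on": [...]}
Plan = dict[str, dict]


def structural_issues(plan: Plan, enabled_skills: set[str]) -> list[str]:
    """Return human-readable problems that can be found without an LLM."""
    issues = []
    for task_id, task in plan.items():
        # Skill availability: every referenced skill must be enabled.
        if task["skill"] not in enabled_skills:
            issues.append(f"{task_id} uses unavailable skill '{task['skill']}'")
        # Dependency correctness: every dependency must be a defined task.
        for dep in task.get("depends_on", []):
            if dep not in plan:
                issues.append(f"{task_id} depends on undefined task {dep}")

    # Circular dependencies: a topological sort fails if the graph has a loop.
    graph = {
        task_id: [d for d in task.get("depends_on", []) if d in plan]
        for task_id, task in plan.items()
    }
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as err:
        # The second element of err.args holds the nodes forming the cycle.
        issues.append("circular dependency: " + " -> ".join(err.args[1]))
    return issues
```

Checks like completeness or missing error handling need judgment about the objective, which is why they are left to the LLM review rather than to code of this kind.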

Validation Output

The critic returns a structured result:

ValidationResult(
    passed=False,        # False because the plan below has blocking issues
    issues=[             # List of identified problems
        "Task 3 depends on Task 5 which hasn't been defined",
        "Step to 'send email' requires gmail skill which may not be configured",
    ],
    suggestions=[        # Improvements (can be present even when passed=True)
        "Consider adding error handling for network failures",
    ],
    confidence=0.87,     # Critic's confidence in its assessment of the plan
)
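For readers who want to see the shape as a type, here is a minimal sketch of a container with the same four fields. The field names come from the example above. The defaults and the `feedback()` helper are assumptions, and the real class in the codebase may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    """Illustrative stand-in for the critic's result object."""

    passed: bool
    issues: list[str] = field(default_factory=list)
    suggestions: list[str] = field(default_factory=list)
    confidence: float = 0.0  # assumed default, not taken from the source

    def feedback(self) -> str:
        """Hypothetical helper: summarise issues for the next generation attempt."""
        return "Previous plan rejected: " + "; ".join(self.issues)


result = ValidationResult(
    passed=False,
    issues=["No skill available to check exchange rate"],
)
print(result.feedback())
```

Using `default_factory` gives every result its own list, so appending to one result's issues never leaks into another.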

Replan Loop

When the critic rejects a plan:

PlanCritic: FAIL
  Issues: ["No skill available to check exchange rate"]
     │
     ▼
PlanGenerator (retry with critique context):
  "Previous plan rejected: No skill for exchange rate.
   Use web_search to fetch rate instead."
     │
     ▼
New TaskGraph (revised)
     │
     ▼
PlanCritic: PASS → Execute

Maximum replans: GOAL_BUDGET_MAX_REPLANS (default: 3).
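The loop above can be sketched as follows. This is an illustration under stated assumptions, not the orchestrator's code: `generate` and `validate` are hypothetical callables, the limit is assumed to count regenerations after the first attempt, and raising an error once the limit is exhausted is a choice made for this sketch.

```python
from typing import Callable, Optional


def plan_with_critic(
    objective: str,
    generate: Callable[[str, Optional[str]], object],
    validate: Callable[[object], tuple[bool, list[str]]],
    max_replans: int = 3,  # mirrors the GOAL_BUDGET_MAX_REPLANS default
) -> object:
    """Generate a plan, regenerating with critique feedback until it passes."""
    feedback: Optional[str] = None
    for _attempt in range(max_replans + 1):  # first attempt plus replans
        plan = generate(objective, feedback)
        passed, issues = validate(plan)
        if passed:
            return plan
        # Feed the critic's issues back into the next generation attempt.
        feedback = "Previous plan rejected: " + "; ".join(issues)
    # Assumption: give up loudly rather than execute an unvalidated plan.
    raise RuntimeError(f"plan still rejected after {max_replans} replans")
```

Bounding the loop keeps a persistently failing objective from consuming generator and critic calls indefinitely.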

Configuration

Setting	Default	Description
`PLAN_CRITIC_ENABLED`	`true`	Enable/disable the plan critic
`PLAN_CRITIC_MAX_TOKENS`	`1200`	Token budget for critic review
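As a rough illustration of how these two settings might be read from the environment, consider the sketch below. The function name `load_critic_settings`, the returned dictionary, and the accepted spellings of "true" are assumptions. The project's actual settings loader is not shown in this document.

```python
import os
from typing import Mapping


def load_critic_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the two critic settings, falling back to the documented defaults."""
    raw_enabled = env.get("PLAN_CRITIC_ENABLED", "true")
    # Assumption: treat common truthy spellings as enabled.
    enabled = raw_enabled.strip().lower() in {"1", "true", "yes"}
    max_tokens = int(env.get("PLAN_CRITIC_MAX_TOKENS", "1200"))
    return {
        "plan_critic_enabled": enabled,
        "plan_critic_max_tokens": max_tokens,
    }


print(load_critic_settings({"PLAN_CRITIC_ENABLED": "false"}))
```

Passing the environment in as a mapping makes the function easy to exercise without touching the real process environment.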

Performance Impact

The critic adds one LLM call per plan generation:

Typical critic call: 300-800ms
1,200 token budget (fast, cheap models work well)
Only runs on GoalEngine plans (not on regular chat skill calls)

Enabling/Disabling

# Disable critic (faster planning, less validation)
PLAN_CRITIC_ENABLED=false docker compose up -d agent-core

Alternatively, disable the critic at runtime by setting goal_orchestrator.plan_critic = None.

Late Wiring

The plan critic is late-wired after the goal orchestrator is initialized:

# src/main.py
if settings.plan_critic_enabled:
    goal_orchestrator.plan_critic = PlanCritic(
        model_manager=model_manager,
        skill_registry=skill_registry,
        max_tokens=settings.plan_critic_max_tokens,
    )

This allows the critic to be toggled without restarting the full stack.
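One way an optional, late-wired critic can behave is sketched below. The classes here are hypothetical stand-ins, not the real `GoalOrchestrator` or `PlanCritic`. The only detail taken from this page is that `plan_critic` is an attribute that may be set after construction or reset to `None`.

```python
from types import SimpleNamespace


class StubCritic:
    """Hypothetical critic that rejects empty plans."""

    def validate(self, plan):
        return SimpleNamespace(passed=bool(plan))


class GoalOrchestrator:
    """Illustrative stand-in showing the optional-attribute pattern."""

    def __init__(self):
        self.plan_critic = None  # wired later, if at all

    def plan_is_acceptable(self, plan) -> bool:
        # With no critic attached, every plan goes straight to execution.
        if self.plan_critic is None:
            return True
        return self.plan_critic.validate(plan).passed


orchestrator = GoalOrchestrator()
orchestrator.plan_critic = StubCritic()  # enable at runtime
print(orchestrator.plan_is_acceptable([]))
orchestrator.plan_critic = None          # disable again
print(orchestrator.plan_is_acceptable([]))
```

Because the orchestrator checks the attribute on every call, swapping the critic in or out takes effect on the next plan without rebuilding the orchestrator.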

Metrics

When a plan is rejected, the audit log records:

plan_critic.rejected  goal_id=... issues=2  generation=1
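If these entries need to be processed outside the database, a line in this format can be split into fields. The sketch below assumes whitespace-separated `key=value` pairs after the event name, as in the line above. The `goal_id` value in the usage example is made up.

```python
def parse_audit_line(line: str) -> dict:
    """Split an audit entry into its event name and key=value fields."""
    event, *pairs = line.split()
    fields = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        # Numeric fields such as issues and generation become integers.
        fields[key] = int(value) if value.isdigit() else value
    return {"event": event, **fields}


entry = parse_audit_line("plan_critic.rejected  goal_id=goal-42 issues=2  generation=1")
print(entry["issues"], entry["generation"])
```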

Track plan quality over time:

docker exec agent-postgres psql -U agent -d agent -c "
  SELECT COUNT(*) FROM audit_log
  WHERE input_summary LIKE '%plan_critic.rejected%'
  AND created_at > NOW() - INTERVAL '7 days';
"
