
Plan Critic

The Plan Critic is a second LLM pass that validates generated plans before they are executed. It catches logical errors, impossible tasks, and unsafe sequences — preventing wasted execution cycles.

Architecture

    User objective
          │
          ▼
    PlanGenerator (4,000 token budget)
          │
          ▼ TaskGraph (draft)
          │
          ▼
    PlanCritic.validate() (1,200 token budget)
          │
          ├── PASS → Execute plan
          └── FAIL → Regenerate with critique feedback

What the Critic Checks

The PlanCritic (src/goal_orchestrator/plan_validator.py) evaluates:

  1. Completeness — Does the plan actually achieve the stated objective?
  2. Feasibility — Are all tasks achievable with available skills?
  3. Dependency correctness — Do task dependencies make logical sense?
  4. Missing error handling — Are there obvious failure modes not addressed?
  5. Skill availability — Are referenced skills registered and enabled?
  6. Circular dependencies — No loops in the task DAG
  7. Resource safety — Does the plan respect budget constraints?
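The critic itself is an LLM pass, but checks 5 and 6 (plus the undefined-dependency part of check 3) are structural and could also be verified deterministically. The following is a minimal sketch of that idea, not the project's implementation: the function name `find_structural_issues` and the task dictionary shape (`skill`, `depends_on`) are assumptions made for illustration. It uses Kahn's algorithm to detect cycles.

```python
from collections import deque


def find_structural_issues(tasks, available_skills):
    """Return a list of issue strings for a draft plan.

    `tasks` is a hypothetical shape: {task_id: {"skill": str, "depends_on": [task_id, ...]}}.
    `available_skills` is a set of skill names assumed to be registered and enabled.
    """
    issues = []

    # Skill availability and undefined dependencies.
    for task_id, task in tasks.items():
        if task["skill"] not in available_skills:
            issues.append(f"Task {task_id} requires unavailable skill '{task['skill']}'")
        for dep in task["depends_on"]:
            if dep not in tasks:
                issues.append(f"Task {task_id} depends on undefined task {dep}")

    # Kahn's algorithm: repeatedly remove tasks whose dependencies are all satisfied.
    # If some tasks can never be removed, the graph contains a cycle.
    indegree = {tid: 0 for tid in tasks}
    dependents = {tid: [] for tid in tasks}
    for tid, task in tasks.items():
        for dep in task["depends_on"]:
            if dep in tasks:  # undefined dependencies were already reported above
                indegree[tid] += 1
                dependents[dep].append(tid)

    ready = deque(tid for tid, count in indegree.items() if count == 0)
    removed = 0
    while ready:
        current = ready.popleft()
        removed += 1
        for nxt in dependents[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if removed != len(tasks):
        issues.append("Circular dependency detected in task graph")
    return issues
```

A cheap pre-check like this could reject obviously broken graphs before spending the critic's token budget, leaving the LLM to judge completeness, feasibility, and error handling.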

Validation Output

The critic returns a structured result:

    ValidationResult(
        passed=True,  # False if plan should be rejected
        issues=[  # List of identified problems
            "Task 3 depends on Task 5 which hasn't been defined",
            "Step to 'send email' requires gmail skill which may not be configured",
        ],
        suggestions=[  # Improvements (even when passed=True)
            "Consider adding error handling for network failures",
        ],
        confidence=0.87,  # Critic's confidence in the plan
    )
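For readers who want to consume this result in their own code, here is one way the structure could be modelled. The field names come from the example above; the dataclass definition, the defaults, and the `summarize` helper are assumptions for illustration and may differ from the actual class in plan_validator.py.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    """Hypothetical model of the critic's output, mirroring the fields shown above."""
    passed: bool
    issues: list[str] = field(default_factory=list)
    suggestions: list[str] = field(default_factory=list)
    confidence: float = 0.0


def summarize(result: ValidationResult) -> str:
    """Build a one-line, log-friendly summary of a validation result (illustrative helper)."""
    if result.passed:
        return (
            f"PASS (confidence {result.confidence:.2f}), "
            f"{len(result.suggestions)} suggestion(s)"
        )
    return "FAIL: " + "; ".join(result.issues)
```

Note that `suggestions` can be non-empty even when `passed` is true, so callers should not treat suggestions as a rejection signal.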

Replan Loop

When the critic rejects a plan:

    PlanCritic: FAIL
    Issues: ["No skill available to check exchange rate"]
          │
          ▼
    PlanGenerator (retry with critique context):
      "Previous plan rejected: No skill for exchange rate.
       Use web_search to fetch rate instead."
          │
          ▼
    New TaskGraph (revised)
          │
          ▼
    PlanCritic: PASS → Execute

Maximum replans: GOAL_BUDGET_MAX_REPLANS (default: 5).
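The loop above can be sketched as a small driver function. This is an illustration under stated assumptions, not the orchestrator's code: `plan_with_critic`, the `generate(objective, critique)` and `validate(plan)` callables, and the `(passed, issues)` return shape are all hypothetical. It also assumes the limit counts regenerations after the first draft, which the source does not specify.

```python
def plan_with_critic(objective, generate, validate, max_replans=5):
    """Generate a plan and regenerate it with critique feedback until the critic passes it.

    generate(objective, critique) -> plan      (critique is None on the first attempt)
    validate(plan) -> (passed: bool, issues: list[str])
    Returns the accepted plan, or None if the replan budget is exhausted.
    """
    critique = None
    # One initial attempt plus up to `max_replans` regenerations (assumed semantics).
    for _attempt in range(max_replans + 1):
        plan = generate(objective, critique)
        passed, issues = validate(plan)
        if passed:
            return plan
        # Feed the critic's issues back into the next generation attempt.
        critique = issues
    return None
```

Returning `None` when the budget runs out keeps the caller in charge of what happens next, for example failing the goal or asking the user to rephrase it.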

Configuration

Setting                   Default   Description
PLAN_CRITIC_ENABLED       true      Enable/disable the plan critic
PLAN_CRITIC_MAX_TOKENS    1,200     Token budget for critic review
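As a rough sketch of how these two settings might be read from the environment: the function name, the returned keys, and the accepted truthy strings below are assumptions, and the project's own settings loader may behave differently. The defaults match the table, with the token budget written as a plain integer.

```python
import os


def load_plan_critic_settings(env=None):
    """Read the plan critic settings from an environment mapping (illustrative only)."""
    env = os.environ if env is None else env
    # Assumed truthy spellings; the real loader may accept a different set.
    raw_enabled = env.get("PLAN_CRITIC_ENABLED", "true").strip().lower()
    enabled = raw_enabled in {"1", "true", "yes", "on"}
    # Default of 1200 tokens, written without a thousands separator.
    max_tokens = int(env.get("PLAN_CRITIC_MAX_TOKENS", "1200"))
    return {"plan_critic_enabled": enabled, "plan_critic_max_tokens": max_tokens}
```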

Performance Impact

The critic adds one LLM call per plan generation:

  • Typical critic call: 300-800ms
  • 1,200 token budget (fast, cheap models work well)
  • Only runs on GoalEngine plans (not on regular chat skill calls)

Enabling/Disabling

# Disable critic (faster planning, less validation)
PLAN_CRITIC_ENABLED=false docker compose up -d agent-core

Or at runtime by setting goal_orchestrator.plan_critic = None.

Late Wiring

The plan critic is late-wired after the goal orchestrator is initialized:

    # src/main.py
    if settings.plan_critic_enabled:
        goal_orchestrator.plan_critic = PlanCritic(
            model_manager=model_manager,
            skill_registry=skill_registry,
            max_tokens=settings.plan_critic_max_tokens,
        )

This allows the critic to be toggled without restarting the full stack.

Metrics

When a plan is rejected, the audit log records:

plan_critic.rejected  goal_id=... issues=2  generation=1

Track plan quality over time:

docker exec agent-postgres psql -U agent -d agent -c "
SELECT COUNT(*) FROM audit_log
WHERE input_summary LIKE '%plan_critic.rejected%'
AND created_at > NOW() - INTERVAL '7 days';
"