Plan Critic
The Plan Critic is a second LLM pass that validates generated plans before they are executed. It catches logical errors, impossible tasks, and unsafe sequences, so execution cycles are not spent on plans that cannot succeed.
Architecture
User objective
│
▼
PlanGenerator (4,000 token budget)
│
▼ TaskGraph (draft)
│
▼
PlanCritic.validate() (1,200 token budget)
│
├── PASS → Execute plan
└── FAIL → Regenerate with critique feedback
What the Critic Checks
The PlanCritic (src/goal_orchestrator/plan_validator.py) evaluates:
- Completeness: Does the plan actually achieve the stated objective?
- Feasibility: Are all tasks achievable with the available skills?
- Dependency correctness: Do task dependencies make logical sense?
- Missing error handling: Are there obvious failure modes left unaddressed?
- Skill availability: Are the referenced skills registered and enabled?
- Circular dependencies: Is the task DAG free of loops?
- Resource safety: Does the plan respect budget constraints?
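Some of these checks, such as the circular-dependency check, can also be expressed deterministically rather than left to the LLM. The following is a minimal sketch of loop detection over a task graph, assuming the graph can be reduced to a mapping from task id to the ids it depends on. The function name `find_cycle` and the dict representation are illustrative assumptions, not the project's actual TaskGraph API.

```python
from typing import Dict, List


def find_cycle(deps: Dict[str, List[str]]) -> List[str]:
    """Return one dependency cycle as a list of task ids, or [] if none exists.

    `deps` maps a task id to the ids it depends on (an assumed representation).
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {task: WHITE for task in deps}
    path: List[str] = []

    def visit(task: str) -> List[str]:
        color[task] = GRAY
        path.append(task)
        for dep in deps.get(task, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so we have closed a loop
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[task] = BLACK
        return []

    for task in list(deps):
        if color[task] == WHITE:
            cycle = visit(task)
            if cycle:
                return cycle
    return []


if __name__ == "__main__":
    print(find_cycle({"fetch": ["report"], "report": ["send"], "send": ["fetch"]}))
    print(find_cycle({"fetch": [], "report": ["fetch"]}))
```

A non-empty result could be turned directly into an entry in the critic's issue list, which gives the planner a concrete loop to break on the next attempt.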
Validation Output
The critic returns a structured result:
    ValidationResult(
        passed=True,        # False if plan should be rejected
        issues=[            # List of identified problems
            "Task 3 depends on Task 5 which hasn't been defined",
            "Step to 'send email' requires gmail skill which may not be configured",
        ],
        suggestions=[       # Improvements (even when passed=True)
            "Consider adding error handling for network failures",
        ],
        confidence=0.87,    # Critic's confidence in the plan
    )
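For readers who want to see the shape of this result as a type, here is a minimal sketch using a dataclass. The field names come from the example above. The field types, the defaults, and the `critique_feedback` helper are assumptions made for illustration, and the real class in plan_validator.py may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidationResult:
    """Assumed shape of the critic's result; defaults are illustrative."""
    passed: bool
    issues: List[str] = field(default_factory=list)
    suggestions: List[str] = field(default_factory=list)
    confidence: float = 0.0


def critique_feedback(result: ValidationResult) -> str:
    """Hypothetical helper: render rejected issues as text for a planner retry."""
    if result.passed:
        return ""
    lines = ["Previous plan rejected:"]
    lines += [f"- {issue}" for issue in result.issues]
    return "\n".join(lines)


if __name__ == "__main__":
    rejected = ValidationResult(
        passed=False,
        issues=["No skill available to check exchange rate"],
        confidence=0.87,
    )
    print(critique_feedback(rejected))
```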
Replan Loop
When the critic rejects a plan:
PlanCritic: FAIL
Issues: ["No skill available to check exchange rate"]
│
▼
PlanGenerator (retry with critique context):
"Previous plan rejected: No skill for exchange rate.
Use web_search to fetch rate instead."
│
▼
New TaskGraph (revised)
│
▼
PlanCritic: PASS → Execute
Maximum replans: GOAL_BUDGET_MAX_REPLANS (default: 5).
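The replan loop can be sketched as a bounded retry. This is an illustration under stated assumptions, not the orchestrator's actual code: `plan_with_critic`, the callable signatures, and the list-of-strings plan are all hypothetical. The sketch also assumes that the limit counts retries after the first attempt and that an exhausted budget yields `None`, neither of which is specified above.

```python
from typing import Callable, List, Optional, Tuple

Plan = List[str]  # stand-in for a TaskGraph (assumption)


def plan_with_critic(
    generate: Callable[[str, List[str]], Plan],
    validate: Callable[[Plan], Tuple[bool, List[str]]],
    objective: str,
    max_replans: int = 5,
) -> Optional[Plan]:
    """Generate a plan, retrying with critique feedback until it passes."""
    feedback: List[str] = []
    # One initial attempt plus up to max_replans retries (assumed semantics).
    for _ in range(max_replans + 1):
        plan = generate(objective, feedback)
        passed, issues = validate(plan)
        if passed:
            return plan
        feedback = issues  # the next attempt sees why this one was rejected
    return None  # budget exhausted; the real behaviour is not documented here


if __name__ == "__main__":
    def generate(objective: str, feedback: List[str]) -> Plan:
        # Toy planner: switch skills once the critic has complained.
        return ["web_search"] if feedback else ["exchange_rate"]

    def validate(plan: Plan) -> Tuple[bool, List[str]]:
        if "exchange_rate" in plan:
            return False, ["No skill available to check exchange rate"]
        return True, []

    print(plan_with_critic(generate, validate, "convert 100 USD to EUR"))
```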
Configuration
| Setting | Default | Description |
|---|---|---|
| PLAN_CRITIC_ENABLED | true | Enable/disable the plan critic |
| PLAN_CRITIC_MAX_TOKENS | 1200 | Token budget for critic review |
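As an illustration of how these two settings might be read from the environment, here is a minimal sketch. The `PlanCriticSettings` class, the `load_settings` function, and the set of accepted truthy strings are assumptions. The project's real settings loader is not shown in this document and may parse values differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PlanCriticSettings:
    """Hypothetical settings holder mirroring the table's defaults."""
    plan_critic_enabled: bool = True
    plan_critic_max_tokens: int = 1200


def load_settings(env: Mapping[str, str] = os.environ) -> PlanCriticSettings:
    """Read the two critic settings, falling back to the documented defaults."""
    raw_enabled = env.get("PLAN_CRITIC_ENABLED", "true").strip().lower()
    # The accepted truthy spellings are an assumption of this sketch.
    enabled = raw_enabled in {"1", "true", "yes", "on"}
    max_tokens = int(env.get("PLAN_CRITIC_MAX_TOKENS", "1200"))
    return PlanCriticSettings(enabled, max_tokens)


if __name__ == "__main__":
    print(load_settings({"PLAN_CRITIC_ENABLED": "false"}))
```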
Performance Impact
The critic adds one LLM call per plan generation:
- Typical critic call latency: 300-800 ms
- Token budget: 1,200 (small enough that fast, inexpensive models work well)
- Only runs on GoalEngine plans (not on regular chat skill calls)
Enabling/Disabling
# Disable critic (faster planning, less validation)
PLAN_CRITIC_ENABLED=false docker compose up -d agent-core
Alternatively, disable it at runtime by setting goal_orchestrator.plan_critic = None.
Late Wiring
The plan critic is late-wired after the goal orchestrator is initialized:
    # src/main.py
    if settings.plan_critic_enabled:
        goal_orchestrator.plan_critic = PlanCritic(
            model_manager=model_manager,
            skill_registry=skill_registry,
            max_tokens=settings.plan_critic_max_tokens,
        )
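Because the critic is attached as an attribute after construction rather than passed to the orchestrator's constructor, it can be toggled without restarting the full stack.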
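The effect of an optional, late-wired critic can be sketched as follows. The classes here are simplified stand-ins written for illustration: the real GoalOrchestrator and PlanCritic take other arguments, and the `accept` method and `RejectEmptyPlans` class are hypothetical. The sketch only assumes that a critic of `None` means validation is skipped.

```python
from typing import List, Optional, Protocol


class Critic(Protocol):
    def validate(self, plan: List[str]) -> bool: ...


class GoalOrchestrator:
    """Simplified stand-in; the critic attribute starts unset and is wired later."""

    def __init__(self) -> None:
        self.plan_critic: Optional[Critic] = None

    def accept(self, plan: List[str]) -> bool:
        # With no critic attached, every plan goes straight to execution.
        if self.plan_critic is None:
            return True
        return self.plan_critic.validate(plan)


class RejectEmptyPlans:
    """Toy critic used only to demonstrate the toggle."""

    def validate(self, plan: List[str]) -> bool:
        return bool(plan)


if __name__ == "__main__":
    orchestrator = GoalOrchestrator()
    print(orchestrator.accept([]))            # critic not wired yet
    orchestrator.plan_critic = RejectEmptyPlans()
    print(orchestrator.accept([]))            # critic now rejects the empty plan
    orchestrator.plan_critic = None           # runtime disable
    print(orchestrator.accept([]))
```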
Metrics
When a plan is rejected, the audit log records:
plan_critic.rejected goal_id=... issues=2 generation=1
Track plan quality over time:
docker exec agent-postgres psql -U agent -d agent -c "
SELECT COUNT(*) FROM audit_log
WHERE input_summary LIKE '%plan_critic.rejected%'
AND created_at > NOW() - INTERVAL '7 days';
"
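The rejection entries shown above consist of an event name followed by key=value fields, which makes them easy to break apart for further analysis. A minimal sketch follows, assuming the fields are whitespace-separated and contain no spaces. The function name `parse_audit_event` and the sample goal id are hypothetical.

```python
from typing import Dict, Tuple


def parse_audit_event(line: str) -> Tuple[str, Dict[str, str]]:
    """Split an audit line into its event name and key=value fields."""
    parts = line.split()
    if not parts:
        raise ValueError("empty audit line")
    event, pairs = parts[0], parts[1:]
    # Tokens without '=' are ignored; values are kept as strings.
    fields = dict(pair.split("=", 1) for pair in pairs if "=" in pair)
    return event, fields


if __name__ == "__main__":
    # "g-42" is a made-up goal id for illustration.
    event, fields = parse_audit_event(
        "plan_critic.rejected goal_id=g-42 issues=2 generation=1"
    )
    print(event, int(fields["issues"]), int(fields["generation"]))
```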