
Monitoring

WASP has multiple built-in monitoring systems that track health, cognitive load, execution traceability, and system integrity.

Health Monitor (src/health/monitor.py)

The HealthMonitor runs checks every 5 minutes via HealthCheckJob:

Checks Performed

| Check      | Method                    | Threshold                    |
|------------|---------------------------|------------------------------|
| Redis      | PING command              | Must respond                 |
| PostgreSQL | Simple SELECT             | Must respond                 |
| Ollama     | GET /api/tags             | Must respond (if configured) |
| Disk       | shutil.disk_usage()       | Warning >80%, Critical >95%  |
| RAM        | psutil.virtual_memory()   | Warning >80%, Critical >95%  |
| CPU        | psutil.cpu_percent()      | Warning >85%                 |

Health Score

The health score (0–100) is calculated as:

  • Start at 100
  • –30 for critical issues (Redis down, Postgres down)
  • –15 for warnings (high disk/memory)
  • –5 for minor issues (Ollama down, high CPU)

Stored in Redis at health:last_check as JSON.
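
The deduction rules can be sketched as a small scoring function (a simplified model; the actual monitor.py implementation may represent issues differently):

```python
def health_score(issues):
    """Deduct a fixed penalty per issue severity, floored at 0."""
    penalties = {"critical": 30, "warning": 15, "minor": 5}
    score = 100
    for severity in issues:
        score -= penalties.get(severity, 0)
    return max(score, 0)

# Example: Redis down (critical) plus high disk usage (warning)
print(health_score(["critical", "warning"]))  # 55
```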

Self-Healer (src/health/repair.py)

When issues are detected, SelfHealer takes action:

| Issue                | Action                                           |
|----------------------|--------------------------------------------------|
| Disk >95%            | Delete old logs, temp files, clear browser cache |
| Ollama unreachable   | Send restart command via broker                  |
| PostgreSQL reconnect | Exponential backoff retry (via tenacity)         |
| Redis reconnect      | Exponential backoff retry                        |

```shell
# View self-healer log
docker compose logs agent-core 2>&1 | grep "self_healer"
```

Health Dashboard (/health)

The /health page provides a real-time view with:

  • Redis, PostgreSQL, Ollama connectivity indicators
  • Disk/RAM/CPU gauges with 12-point sparklines
  • Health score (0–100)
  • Safety & Execution Control panel — 4 sub-panels:
    • Last Soft Gate: last self_improve decision (ALLOW / WARN / BLOCK) with timestamp
    • CRIT-3 Daily Budget: critical-path self-edits today vs daily cap of 3
    • Saccadic Vision: last browser content change detection (timestamp + hash)
    • Learning Queue: behavioral:pending queue depth / 50 cap
  • Integrity Report tab: Self-Integrity Monitor last report

Learning Queue Warning Thresholds

| Depth | Color  | Message                               |
|-------|--------|---------------------------------------|
| 0–19  | Green  | "Corrections pending LLM analysis"    |
| 20–39 | Yellow | "High — corrections pending analysis" |
| 40–50 | Red    | "Near cap — LLM storm risk"           |

A near-cap queue means the BehavioralLearnerJob is falling behind. Check container logs for errors:

```shell
docker compose logs agent-core 2>&1 | grep "behavioral_learner"
```

HealthState Adaptive Execution (v2.5)

src/runtime/health_state.py evaluates system health before each LLM call:

Light Mode Thresholds

| Metric      | Threshold       |
|-------------|-----------------|
| CPU         | > 80%           |
| RAM         | > 80%           |
| LLM latency | > 500ms average |

When any threshold is exceeded, mode="light" is set and a structured hint is injected into the handler context:

```
[SYSTEM_CONSTRAINT: LIGHT_MODE]
The system is currently under load. Prefer text-only responses.
Avoid browser-heavy tasks unless explicitly required.
```

This is additive and fail-open — no existing logic is modified.
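
The threshold check and hint injection can be sketched as follows (a minimal model; the function names are illustrative, and the real health_state.py gathers CPU/RAM readings itself via psutil):

```python
LIGHT_MODE_HINT = (
    "[SYSTEM_CONSTRAINT: LIGHT_MODE]\n"
    "The system is currently under load. Prefer text-only responses.\n"
    "Avoid browser-heavy tasks unless explicitly required."
)

def evaluate_mode(cpu_pct, ram_pct, avg_llm_latency_ms):
    """Return 'light' if any load threshold is exceeded, else 'normal'."""
    if cpu_pct > 80 or ram_pct > 80 or avg_llm_latency_ms > 500:
        return "light"
    return "normal"

def with_hint(context, mode):
    """Fail-open: adds a hint key in light mode, never mutates the input."""
    if mode == "light":
        return {**context, "system_hint": LIGHT_MODE_HINT}
    return context
```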

Cognitive Pressure Index (CPI)

The CPI is a composite metric (0–100) measuring overall agent cognitive load:

CPI Components

| Component             | Weight | Source           |
|-----------------------|--------|------------------|
| Active goals count    | 20%    | GoalOrchestrator |
| Error rate (24h)      | 25%    | Audit log        |
| Average skill latency | 20%    | Audit log        |
| Memory growth rate    | 15%    | MemoryManager    |
| CPU usage             | 20%    | psutil           |
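
With each component normalized to a 0–1 load value, the weighted composite can be sketched as (a hypothetical formulation; how WASP normalizes each source internally is not specified here):

```python
CPI_WEIGHTS = {
    "goals": 0.20,    # active goals count
    "errors": 0.25,   # 24h error rate
    "latency": 0.20,  # average skill latency
    "memory": 0.15,   # memory growth rate
    "cpu": 0.20,      # CPU usage
}

def compute_cpi(components):
    """Weighted sum of 0-1 component loads, scaled to a 0-100 index."""
    total = sum(
        CPI_WEIGHTS[name] * min(max(value, 0.0), 1.0)
        for name, value in components.items()
    )
    return round(100 * total, 1)

compute_cpi({"goals": 0.5, "errors": 0.2, "latency": 0.3,
             "memory": 0.1, "cpu": 0.4})  # 30.5
```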

CPI Thresholds

  • CPI < 50: Normal operation
  • CPI 50–80: Elevated load
  • CPI > 80: High load — agent:cpi_high flag set (TTL 10min)

When agent:cpi_high is set:

  • AutonomousGoalGeneratorJob skips its run
  • DreamJob skips its run
  • BackgroundPerceptionJob skips its run
```shell
# Current CPI
docker exec agent-redis redis-cli GET agent:cpi_current

# Check high-load flag
docker exec agent-redis redis-cli GET agent:cpi_high
```

Dashboard: visible as a banner in the Cognitive tab when CPI > 80.

Self-Integrity Monitor (src/scheduler/integrity.py)

SelfIntegrityMonitorJob runs every 6 hours:

Checks

  1. Strength accuracy — Are claimed strengths backed by actual skill success rates?
  2. Failure acknowledgment — Does the self-model reflect actual failure patterns?
  3. Epistemic drift — Has domain confidence diverged from actual performance?
  4. Audit error spikes — Are there unusual error patterns in the audit log?
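
The epistemic drift check, for example, can be sketched as (an illustrative simplification; the tolerance value is an assumption):

```python
def epistemic_drift(claimed_confidence, actual_success_rate, tolerance=0.05):
    """Return the drift when claimed confidence diverges from measured
    performance by at least `tolerance`, else None."""
    drift = round(abs(claimed_confidence - actual_success_rate), 4)
    return drift if drift >= tolerance else None

epistemic_drift(0.90, 0.85)  # 0.05 -- flagged as an issue
epistemic_drift(0.90, 0.88)  # None -- within tolerance
```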

Integrity Report

Results stored in Redis at agent:integrity_report:

```json
{
  "timestamp": "2026-04-15T12:00:00Z",
  "score": 87,
  "issues": [
    "browser skill claimed as strength but 42% success rate (threshold: 60%)",
    "epistemic.programming confidence 0.90 vs actual 0.85 (drift: 0.05)"
  ],
  "recommendations": [
    "Update self-model: browser skill should be in known_failures"
  ]
}
```

```shell
docker exec agent-redis redis-cli GET agent:integrity_report | python3 -m json.tool
```

Viewable in the /health page → Integrity tab.

SaccadicVision Change Detection (v2.5)

src/runtime/saccadic_vision.py runs as a background daemon thread:

  • Polls browser:last_content Redis key every 2 seconds
  • Computes SHA-1 hash of content
  • On change: emits {"type": "saccadic_change_detected", "curr_hash": "...", "timestamp": "..."} to the events:saccadic stream (maxlen=500)
  • 2s debounce prevents event spam
  • Fully fail-open — never blocks execution
```shell
# View recent saccadic events
docker exec agent-redis redis-cli XREVRANGE events:saccadic + - COUNT 5
```
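
The hash-compare-and-debounce logic can be sketched as one poll iteration (a simplified model of the daemon's loop body; the function name is illustrative):

```python
import hashlib

DEBOUNCE_SECONDS = 2.0

def poll_step(content, last_hash, last_emit_ts, now):
    """One poll iteration: emit an event only when the SHA-1 of the
    content has changed and the debounce window has elapsed."""
    curr_hash = hashlib.sha1(content.encode()).hexdigest()
    if curr_hash != last_hash and now - last_emit_ts >= DEBOUNCE_SECONDS:
        event = {
            "type": "saccadic_change_detected",
            "curr_hash": curr_hash,
            "timestamp": now,
        }
        return event, curr_hash, now
    # No change, or still inside the debounce window: emit nothing
    return None, last_hash, last_emit_ts
```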

Shell Audit Trail (v2.6)

Every shell skill invocation writes one AuditLog row:

  • action = "skill.shell"
  • input_summary = redacted command (secrets stripped)
  • output_summary = exit={code} + optional goal={id}
  • error = error message if failed
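
A sketch of how such a row might be assembled (the redaction patterns below are hypothetical and purely illustrative; the real rules live in the shell skill):

```python
import re

# Hypothetical secret patterns, for illustration only
SECRET_PATTERNS = [
    re.compile(r"(--password[= ])\S+"),
    re.compile(r"(\bAPI_KEY=)\S+"),
    re.compile(r"(Bearer )\S+"),
]

def redact(command):
    """Strip secret values from a command before it is logged."""
    for pattern in SECRET_PATTERNS:
        command = pattern.sub(r"\1[REDACTED]", command)
    return command

def audit_row(command, exit_code, goal_id=None, error=None):
    """Build one AuditLog row for a shell skill invocation."""
    summary = f"exit={exit_code}" + (f" goal={goal_id}" if goal_id else "")
    return {
        "action": "skill.shell",
        "input_summary": redact(command),
        "output_summary": summary,
        "error": error,
    }
```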

Query shell history:

```shell
docker exec agent-postgres psql -U agent -d agent -c "
SELECT input_summary AS cmd, output_summary AS result, error, timestamp
FROM audit_log
WHERE action = 'skill.shell'
ORDER BY timestamp DESC
LIMIT 20;
"
```

Introspector (src/health/introspection.py)

On-demand performance reports via Telegram command /introspect:

  • Health score and component breakdown
  • Skill usage statistics (last 24h)
  • Top skills by call count
  • Average response time
  • Error rate and memory usage

Metrics (src/observability/metrics.py)

Operational metrics tracked in Redis:

```shell
docker exec agent-redis redis-cli HGETALL metrics:requests
docker exec agent-redis redis-cli HGETALL metrics:tokens
```

Dashboard /metrics page shows latency charts, token usage, and error rates.

Setting Up Alerts

```shell
# Alert if the health score drops below 70
SCORE=$(docker exec agent-redis redis-cli GET health:score)
if [ "${SCORE:-0}" -lt 70 ]; then
  curl -X POST https://your-alerting-system/alert \
    -d "{\"message\": \"WASP health score: $SCORE\"}"
fi
```