Monitoring
WASP has multiple built-in monitoring systems that track health, cognitive load, execution traceability, and system integrity.
Health Monitor (src/health/monitor.py)
The HealthMonitor runs checks every 5 minutes via HealthCheckJob:
Checks Performed
| Check | Method | Threshold |
|---|---|---|
| Redis | PING command | Must respond |
| PostgreSQL | Simple SELECT | Must respond |
| Ollama | GET /api/tags | Must respond (if configured) |
| Disk | shutil.disk_usage() | Warning >80%, Critical >95% |
| RAM | psutil.virtual_memory() | Warning >80%, Critical >95% |
| CPU | psutil.cpu_percent() | Warning >85% |
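The threshold logic in the table reduces to a few lines. The sketch below uses hypothetical names (`classify_usage`, `check_disk`), not the actual HealthMonitor API:

```python
import shutil

def classify_usage(pct, warn=80, crit=95):
    """Map a usage percentage to a status per the thresholds above."""
    if pct > crit:
        return "critical"
    if pct > warn:
        return "warning"
    return "ok"

def check_disk(path="/"):
    """Disk check as the table describes: shutil.disk_usage() + thresholds."""
    u = shutil.disk_usage(path)
    return classify_usage(u.used / u.total * 100)
```

The same banding applies to the RAM check; CPU uses only the single 85% warning threshold.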
Health Score
The health score (0–100) is calculated as:
- Start at 100
- –30 for critical issues (Redis down, Postgres down)
- –15 for warnings (high disk/memory)
- –5 for minor issues (Ollama down, high CPU)
The result is stored in Redis at `health:last_check` as JSON.
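The scoring rule above can be sketched as a simple penalty sum; the severity labels and function name are assumptions for illustration, not the monitor's actual interface:

```python
# Penalty per issue severity, from the list above.
PENALTIES = {"critical": 30, "warning": 15, "minor": 5}

def health_score(issues):
    """issues: severity labels collected by the checks; floor at 0."""
    return max(100 - sum(PENALTIES[sev] for sev in issues), 0)
```

For example, Redis down plus high disk usage yields 100 - 30 - 15 = 55.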
Self-Healer (src/health/repair.py)
When issues are detected, SelfHealer takes action:
| Issue | Action |
|---|---|
| Disk >95% | Delete old logs and temp files, clear browser cache |
| Ollama unreachable | Send restart command via broker |
| PostgreSQL reconnect | Exponential backoff retry (via tenacity) |
| Redis reconnect | Exponential backoff retry |
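The backoff behavior can be illustrated without the tenacity dependency; this plain-Python sketch mirrors what tenacity's exponential wait provides (function and parameter names are assumptions):

```python
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=1.0):
    """Retry fn() on connection errors with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)
```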
```bash
# View self-healer log
docker compose logs agent-core 2>&1 | grep "self_healer"
```
Health Dashboard (/health)
The /health page provides a real-time view with:
- Redis, PostgreSQL, Ollama connectivity indicators
- Disk/RAM/CPU gauges with 12-point sparklines
- Health score (0–100)
- Safety & Execution Control panel with 4 sub-panels:
  - Last Soft Gate: last `self_improve` decision (ALLOW / WARN / BLOCK) with timestamp
  - CRIT-3 Daily Budget: critical-path self-edits today vs the daily cap of 3
  - Saccadic Vision: last browser content change detection (timestamp + hash)
  - Learning Queue: `behavioral:pending` queue depth / 50 cap
- Integrity Report tab: Self-Integrity Monitor last report
Learning Queue Warning Thresholds
| Depth | Color | Message |
|---|---|---|
| 0–19 | Green | "Corrections pending LLM analysis" |
| 20–39 | Yellow | "High — corrections pending analysis" |
| 40–50 | Red | "Near cap — LLM storm risk" |
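The color banding maps to a simple threshold function; `queue_status` is a hypothetical helper with the bounds read directly from the table:

```python
def queue_status(depth):
    """Color band for the behavioral:pending queue depth (cap is 50)."""
    if depth >= 40:
        return "red"     # near cap: LLM storm risk
    if depth >= 20:
        return "yellow"  # high: corrections pending analysis
    return "green"
```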
A near-cap queue means the BehavioralLearnerJob is falling behind. Check container logs for errors:
```bash
docker compose logs agent-core 2>&1 | grep "behavioral_learner"
```
HealthState Adaptive Execution (v2.5)
src/runtime/health_state.py evaluates system health before each LLM call:
Light Mode Thresholds
| Metric | Threshold |
|---|---|
| CPU | > 80% |
| RAM | > 80% |
| LLM latency | > 500ms average |
When any threshold is exceeded, mode="light" is set and a structured hint is injected into the handler context:
```text
[SYSTEM_CONSTRAINT: LIGHT_MODE]
The system is currently under load. Prefer text-only responses.
Avoid browser-heavy tasks unless explicitly required.
```
This is additive and fail-open — no existing logic is modified.
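The mode decision reduces to a small predicate; `execution_mode` and its parameter names are assumptions for illustration:

```python
def execution_mode(cpu_pct, ram_pct, avg_llm_latency_ms):
    """Return 'light' when any threshold from the table is exceeded."""
    if cpu_pct > 80 or ram_pct > 80 or avg_llm_latency_ms > 500:
        return "light"
    return "normal"
```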
Cognitive Pressure Index (CPI)
The CPI is a composite metric (0–100) measuring overall agent cognitive load:
CPI Components
| Component | Weight | Source |
|---|---|---|
| Active goals count | 20% | GoalOrchestrator |
| Error rate (24h) | 25% | Audit log |
| Average skill latency | 20% | Audit log |
| Memory growth rate | 15% | MemoryManager |
| CPU usage | 20% | psutil |
CPI Thresholds
- CPI < 50: Normal operation
- CPI 50–80: Elevated load
- CPI > 80: High load; the `agent:cpi_high` flag is set (TTL 10 min)
When `agent:cpi_high` is set:
- AutonomousGoalGeneratorJob skips its run
- DreamJob skips its run
- BackgroundPerceptionJob skips its run
```bash
# Current CPI
docker exec agent-redis redis-cli GET agent:cpi_current

# Check high-load flag
docker exec agent-redis redis-cli GET agent:cpi_high
```
Dashboard: visible as a banner in the Cognitive tab when CPI > 80.
Self-Integrity Monitor (src/scheduler/integrity.py)
SelfIntegrityMonitorJob runs every 6 hours:
Checks
- Strength accuracy — Are claimed strengths backed by actual skill success rates?
- Failure acknowledgment — Does the self-model reflect actual failure patterns?
- Epistemic drift — Has domain confidence diverged from actual performance?
- Audit error spikes — Are there unusual error patterns in the audit log?
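The first check (strength accuracy) can be sketched as a comparison of claimed strengths against measured success rates; all names below are hypothetical, not the actual SelfIntegrityMonitorJob API:

```python
def strength_issues(claimed_strengths, success_rates, threshold=0.60):
    """Flag claimed strengths whose measured success rate falls below threshold."""
    return [
        f"{skill} skill claimed as strength but {rate:.0%} success rate "
        f"(threshold: {threshold:.0%})"
        for skill in claimed_strengths
        if (rate := success_rates.get(skill, 0.0)) < threshold
    ]
```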
Integrity Report
Results are stored in Redis at `agent:integrity_report`:

```json
{
  "timestamp": "2026-04-15T12:00:00Z",
  "score": 87,
  "issues": [
    "browser skill claimed as strength but 42% success rate (threshold: 60%)",
    "epistemic.programming confidence 0.90 vs actual 0.85 (drift: 0.05)"
  ],
  "recommendations": [
    "Update self-model: browser skill should be in known_failures"
  ]
}
```
```bash
docker exec agent-redis redis-cli GET agent:integrity_report | python3 -m json.tool
```
Viewable in the /health page → Integrity tab.
SaccadicVision Change Detection (v2.5)
src/runtime/saccadic_vision.py runs as a background daemon thread:
- Polls the `browser:last_content` Redis key every 2 seconds
- Computes a SHA-1 hash of the content
- On change: emits `{"type": "saccadic_change_detected", "curr_hash": "...", "timestamp": "..."}` to the `events:saccadic` stream (maxlen=500)
- A 2s debounce prevents event spam
- Fully fail-open; never blocks execution
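A minimal sketch of the hash-plus-debounce logic, with the polling loop and Redis I/O omitted (the class and method names are assumptions):

```python
import hashlib
import time

class SaccadicDetector:
    """Emit an event when content changes, at most once per debounce window."""

    def __init__(self, debounce_s=2.0):
        self.last_hash = None
        self.last_emit = 0.0
        self.debounce_s = debounce_s

    def check(self, content, now=None):
        """Return an event dict on a debounced content change, else None."""
        now = time.monotonic() if now is None else now
        curr = hashlib.sha1(content.encode()).hexdigest()
        if curr != self.last_hash and now - self.last_emit >= self.debounce_s:
            self.last_hash = curr
            self.last_emit = now
            return {"type": "saccadic_change_detected",
                    "curr_hash": curr, "timestamp": now}
        return None
```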
```bash
# View recent saccadic events
docker exec agent-redis redis-cli XREVRANGE events:saccadic + - COUNT 5
```
Shell Audit Trail (v2.6)
Every shell skill invocation writes one AuditLog row:
- `action` = `"skill.shell"`
- `input_summary` = redacted command (secrets stripped)
- `output_summary` = `exit={code}` plus optional `goal={id}`
- `error` = error message if failed
Query shell history:
```bash
docker exec agent-postgres psql -U agent -d agent -c "
  SELECT input_summary AS cmd, output_summary AS result, error, timestamp
  FROM audit_log
  WHERE action = 'skill.shell'
  ORDER BY timestamp DESC
  LIMIT 20;
"
```
Introspector (src/health/introspection.py)
On-demand performance reports via Telegram command /introspect:
- Health score and component breakdown
- Skill usage statistics (last 24h)
- Top skills by call count
- Average response time
- Error rate and memory usage
Metrics (src/observability/metrics.py)
Operational metrics tracked in Redis:
```bash
docker exec agent-redis redis-cli HGETALL metrics:requests
docker exec agent-redis redis-cli HGETALL metrics:tokens
```
Dashboard /metrics page shows latency charts, token usage, and error rates.
Setting Up Alerts
```bash
# Alert if the health score drops below 70.
# The score lives in the JSON blob at health:last_check (see Health Score above);
# the "score" field name is assumed here.
SCORE=$(docker exec agent-redis redis-cli GET health:last_check \
  | python3 -c "import sys, json; print(int(json.load(sys.stdin)['score']))" 2>/dev/null)
if [ "${SCORE:-0}" -lt 70 ]; then
  curl -X POST https://your-alerting-system/alert \
    -H "Content-Type: application/json" \
    -d "{\"message\": \"WASP health score: $SCORE\"}"
fi
```