Changelog

All notable changes to WASP are documented here. Versions follow a semantic versioning scheme after the initial phase-based development (Phases 1–18).


v2.2 — March 24, 2026

Focus: deep_scraper hardening, security fixes, capability map completeness

New

  • deep_scraper promoted from custom OpenClaw skill → permanent built-in skill (src/skills/builtin/deep_scraper.py)
  • deep_scraper SSRF protection: _is_safe_url() resolves all A/AAAA records via getaddrinfo(), blocks loopback/private/link-local/reserved IPs, fails closed on DNS failure; runs via asyncio.to_thread() (non-blocking)
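As a rough illustration of the SSRF guard described above — helper name, exact blocked ranges, and return style are assumptions, and the `asyncio.to_thread()` wrapping is omitted — the check might look like:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    """Resolve every A/AAAA record and reject any non-public address."""
    host = urlparse(url).hostname
    if not host:
        return False
    try:
        # getaddrinfo returns all A/AAAA records for the hostname
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False  # fail closed on DNS failure
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_loopback or ip.is_private or ip.is_link_local or ip.is_reserved:
            return False
    return True
```

The key property is that the decision is made on resolved addresses, not on the hostname string, so DNS-rebinding-style bypasses via "friendly-looking" hostnames are closed off.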

Fixed

  • auto_detect.py: YouTube URL detection was routing to shell skill with raw docker command (security bypass) — now correctly routes to deep_scraper(url=...) with full capability enforcement
  • skills/builtin/__init__.py: delete_reminder and meta_orchestrate added to _CAPABILITY_MAP (were relying on default fallback — now explicitly declared)
  • response_validator.py: deep_scraper added to _PRICE_GROUNDING_SKILLS (consistent with browser_deep_scrape)

Cleanup

  • /data/skills/deep-scraper/ custom skill directory removed — eliminates phantom custom skill entry in the Skills dashboard page

v2.1 — March 23, 2026

Focus: production audit, browser CPU fix, security hardening, multilingual support

New

  • Browser Session Idle Reaper: daemon thread closes Chromium sessions idle >300s — fixes chronic CPU exhaustion caused by stale sessions (80% → 0.25%)
  • Browser URL blocklist: blocks file://, javascript:, data:, vbscript: schemes and RFC-1918/loopback/cloud metadata IMDS addresses
  • Multilingual Auto-Detect: lang_detect.py — browser/screenshot/navigation patterns in EN, ES, PT, FR, DE, ZH, JA, KO, AR, RU; localized fallback responses in 10 languages
  • Domain Drift Protection: validator catches browser→crypto/email substitution attempts; should_retry=False on confirmed substitution; Capability Engine skips when auto-detect already handled the request
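For illustration, the scheme half of the blocklist above could be sketched as follows (constant name and return style are assumptions, not the shipped code):

```python
from urllib.parse import urlparse

# Dangerous URL schemes rejected before the browser ever navigates
BLOCKED_SCHEMES = {"file", "javascript", "data", "vbscript"}

def scheme_allowed(url: str) -> bool:
    """Reject URLs whose scheme is on the browser blocklist."""
    return urlparse(url).scheme.lower() not in BLOCKED_SCHEMES
```

The IP half (RFC-1918, loopback, cloud metadata) would follow the same resolve-then-check pattern as the deep_scraper SSRF guard.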

Fixed (6 bugs from production audit)

  • autonomous.py: autonomy_mode was set to "auto" (invalid enum) — goal creation was completely broken; fixed to autonomy_mode=None
  • handlers.py: recovery round used wrong generate() signature — recovery never executed; fixed to ModelRequest(...)
  • handlers.py: _can_recover was overriding validator's should_retry=False via or reason=="drift" — now respects validator decision
  • handlers.py: screenshot path collection used search (first match only); fixed to finditer (all paths); browser_screenshot_full_page added to filter
  • handlers.py: Capability Engine was running even when auto-detect already handled the request — potential double-execution; now gated by not auto_calls
  • behavioral_learner.py: Telegram notifications were published to "agent:outgoing" (dead stream) — silently lost; fixed to "events:outgoing"

Security

  • self_improve.py: _list_files() path containment now uses realpath() (matches existing check in _read_file()) — prevents symlink traversal
  • redaction.py: AIza pattern broadened to {25,} (was {35}); AKIA pattern to {12,} (was {16})
  • capability_engine.py: blocks raw skill output in email body ("Screenshot saved to", /data/screenshots/ paths, "⚠️ Verify the title")
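A minimal sketch of the broadened redaction patterns — the exact character classes are assumptions based on common key formats, only the quantifiers come from the entry above:

```python
import re

# Google API keys: "AIza" prefix, now 25+ trailing chars (was a fixed 35)
AIZA_RE = re.compile(r"AIza[0-9A-Za-z\-_]{25,}")
# AWS access key IDs: "AKIA" prefix, now 12+ trailing chars (was a fixed 16)
AKIA_RE = re.compile(r"AKIA[0-9A-Z]{12,}")

def redact(text: str) -> str:
    """Replace anything matching a key pattern with a placeholder."""
    text = AIZA_RE.sub("[REDACTED]", text)
    return AKIA_RE.sub("[REDACTED]", text)
```

Moving from a fixed length to an open-ended minimum means truncated or slightly longer key variants still get caught.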

v2.0

Focus: Active Flow Context Lock, Planning Mode override, Response Contract, Intent Completeness

New

  • Active Flow Context Lock: per-chat Redis state (TTL 15 min) survives LLM failures; follow-up messages anchored to the same domain; [ACTIVE FLOW — CONTEXT LOCK] block injected into system prompt — eliminates cross-domain hallucination (e.g., crypto question answered with weather data)
  • Planning Mode Hard Override: 5-layer execution block (auto-detect → Decision Layer → Capability Engine → LLM loop → Validation safeguard); when the user says "no ejecutes / solo analiza / antes de ejecutar" ("don't execute / just analyze / before executing"), zero skills run regardless of LLM output
  • Universal Response Contract: _detect_response_type() classifies each request (comparison / multipart / list / explanation / action / chat); type-specific structure rule injected into every system prompt via _build_cognitive_control_block()
  • Intent Completeness Engine: intent_engine.py — deterministic multi-part intent extraction (4 strategies: colon list, numbered list, "y también" ("and also") chain, multi-question); one completeness-retry per turn with exact missing-section correction prompt
  • flow_state.py (new): save_active_flow(), load_active_flow(), clear_active_flow(), is_explicit_domain_switch(), is_crypto_recovery_followup(), detect_flow_assets()
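A storage-agnostic sketch of the flow lock's save/load pair — the key name and payload shape are assumptions, and the real flow_state.py API may differ; `r` is any client exposing Redis-style `setex`/`get`:

```python
import json
import time

FLOW_TTL = 900  # 15-minute TTL, as described above

def save_active_flow(r, chat_id: str, domain: str, assets: list) -> None:
    """Persist the active flow for a chat; TTL lets it survive LLM failures."""
    payload = json.dumps({"domain": domain, "assets": assets, "ts": time.time()})
    r.setex(f"flow:active:{chat_id}", FLOW_TTL, payload)

def load_active_flow(r, chat_id: str):
    """Return the active flow dict, or None once the TTL has lapsed."""
    raw = r.get(f"flow:active:{chat_id}")
    return json.loads(raw) if raw else None
```

Because the state lives in Redis keyed by chat, a crashed or hallucinating LLM round cannot drop the domain anchor: the next turn reloads it and re-injects the context-lock block.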

Improved

  • ResponseValidator.validate() now accepts planning_mode=True — new _check_planning_mode_violation() fires first when active
  • ResponseValidator._check_completeness_multipart() — blocks structurally incomplete multi-part answers (≥2 ?, enumeration starts, "también/además")
  • render_report.py (crypto): premium terminal-grade format with aligned columns, volume in B/M notation, price arrows inline, separate email/Telegram renderers

Tests

  • 34 new tests: tests/test_flow_state.py (all passing)

v1.9

Focus: Response Validation, voice input, audio pipeline, production fixes

New

  • Response Validation & Recovery Engine: deterministic post-LLM validator — grounding_fail / incomplete / drift / screenshot_incomplete checks; 2-retry auto-recovery; no LLM calls in validation path
  • RecoveryMemory: Redis FIFO store (50 entries, 7-day TTL) — only validated successful recoveries stored; no noise from failed attempts
  • Voice/Audio Input: Telegram voice messages fully operational — handle_voice() in bridge downloads to /data/shared/uploads/voice_{uuid}.ogg; transcribe_audio_sync() calls OpenAI Whisper API via asyncio.to_thread() with 12s hard timeout; transcription fed into full LLM+skill pipeline
  • extract_fields.py skill: extract named fields from previous skill output by path (e.g., field_name:var_name)
  • Telegram typing indicator with 95s response timeout guard — _pending dict + _response_timeout_guard() — prevents stuck typing indicator on long responses
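The RecoveryMemory FIFO above maps naturally onto three Redis commands; this sketch assumes the key name and takes any client with Redis-style `lpush`/`ltrim`/`expire`:

```python
import json

MAX_ENTRIES = 50            # FIFO capacity from the entry above
TTL_SECONDS = 7 * 24 * 3600  # 7-day TTL

def store_recovery(r, entry: dict) -> None:
    """Push a validated recovery, trim to the newest 50, refresh the TTL."""
    key = "recovery:memory"
    r.lpush(key, json.dumps(entry))
    r.ltrim(key, 0, MAX_ENTRIES - 1)  # keep only the most recent entries
    r.expire(key, TTL_SECONDS)
```

Only validated successes pass through this function, which is what keeps failed recovery attempts from polluting the store.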

Fixed

  • Critical metadata decode bug: bus.py auto-decodes all Redis stream fields as JSON — handlers now accept both dict and str for metadata
  • _skill_round_count UnboundLocalError (was used before assignment in some code paths)
  • check_screenshot_completeness() now trace-based only (execution skills set) — no brittle response string matching

Scale

  • Scheduler: 25 → 27 background jobs
  • Skills: 26 → 27 built-in skills (extract_fields.py added)
  • DB: 18 → 21 tables (AgentRecord, AgentMessage, BehavioralRule added)

v1.8

Focus: Capability Engine production hardening, quality scoring, degradation detection

New

  • Capability Engine v2: strict template validation — _HARD_ARGS abort, _OPTIONAL_ARGS → empty string, all others required
  • Weighted scoring: (kw_hits×2) + (success_rate×5) + (avg_completeness×3) + recency_bonus - latency_penalty
  • Pre-execution static validation: _pre_validate() checks all template vars before any step runs
  • Output completeness guarantee: blocks incomplete renders; validates email body ≥50 chars
  • Improvement loop: completeness_history (last 10 runs), EMA latency tracking, auto-degradation detection
  • AgentManagerSkill: LLM can create/list/pause/resume/archive agents via natural language (agent_manager skill); late-wired to AgentOrchestrator in main.py
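The weighted score above is a straight linear combination; a minimal sketch (parameter names are assumptions, the weights are the ones listed):

```python
def capability_score(kw_hits: int, success_rate: float,
                     avg_completeness: float,
                     recency_bonus: float = 0.0,
                     latency_penalty: float = 0.0) -> float:
    """(kw_hits*2) + (success_rate*5) + (avg_completeness*3) + recency - latency."""
    return (kw_hits * 2 + success_rate * 5
            + avg_completeness * 3 + recency_bonus - latency_penalty)
```

Success rate carries the heaviest weight, so a frequently matching but unreliable capability is outscored by a reliable one with fewer keyword hits.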

Improved

  • Goal Priority Axis: Goal.priority (1-10) + Goal.source fields; user goals=8, agent goals=6, autonomous=3; tick() sorts by priority descending
  • Self-Integrity Monitor: SelfIntegrityMonitorJob every 6h — cross-checks self-model strengths vs actual skill success rates, detects epistemic drift, checks audit_log error spikes

v1.7

Focus: Memory & Resource Governance, Opportunity Engine, Self-Reflection Engine

New

  • Opportunity Engine (scheduler/opportunity_engine.py): scans episodic memory for automation patterns (crypto, news, website, reports, API); max 2 suggestions/day, 48h dedup
  • Self-Reflection Engine: LLM post-mortem insights after goal completion/failure; max 3/goal; Redis TTL 7d; injected into future context
  • Resource Governor: Redis-backed rate limiting — goal slots (10), LLM/min (30), API/min (60), tasks/hour (50)
  • Goal-scoped Memory (goal_memory table): episodic memory scoped to a specific goal's execution context — prevents cross-goal pollution
  • Memory Ranking System: score = 0.5×similarity + 0.3×recency + 0.2×importance; applied before context injection
  • build_context() accepts goal_id parameter; injects goal-scoped memory + reflection insights when provided
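The ranking formula above can be sketched directly; inputs are assumed normalized to [0, 1] and the dict field names are illustrative:

```python
def rank_score(similarity: float, recency: float, importance: float) -> float:
    """0.5*similarity + 0.3*recency + 0.2*importance, per the entry above."""
    return 0.5 * similarity + 0.3 * recency + 0.2 * importance

def rank_memories(memories):
    """Sort memory dicts best-first before context injection."""
    return sorted(
        memories,
        key=lambda m: rank_score(m["similarity"], m["recency"], m["importance"]),
        reverse=True,
    )
```

Similarity dominating the weights means a highly relevant but older memory still beats a fresh but off-topic one.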

Scale

  • Scheduler: 23 → 25 background jobs (opportunity_engine added; vector_index already counted)
  • Memory: 9 → 11 primary layers (goal_memory + self-model/epistemic)
  • DB: new goal_memory table (auto-created via SQLAlchemy create_all)

Observability

  • New structured logs: memory_retrieved, memory_ranked, goal_memory_added, goal_memory_used, opportunity_detected, opportunity_suggested, reflection_triggered, reflection_saved

v1.6

Focus: Decision Layer, Production Hardening v2, Goal Engine improvements

New

  • Decision Layer (src/decision_layer.py): pure heuristic pre-LLM classifier with 5 strategies — DIRECT_RESPONSE / GOAL / SCHEDULED_TASK / SUB_AGENT / SCRIPT; SCHEDULED_TASK and SUB_AGENT call skills directly without LLM (zero hallucination risk); GOAL routes directly to GoalOrchestrator
  • Behavioral Learning Loop: _detect_correction() in handlers.py; correction queued to Redis behavioral:pending; BehavioralLearnerJob (every 120s) → LLM analysis → rule saved to behavioral_rules DB table → injected in every system prompt; rule types: refusal / hallucination / wrong_skill / missing_context
  • Cognitive Pressure Index (CPI): 0–100 composite metric (active goals 20%, error rate 25%, latency 20%, memory growth 15%, CPU 20%); background jobs skip when >80
  • Self-Integrity Monitor: SelfIntegrityMonitorJob every 6h; cross-validates self-model against actual performance; JSON report at agent:integrity_report
  • Circuit Breaker Redis persistence: circuit breaker state saved to cb:state:{integration_id} on every transition; TTL = max(86400, recovery_timeout×10)
  • Sovereign Mode: SOVEREIGN_MODE=true (default); raises MAX_SKILL_ROUNDS to 12; injects ⚡ SOVEREIGN MODE ACTIVE block; doubles cognitive budgets
  • delete_reminder skill: deletes by keyword match or keyword="all"
  • Self-repair: patch(file, old_text, new_text) surgical edits + install(package) runtime pip installs; all patches auto-persisted to /data/src_patches/ and re-applied on rebuild
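The CPI described above is a weighted sum of five load signals; this sketch assumes each signal is pre-normalized to [0, 1] (only the weights come from the entry):

```python
# Weights per the changelog entry: goals 20%, errors 25%, latency 20%,
# memory growth 15%, CPU 20%
CPI_WEIGHTS = {
    "active_goals": 0.20,
    "error_rate": 0.25,
    "latency": 0.20,
    "memory_growth": 0.15,
    "cpu": 0.20,
}

def cognitive_pressure(signals: dict) -> float:
    """Return the 0-100 composite index; background jobs skip when it exceeds 80."""
    return 100 * sum(CPI_WEIGHTS[k] * signals.get(k, 0.0) for k in CPI_WEIGHTS)
```

Missing signals default to 0, so a partially instrumented deployment under-reports pressure rather than blocking jobs spuriously.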

Goal Engine

  • Plan Lock: goal.plan_locked = True after first task succeeds — blocks spurious replanning while the plan is working
  • 8-step cap: plans exceeding 8 steps automatically truncated in topological order
  • Duplicate Goal Detection: Jaccard word overlap ≥60% → return existing goal instead of creating duplicate
  • Structured observability events: plan_created / plan_locked / plan_replan / plan_completed
  • Replan storm: threshold 3 replans / 5 min (was 6 / 10 min); now marks goal FAILED with partial outputs collected (was PAUSED silently)
  • Planner step preference: deterministic tool first → existing skill → LLM as last resort
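The duplicate-goal check is word-level Jaccard similarity against the 60% threshold; a minimal sketch (whitespace tokenization is an assumption):

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two goal descriptions."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def is_duplicate_goal(new_goal: str, existing: str, threshold: float = 0.60) -> bool:
    """Per the entry above: overlap >= 60% means reuse the existing goal."""
    return jaccard(new_goal, existing) >= threshold
```

Set-based overlap makes the check order-insensitive, so rephrasings of the same request ("check BTC price daily" vs "daily check BTC price") still dedupe.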

Fixed

  • Duplicate task execution (removed immediate execution override — tasks now scheduled at now + interval)
  • Month-boundary date parsing bug (parsed.replace(day=parsed.day+1) → parsed + timedelta(days=1))
  • PAUSED goals blocking agent forever — runtime.tick() auto-resumes after backoff, fails after 10min paused
  • _clean_telegram_output(): strips markdown, prompt leakage ([TAREA PROGRAMADA:], EJECUTA AHORA), execution summaries from all outgoing messages
  • Auto-detect "nuevo agente" false positive — _AGENT_CREATE_VETO_PATTERNS blocks complaint text from triggering agent creation
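The month-boundary fix in the list above comes down to one line; a minimal reproduction:

```python
from datetime import datetime, timedelta

parsed = datetime(2026, 3, 31)

# The old code, parsed.replace(day=parsed.day + 1), raises
# "ValueError: day is out of range for month" here, since March has no day 32.
# timedelta rolls over the month boundary correctly:
tomorrow = parsed + timedelta(days=1)
assert tomorrow == datetime(2026, 4, 1)
```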

Scale

  • Scheduler: 22 → 23 background jobs (world_model added)

v1.5

Focus: Next-Gen Cognitive Systems, Vector Memory, Security Hardening

New Cognitive Systems (6)

  • Vector Semantic Memory: PostgreSQL memory_embeddings table; Ollama nomic-embed-text embeddings or deterministic SHA-512 fallback; cosine similarity search (top-K); injected as [MEMORIA SEMÁNTICA RELEVANTE]; feature-flagged VECTOR_MEMORY_ENABLED
  • Plan Critic: LLM validates TaskGraph before execution; enabled via PLAN_CRITIC_ENABLED
  • Meta-Agent Supervisor: meta_orchestrate skill decomposes goal into coordinated agent team; META_AGENT_ENABLED
  • World Model: EntityState table tracks real-world entity states (BTC price, trend, change %); WorldModelJob every 15min; entity cards on dashboard
  • Skill Evolution Engine: skill_patterns table; detects recurring multi-skill sequences (min 5 occurrences); LLM synthesizes composite Python skill; AST validation before write; SKILL_EVOLUTION_ENABLED
  • Temporal Reasoner: trend summaries injected as [TEMPORAL INSIGHTS]; TEMPORAL_REASONING_ENABLED
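The deterministic fallback path of the Vector Semantic Memory above might look like this sketch; the embedding dimensionality, byte-to-float mapping, and function names are assumptions (only "SHA-512 fallback + cosine top-K" comes from the entry):

```python
import hashlib
import math

def fallback_embedding(text: str, dim: int = 64) -> list:
    """Derive a stable pseudo-embedding from SHA-512 when Ollama is unavailable."""
    digest = hashlib.sha512(text.encode("utf-8")).digest()
    # Cycle through the 64 digest bytes, mapping each into [-1, 1]
    return [(digest[i % len(digest)] / 127.5) - 1.0 for i in range(dim)]

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: str, corpus: list, k: int = 3):
    """Rank corpus texts by cosine similarity to the query embedding."""
    q = fallback_embedding(query)
    scored = [(cosine(q, fallback_embedding(t)), t) for t in corpus]
    return [t for _, t in sorted(scored, reverse=True)[:k]]
```

A hash-derived embedding carries no semantics (only exact-text matches score near 1.0), but it keeps the retrieval pipeline exercising the same code path when the real embedding model is down.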

Security Hardening (7 fixes)

  • self_improve: all operations (read, write, patch) now use realpath() — closes symlink-based path traversal
  • LLM-generated skill code: AST validation before write (blocks subprocess, os.system, eval, exec, etc.)
  • CSRF: token now session-bound (rejects unauthenticated "anon" sessions before Redis lookup)
  • /data/memory removed from /chat/media search dirs — prevents internal snapshots from being served publicly
  • skills.py: slug re-validated on toggle/edit/delete via _safe_skill_dir() — prevents directory traversal
  • http_request: _is_ssrf_target() blocks RFC-1918 + cloud metadata endpoints
  • Error responses: str(e) replaced with first line only, capped at 120 chars — prevents internal info leakage

Dashboard

  • 3 new pages: Vector Memory (/vector-memory), World Model (/world-model), Skill Evolution (/skill-evolution)

Scale

  • Scheduler: 20 → 22 background jobs
  • Memory: 8 → 9 persistent layers
  • DB: 14 → 18 tables (memory_embeddings, skill_patterns, entity_states, state_predictions)
  • 12 new configuration feature-flag variables

Phases 1–18 (Core Development)

The initial 18 development phases built the foundational architecture of WASP:

  • Phases 1–3: Event-driven architecture (Redis Streams), core agent loop, episodic memory (PostgreSQL)
  • Phases 4–6: Skill system (SkillBase, SkillExecutor, PolicyEngine), custom skills, task scheduler
  • Phase 7: Health monitor, SelfHealer, Introspector
  • Phase 8: Dashboard (Quart), session auth, CSRF protection, audit logging
  • Phase 9: Agent autonomy — shell, Python execution, browser (Selenium + Chromium), named sessions
  • Phase 10: Knowledge Graph (PostgreSQL + Redis cache, rule-based NLP extraction)
  • Phase 11: Self-Model (Redis agent:self_model), Epistemic State, domain confidence
  • Phase 12: Procedural Memory (abstract_procedure(), keyword retrieval, few-shot injection)
  • Phase 13: Temporal World Model (world_timeline table, price/state extraction, trend detection)
  • Phase 14: Anticipatory Simulation (pre-execution consequence analysis for privileged skills)
  • Phase 15: Multi-agent orchestration v1 (AgentOrchestrator, AgentRuntime, CapabilitySandbox, inter-agent PostgreSQL bus)
  • Phase 16: Dream Mode (DreamJob: memory consolidation, KG enrichment, LLM reflection, world pre-fetch)
  • Phase 17: Autonomous Goal Generator (proactive LLM-evaluated goal creation, rate limiting, CPI guard)
  • Phase 18: QA/SRE audit — 208 tests (unit/integration/e2e/chaos/security), 9 connector ID fixes, Makefile

Statistics at v2.2

  • Built-in skills: 37
  • Background scheduler jobs: 27
  • Memory layers: 18 (11 primary + 7 auxiliary)
  • PostgreSQL tables: 20
  • Integration connectors: 40+
  • LLM providers: 11
  • Max LLM rounds (Sovereign): 12
  • Max concurrent goals: 3
  • Max concurrent agents: 10
  • Test suite: 208 tests