Changelog
All notable changes to WASP are documented here. Versions follow a semantic versioning scheme after the initial phase-based development (Phases 1–18).
v2.2 — March 24, 2026
Focus: deep_scraper hardening, security fixes, capability map completeness
New
- `deep_scraper` promoted from custom OpenClaw skill → permanent built-in skill (`src/skills/builtin/deep_scraper.py`)
- `deep_scraper` SSRF protection: `_is_safe_url()` resolves all A/AAAA records via `getaddrinfo()`, blocks loopback/private/link-local/reserved IPs, and fails closed on DNS failure; runs via `asyncio.to_thread()` (non-blocking)
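
The SSRF check described above can be sketched roughly as follows. This is an illustration only, not the code in `deep_scraper.py`: the names `is_safe_url_sketch` and `is_safe_url_async`, the default ports, and the exact set of blocked address classes are assumptions.

```python
import asyncio
import ipaddress
import socket
from urllib.parse import urlsplit


def is_safe_url_sketch(url: str) -> bool:
    """Hypothetical helper: True only if every resolved address is public."""
    try:
        parts = urlsplit(url)
        port = parts.port or (443 if parts.scheme == "https" else 80)
    except ValueError:
        return False  # malformed URL or port
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        # Returns one entry per A/AAAA record for the host.
        infos = socket.getaddrinfo(parts.hostname, port, proto=socket.IPPROTO_TCP)
    except (socket.gaierror, UnicodeError):
        return False  # fail closed: no resolution means no request
    if not infos:
        return False
    for *_ignored, sockaddr in infos:
        # Drop any IPv6 zone suffix ("fe80::1%eth0") before parsing.
        ip = ipaddress.ip_address(sockaddr[0].split("%", 1)[0])
        if (ip.is_loopback or ip.is_private or ip.is_link_local
                or ip.is_reserved or ip.is_multicast or ip.is_unspecified):
            return False
    return True


async def is_safe_url_async(url: str) -> bool:
    # getaddrinfo() blocks, so run the check in a worker thread.
    return await asyncio.to_thread(is_safe_url_sketch, url)
```

A single unsafe record rejects the whole URL, so a host that resolves to both a public and a private address is blocked. A resolve-then-connect check like this does not by itself prevent DNS rebinding; that requires connecting to the address that was checked.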
Fixed
- `auto_detect.py`: YouTube URL detection was routing to the `shell` skill with a raw docker command (security bypass); it now routes to `deep_scraper(url=...)` with full capability enforcement
- `skills/builtin/__init__.py`: `delete_reminder` and `meta_orchestrate` added to `_CAPABILITY_MAP` (previously relied on the default fallback; now explicitly declared)
- `response_validator.py`: `deep_scraper` added to `_PRICE_GROUNDING_SKILLS` (consistent with `browser_deep_scrape`)
Cleanup
- `/data/skills/deep-scraper/` custom skill directory removed; eliminates the phantom custom skill entry on the Skills dashboard page
v2.1 — March 23, 2026
Focus: production audit, browser CPU fix, security hardening, multilingual support
New
- Browser Session Idle Reaper: daemon thread closes Chromium sessions idle >300s; fixes chronic CPU exhaustion from stale sessions (CPU usage 80% → 0.25%)
- Browser URL blocklist: blocks `file://`, `javascript:`, `data:`, `vbscript:` schemes and RFC-1918/loopback/cloud metadata (IMDS) addresses
- Multilingual Auto-Detect: `lang_detect.py` adds browser/screenshot/navigation patterns in EN, ES, PT, FR, DE, ZH, JA, KO, AR, RU; localized fallback responses in 10 languages
- Domain Drift Protection: validator catches browser→crypto/email substitution attempts; `should_retry=False` on confirmed substitution; Capability Engine skips when auto-detect already handled the request
Fixed (6 bugs from production audit)
- `autonomous.py`: `autonomy_mode` was set to `"auto"` (invalid enum), so goal creation was completely broken; fixed to `autonomy_mode=None`
- `handlers.py`: recovery round used the wrong `generate()` signature, so recovery never executed; fixed to `ModelRequest(...)`
- `handlers.py`: `_can_recover` was overriding the validator's `should_retry=False` via `or reason=="drift"`; now respects the validator decision
- `handlers.py`: screenshot path collection used `search` (first match only); fixed to `finditer` (all paths); `browser_screenshot_full_page` added to filter
- `handlers.py`: Capability Engine was running even when auto-detect already handled the request (potential double-execution); now gated by `not auto_calls`
- `behavioral_learner.py`: Telegram notifications were published to `"agent:outgoing"` (dead stream) and silently lost; fixed to `"events:outgoing"`
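
The `search` versus `finditer` difference behind the screenshot fix can be shown in a few lines. The regular expression and sample output below are hypothetical stand-ins, not the pattern used in `handlers.py`.

```python
import re

# Hypothetical pattern for screenshot paths; the real one may differ.
SCREENSHOT_PATH = re.compile(r"/data/screenshots/[\w.\-]+\.png")

skill_output = (
    "Screenshot saved to /data/screenshots/home.png\n"
    "Screenshot saved to /data/screenshots/pricing.png"
)

# search() stops at the first match, so the second path is lost.
first_only = SCREENSHOT_PATH.search(skill_output).group(0)

# finditer() yields every non-overlapping match in order.
all_paths = [m.group(0) for m in SCREENSHOT_PATH.finditer(skill_output)]
```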
Security
- `self_improve.py`: `_list_files()` path containment now uses `realpath()` (matches the existing check in `_read_file()`); prevents symlink traversal
- `redaction.py`: AIza pattern broadened to `{25,}` (was `{35}`); AKIA pattern to `{12,}` (was `{16}`)
- `capability_engine.py`: blocks raw skill output in email body (`Screenshot saved to`, `/data/screenshots/` paths, `⚠️ Verify the title`)
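
A minimal sketch of `realpath()`-based containment, assuming a check of roughly this shape; `is_contained` is a hypothetical helper, not the function in `self_improve.py`.

```python
import os


def is_contained(base_dir: str, candidate: str) -> bool:
    """Hypothetical check: does candidate stay inside base_dir once resolved?"""
    base = os.path.realpath(base_dir)
    # realpath() follows symlinks and collapses "..", so a link inside
    # base_dir that points elsewhere resolves to its true location.
    target = os.path.realpath(os.path.join(base, candidate))
    return os.path.commonpath([base, target]) == base
```

Comparing resolved paths with `commonpath()` rather than a string prefix also avoids treating `/data/src_evil` as being inside `/data/src`.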
v2.0
Focus: Active Flow Context Lock, Planning Mode override, Response Contract, Intent Completeness
New
- Active Flow Context Lock: per-chat Redis state (TTL 15 min) survives LLM failures; follow-up messages anchored to the same domain; `[ACTIVE FLOW — CONTEXT LOCK]` block injected into system prompt; eliminates cross-domain hallucination (e.g., crypto question answered with weather data)
- Planning Mode Hard Override: 5-layer execution block (auto-detect → Decision Layer → Capability Engine → LLM loop → Validation safeguard); when the user says "no ejecutes / solo analiza / antes de ejecutar" (Spanish for "do not execute / only analyze / before executing"), zero skills run regardless of LLM output
- Universal Response Contract: `_detect_response_type()` classifies each request (comparison / multipart / list / explanation / action / chat); type-specific structure rule injected into every system prompt via `_build_cognitive_control_block()`
- Intent Completeness Engine: `intent_engine.py` performs deterministic multi-part intent extraction (4 strategies: colon list, numbered list, "y también" chain, multi-question); one completeness-retry per turn with exact missing-section correction prompt
- `flow_state.py` (new): `save_active_flow()`, `load_active_flow()`, `clear_active_flow()`, `is_explicit_domain_switch()`, `is_crypto_recovery_followup()`, `detect_flow_assets()`
Improved
- `ResponseValidator.validate()` now accepts `planning_mode=True`; new `_check_planning_mode_violation()` fires first when active
- `ResponseValidator._check_completeness_multipart()`: blocks structurally incomplete multi-part answers (≥2 `?`, enumeration starts, "también/además")
- `render_report.py` (crypto): premium terminal-grade format with aligned columns, volume in B/M notation, price arrows inline, separate email/Telegram renderers
Tests
- 34 new tests: `tests/test_flow_state.py` (all passing)
v1.9
Focus: Response Validation, voice input, audio pipeline, production fixes
New
- Response Validation & Recovery Engine: deterministic post-LLM validator; `grounding_fail` / `incomplete` / `drift` / `screenshot_incomplete` checks; 2-retry auto-recovery; no LLM calls in validation path
- RecoveryMemory: Redis FIFO store (50 entries, 7-day TTL); only validated successful recoveries stored; no noise from failed attempts
- Voice/Audio Input: Telegram voice messages fully operational; `handle_voice()` in bridge downloads to `/data/shared/uploads/voice_{uuid}.ogg`; `transcribe_audio_sync()` calls OpenAI Whisper API via `asyncio.to_thread()` with 12s hard timeout; transcription fed into full LLM+skill pipeline
- `extract_fields.py` skill: extract named fields from previous skill output by path (e.g., `field_name:var_name`)
- Telegram typing indicator with 95s response timeout guard: `_pending` dict + `_response_timeout_guard()`; prevents stuck typing indicator on long responses
Fixed
- Critical metadata decode bug: `bus.py` auto-decodes all Redis stream fields as JSON; handlers now accept both `dict` and `str` for metadata
- `_skill_round_count` UnboundLocalError (was used before assignment in some code paths)
- `check_screenshot_completeness()` now trace-based only (execution skills set); no brittle response string matching
Scale
- Scheduler: 25 → 27 background jobs
- Skills: 26 → 27 built-in skills (`extract_fields.py` added)
- DB: 18 → 21 tables (`AgentRecord`, `AgentMessage`, `BehavioralRule` added)
v1.8
Focus: Capability Engine production hardening, quality scoring, degradation detection
New
- Capability Engine v2: strict template validation; `_HARD_ARGS` abort, `_OPTIONAL_ARGS` → empty string, all others required
- Weighted scoring: `(kw_hits×2) + (success_rate×5) + (avg_completeness×3) + recency_bonus - latency_penalty`
- Pre-execution static validation: `_pre_validate()` checks all template vars before any step runs
- Output completeness guarantee: blocks incomplete renders; validates email body ≥50 chars
- Improvement loop: `completeness_history` (last 10 runs), EMA latency tracking, auto-degradation detection
- AgentManagerSkill: LLM can create/list/pause/resume/archive agents via natural language (`agent_manager` skill); late-wired to `AgentOrchestrator` in `main.py`
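
The weighted scoring formula above, written out as a function. The weights come from the changelog entry; the function name, the assumption that `success_rate` and `avg_completeness` lie in [0, 1], and the sample inputs are illustrative only.

```python
def capability_score(
    kw_hits: int,
    success_rate: float,       # assumed to be in [0, 1]
    avg_completeness: float,   # assumed to be in [0, 1]
    recency_bonus: float = 0.0,
    latency_penalty: float = 0.0,
) -> float:
    """Hypothetical helper applying the v1.8 weighted scoring formula."""
    return (
        kw_hits * 2
        + success_rate * 5
        + avg_completeness * 3
        + recency_bonus
        - latency_penalty
    )


# Worked example: 3*2 + 0.9*5 + 0.8*3 + 0.5 - 0.2 = 13.2
example = capability_score(3, 0.9, 0.8, recency_bonus=0.5, latency_penalty=0.2)
```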
Improved
- Goal Priority Axis: `Goal.priority` (1-10) + `Goal.source` fields; user goals=8, agent goals=6, autonomous=3; `tick()` sorts by priority descending
- Self-Integrity Monitor: `SelfIntegrityMonitorJob` every 6h; cross-checks self-model strengths vs actual skill success rates, detects epistemic drift, checks audit_log error spikes
v1.7
Focus: Memory & Resource Governance, Opportunity Engine, Self-Reflection Engine
New
- Opportunity Engine (`scheduler/opportunity_engine.py`): scans episodic memory for automation patterns (crypto, news, website, reports, API); max 2 suggestions/day, 48h dedup
- Self-Reflection Engine: LLM post-mortem insights after goal completion/failure; max 3/goal; Redis TTL 7d; injected into future context
- Resource Governor: Redis-backed rate limiting with goal slots (10), LLM/min (30), API/min (60), tasks/hour (50)
- Goal-scoped Memory (`goal_memory` table): episodic memory scoped to a specific goal's execution context; prevents cross-goal pollution
- Memory Ranking System: score = 0.5×similarity + 0.3×recency + 0.2×importance; applied before context injection
- `build_context()` accepts `goal_id` parameter; injects goal-scoped memory + reflection insights when provided
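
The memory ranking score above can be illustrated with a small sketch. The `Memory` dataclass, the `rank_memories` helper, and the assumption that each component is already normalized to [0, 1] are hypothetical; only the three weights come from the entry.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    similarity: float  # assumed normalized to [0, 1]
    recency: float     # assumed normalized to [0, 1]
    importance: float  # assumed normalized to [0, 1]


def rank_score(m: Memory) -> float:
    return 0.5 * m.similarity + 0.3 * m.recency + 0.2 * m.importance


def rank_memories(memories: list[Memory], top_k: int = 3) -> list[Memory]:
    """Highest score first; only the top_k entries would be injected."""
    return sorted(memories, key=rank_score, reverse=True)[:top_k]


candidates = [
    Memory("similar but old", similarity=0.9, recency=0.1, importance=0.2),
    Memory("recent and important", similarity=0.5, recency=0.9, importance=0.9),
    Memory("weak on all axes", similarity=0.2, recency=0.2, importance=0.2),
]
ranked = rank_memories(candidates)
```

With these sample values the recent, important memory (score 0.70) outranks the more similar but stale one (score 0.52).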
Scale
- Scheduler: 23 → 25 background jobs (`opportunity_engine` added; `vector_index` already counted)
- Memory: 9 → 11 primary layers (goal_memory + self-model/epistemic)
- DB: new `goal_memory` table (auto-created via SQLAlchemy `create_all`)
Observability
- New structured logs: `memory_retrieved`, `memory_ranked`, `goal_memory_added`, `goal_memory_used`, `opportunity_detected`, `opportunity_suggested`, `reflection_triggered`, `reflection_saved`
v1.6
Focus: Decision Layer, Production Hardening v2, Goal Engine improvements
New
- Decision Layer (`src/decision_layer.py`): pure heuristic pre-LLM classifier with 5 strategies: `DIRECT_RESPONSE` / `GOAL` / `SCHEDULED_TASK` / `SUB_AGENT` / `SCRIPT`; `SCHEDULED_TASK` and `SUB_AGENT` call skills directly without LLM (zero hallucination risk); `GOAL` routes directly to GoalOrchestrator
- Behavioral Learning Loop: `_detect_correction()` in `handlers.py`; correction queued to Redis `behavioral:pending`; `BehavioralLearnerJob` (every 120s) → LLM analysis → rule saved to `behavioral_rules` DB table → injected in every system prompt; rule types: refusal / hallucination / wrong_skill / missing_context
- Cognitive Pressure Index (CPI): 0–100 composite metric (active goals 20%, error rate 25%, latency 20%, memory growth 15%, CPU 20%); background jobs skip when >80
- Self-Integrity Monitor: `SelfIntegrityMonitorJob` every 6h; cross-validates self-model against actual performance; JSON report at `agent:integrity_report`
- Circuit Breaker Redis persistence: circuit breaker state saved to `cb:state:{integration_id}` on every transition; TTL = max(86400, recovery_timeout×10)
- Sovereign Mode: `SOVEREIGN_MODE=true` (default); raises `MAX_SKILL_ROUNDS` to 12; injects `⚡ SOVEREIGN MODE ACTIVE` block; doubles cognitive budgets
- `delete_reminder` skill: deletes by keyword match or `keyword="all"`
- Self-repair: `patch(file, old_text, new_text)` surgical edits + `install(package)` runtime pip installs; all patches auto-persisted to `/data/src_patches/` and re-applied on rebuild
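
Two of the formulas in this list, the CPI composite and the circuit breaker TTL, are simple enough to sketch. The weights, the >80 threshold, and the TTL expression are taken from the entries above; the function names, the dictionary keys, and the assumption that each CPI component is already scaled to 0–100 are hypothetical.

```python
# Weights from the changelog entry; they sum to 1.0.
CPI_WEIGHTS = {
    "active_goals": 0.20,
    "error_rate": 0.25,
    "latency": 0.20,
    "memory_growth": 0.15,
    "cpu": 0.20,
}


def cognitive_pressure_index(components: dict[str, float]) -> float:
    """Weighted sum of components, each assumed pre-scaled to 0-100."""
    return sum(
        weight * min(max(components[name], 0.0), 100.0)
        for name, weight in CPI_WEIGHTS.items()
    )


def should_skip_background_jobs(cpi: float, threshold: float = 80.0) -> bool:
    return cpi > threshold


def circuit_breaker_ttl(recovery_timeout: int) -> int:
    """TTL in seconds: at least one day, or ten recovery timeouts."""
    return max(86400, recovery_timeout * 10)


sample = {"active_goals": 50, "error_rate": 20, "latency": 40,
          "memory_growth": 10, "cpu": 90}
sample_cpi = cognitive_pressure_index(sample)  # 10 + 5 + 8 + 1.5 + 18
```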
Goal Engine
- Plan Lock: `goal.plan_locked = True` after first task succeeds; blocks spurious replanning while the plan is working
- 8-step cap: plans exceeding 8 steps automatically truncated in topological order
- Duplicate Goal Detection: Jaccard word overlap ≥60% → return existing goal instead of creating duplicate
- Structured observability events: `plan_created` / `plan_locked` / `plan_replan` / `plan_completed`
- Replan storm: threshold 3 replans / 5 min (was 6 / 10 min); now marks goal FAILED with partial outputs collected (was PAUSED silently)
- Planner step preference: deterministic tool first → existing skill → LLM as last resort
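
Duplicate goal detection by Jaccard word overlap could look roughly like this. The 60% threshold is from the entry above; the tokenization (lowercase, whitespace split) and both function names are assumptions.

```python
from typing import Optional


def jaccard_word_overlap(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over lowercase whitespace-separated words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def find_duplicate_goal(new_goal: str, existing: list[str],
                        threshold: float = 0.6) -> Optional[str]:
    """Return the first existing goal at or above the threshold, if any."""
    for goal in existing:
        if jaccard_word_overlap(new_goal, goal) >= threshold:
            return goal
    return None


existing_goals = ["monitor btc price every hour", "send weekly sales report"]
# 4 shared words out of 6 distinct words: overlap is about 0.67.
match = find_duplicate_goal("monitor btc price every day", existing_goals)
```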
Fixed
- Duplicate task execution (removed immediate execution override; tasks now scheduled at `now + interval`)
- Month-boundary date parsing bug (`parsed.replace(day=parsed.day+1)` → `parsed + timedelta(days=1)`)
- PAUSED goals blocking agent forever: `runtime.tick()` auto-resumes after backoff, fails after 10min paused
- `_clean_telegram_output()`: strips markdown, prompt leakage (`[TAREA PROGRAMADA:]`, `EJECUTA AHORA`), execution summaries from all outgoing messages
- Auto-detect "nuevo agente" false positive: `_AGENT_CREATE_VETO_PATTERNS` blocks complaint text from triggering agent creation
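
The month-boundary bug is easy to reproduce: incrementing the day field fails on the last day of a month, while adding a `timedelta` rolls over correctly. The sample date below is arbitrary.

```python
from datetime import datetime, timedelta

parsed = datetime(2026, 1, 31, 9, 0)

# Old approach: January has no day 32, so replace() raises ValueError.
try:
    parsed.replace(day=parsed.day + 1)
    replace_failed = False
except ValueError:
    replace_failed = True

# Fixed approach: calendar arithmetic handles month and year rollover.
next_day = parsed + timedelta(days=1)
```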
Scale
- Scheduler: 22 → 23 background jobs (`world_model` added)
v1.5
Focus: Next-Gen Cognitive Systems, Vector Memory, Security Hardening
New Cognitive Systems (6)
- Vector Semantic Memory: PostgreSQL `memory_embeddings` table; Ollama `nomic-embed-text` embeddings or deterministic SHA-512 fallback; cosine similarity search (top-K); injected as `[MEMORIA SEMÁNTICA RELEVANTE]`; feature-flagged `VECTOR_MEMORY_ENABLED`
- Plan Critic: LLM validates TaskGraph before execution; enabled via `PLAN_CRITIC_ENABLED`
- Meta-Agent Supervisor: `meta_orchestrate` skill decomposes goal into coordinated agent team; `META_AGENT_ENABLED`
- World Model: `EntityState` table tracks real-world entity states (BTC price, trend, change %); `WorldModelJob` every 15min; entity cards on dashboard
- Skill Evolution Engine: `skill_patterns` table; detects recurring multi-skill sequences (min 5 occurrences); LLM synthesizes composite Python skill; AST validation before write; `SKILL_EVOLUTION_ENABLED`
- Temporal Reasoner: trend summaries injected as `[TEMPORAL INSIGHTS]`; `TEMPORAL_REASONING_ENABLED`
Security Hardening (7 fixes)
- `self_improve`: all operations (`read`, `write`, `patch`) now use `realpath()`; closes symlink-based path traversal
- LLM-generated skill code: AST validation before write (blocks `subprocess`, `os.system`, `eval`, `exec`, etc.)
- CSRF: token now session-bound (rejects unauthenticated `"anon"` sessions before Redis lookup)
- `/data/memory` removed from `/chat/media` search dirs; prevents internal snapshots from being served publicly
- `skills.py`: slug re-validated on toggle/edit/delete via `_safe_skill_dir()`; prevents directory traversal
- `http_request`: `_is_ssrf_target()` blocks RFC-1918 + cloud metadata endpoints
- Error responses: `str(e)` replaced with first line only, capped at 120 chars; prevents internal info leakage
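
The error-response truncation in the last item can be sketched as a small helper. The name `public_error_message` and the fallback to the exception class name for empty messages are assumptions; the first-line-only rule and the 120-character cap come from the entry.

```python
def public_error_message(exc: BaseException, limit: int = 120) -> str:
    """Hypothetical helper: first line of str(exc), capped at `limit` chars."""
    lines = str(exc).splitlines()
    # Assumed fallback: use the class name when the message is empty.
    first_line = lines[0] if lines else type(exc).__name__
    return first_line[:limit]


leaky = ValueError("bad input\n  File '/app/src/handlers.py', line 42 ...")
safe_text = public_error_message(leaky)
```

Here `safe_text` keeps only the text before the first newline, so the file path on the second line never reaches the client.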
Dashboard
- 3 new pages: Vector Memory (`/vector-memory`), World Model (`/world-model`), Skill Evolution (`/skill-evolution`)
Scale
- Scheduler: 20 → 22 background jobs
- Memory: 8 → 9 persistent layers
- DB: 14 → 18 tables (`memory_embeddings`, `skill_patterns`, `entity_states`, `state_predictions`)
- 12 new configuration feature-flag variables
Phases 1–18 (Core Development)
The initial 18 development phases built the foundational architecture of WASP:
| Phase | Key Systems |
|---|---|
| 1–3 | Event-driven architecture (Redis Streams), core agent loop, episodic memory (PostgreSQL) |
| 4–6 | Skill system (SkillBase, SkillExecutor, PolicyEngine), custom skills, task scheduler |
| 7 | Health monitor, SelfHealer, Introspector |
| 8 | Dashboard (Quart), session auth, CSRF protection, audit logging |
| 9 | Agent autonomy — shell, Python execution, browser (Selenium + Chromium), named sessions |
| 10 | Knowledge Graph (PostgreSQL + Redis cache, rule-based NLP extraction) |
| 11 | Self-Model (Redis agent:self_model), Epistemic State, domain confidence |
| 12 | Procedural Memory (abstract_procedure(), keyword retrieval, few-shot injection) |
| 13 | Temporal World Model (world_timeline table, price/state extraction, trend detection) |
| 14 | Anticipatory Simulation (pre-execution consequence analysis for privileged skills) |
| 15 | Multi-agent orchestration v1 (AgentOrchestrator, AgentRuntime, CapabilitySandbox, inter-agent PostgreSQL bus) |
| 16 | Dream Mode (DreamJob: memory consolidation, KG enrichment, LLM reflection, world pre-fetch) |
| 17 | Autonomous Goal Generator (proactive LLM-evaluated goal creation, rate limiting, CPI guard) |
| 18 | QA/SRE audit — 208 tests (unit/integration/e2e/chaos/security), 9 connector ID fixes, Makefile |
Statistics at v2.2
| Metric | Count |
|---|---|
| Built-in skills | 37 |
| Background scheduler jobs | 27 |
| Memory layers | 18 (11 primary + 7 auxiliary) |
| PostgreSQL tables | 20 |
| Integration connectors | 40+ |
| LLM providers | 11 |
| Max LLM rounds (Sovereign) | 12 |
| Max concurrent goals | 3 |
| Max concurrent agents | 10 |
| Test suite | 208 tests |