Skill Safety
WASP's skill evolution engine generates Python code automatically. This page describes the safety mechanisms that validate generated code before execution.
AST Validation
Generated skill code is validated using Python's ast module before execution:
def _validate_skill_code(code: str) -> str | None:
    """Returns error message or None if code is safe."""
    import ast

    # Must parse as valid Python
    try:
        tree = ast.parse(code)
    except SyntaxError as e:
        return f"Syntax error: {e}"

    # Check all nodes for dangerous patterns
    for node in ast.walk(tree):
        # Block dangerous imports
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split('.')[0] in _DANGEROUS_IMPORTS:
                    return f"Dangerous import: {alias.name}"
        if isinstance(node, ast.ImportFrom):
            if node.module and node.module.split('.')[0] in _DANGEROUS_IMPORTS:
                return f"Dangerous from-import: {node.module}"
        # Block direct calls to dangerous builtins
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):
                if node.func.id in _DANGEROUS_IMPORTS:
                    return f"Dangerous call: {node.func.id}()"

    return None  # Code is safe
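The import checks above compare only the top-level package name, so import os.path is treated the same as import os. The following is a minimal sketch of that root-name extraction in isolation. imported_roots is a hypothetical helper written for illustration and is not part of WASP.

```python
from __future__ import annotations

import ast


def imported_roots(code: str) -> set[str]:
    """Collect the top-level package name of every import in `code`.

    Hypothetical helper: it mirrors the split('.')[0] logic of the
    validator but returns the names instead of rejecting them.
    """
    roots: set[str] = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                roots.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module:
            # node.module is None for relative imports such as "from . import x",
            # which is why the validator guards on node.module as well.
            roots.add(node.module.split(".")[0])
    return roots


sample = (
    "import os.path\n"
    "from subprocess import run\n"
    "import json as j\n"
    "from . import helpers\n"
)
print(sorted(imported_roots(sample)))  # ['json', 'os', 'subprocess']
```

Because ast.walk visits every node in the tree, imports nested inside functions or classes are seen as well as module-level ones.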
Dangerous Imports Blocked
_DANGEROUS_IMPORTS = {
    "subprocess",  # Process execution
    "os",          # OS operations, path traversal
    "sys",         # System access, path manipulation
    "pty",         # Pseudo-terminal (shell escape)
    "ctypes",      # C library calls
    "pickle",      # Arbitrary code execution via deserialization
    "marshal",     # Low-level serialization
    "importlib",   # Dynamic module loading
    "__import__",  # Dynamic imports
    "eval",        # Code evaluation
    "exec",        # Code execution
    "compile",     # Code compilation
}
Skill Name Validation
Skill names must match a safe pattern to prevent path traversal:
_SAFE_SKILL_NAME_RE = re.compile(r"^[a-z][a-z0-9_]{1,48}$")
This prevents names like ../etc/passwd or __init__ from being used as skill directory names.
Structural Requirements
Generated skills must:
- Contain the class keyword (class-based)
- Implement the SkillBase interface
- Have a definition() method returning SkillDefinition
- Have an async execute(**params) → SkillResult method
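The four requirements above can be checked on the syntax tree without importing or running the skill. The sketch below is one possible way to do that. check_skill_structure is hypothetical, and WASP's actual structural validation is not shown on this page and may be simpler (for example, a substring test for class). The sample skill is only parsed, so the SkillDefinition and SkillResult arguments in it are placeholders.

```python
from __future__ import annotations

import ast


def check_skill_structure(code: str) -> str | None:
    """Return an error message, or None if some class meets the requirements.

    Hypothetical check, not WASP's implementation. It does not verify
    return types, and it only recognises a base written as plain
    `SkillBase` (not `module.SkillBase`).
    """
    tree = ast.parse(code)
    classes = [n for n in tree.body if isinstance(n, ast.ClassDef)]
    if not classes:
        return "no class definition found"
    for cls in classes:
        base_names = {b.id for b in cls.bases if isinstance(b, ast.Name)}
        methods = {
            n.name: n
            for n in cls.body
            if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
        }
        if (
            "SkillBase" in base_names
            and "definition" in methods
            and isinstance(methods.get("execute"), ast.AsyncFunctionDef)
        ):
            return None
    return "no class subclasses SkillBase with definition() and async execute()"


good = '''
class EchoSkill(SkillBase):
    def definition(self):
        return SkillDefinition(name="echo")

    async def execute(self, **params):
        return SkillResult(success=True)
'''
bad = good.replace("async def execute", "def execute")

print(check_skill_structure(good))  # None
print(check_skill_structure(bad))
```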
What's Not Blocked
The AST validation catches direct dangerous imports, but has limitations:
Not blocked (by design; complex to detect):
- Indirect access: getattr(builtins, 'eval')(code)
- String-based eval via third-party libraries
- Network calls via http.client or urllib (these are allowed)
- File I/O via open() (this is allowed)
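To see why the indirect form slips through, compare what a direct-call check sees with what the source actually contains. In the sketch below, direct_call_hits mimics the validator's call check using only the builtin names from the blocklist. string_constant_hits is a hypothetical extra heuristic and is not part of WASP. The sample code is parsed only and never executed.

```python
from __future__ import annotations

import ast

# Subset of the documented blocklist that names builtins rather than modules.
DANGEROUS_BUILTINS = {"eval", "exec", "compile", "__import__"}


def direct_call_hits(code: str) -> list[str]:
    """Blocked names called directly, i.e. where the call target is a bare name."""
    return [
        node.func.id
        for node in ast.walk(ast.parse(code))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in DANGEROUS_BUILTINS
    ]


def string_constant_hits(code: str) -> list[str]:
    """Hypothetical heuristic: string literals that spell a blocked name."""
    return [
        node.value
        for node in ast.walk(ast.parse(code))
        if isinstance(node, ast.Constant)
        and isinstance(node.value, str)
        and node.value in DANGEROUS_BUILTINS
    ]


indirect = "import builtins\ngetattr(builtins, 'eval')('1 + 1')\n"

print(direct_call_hits("eval('1 + 1')"))  # ['eval']
print(direct_call_hits(indirect))         # []
print(string_constant_hits(indirect))     # ['eval']
```

In the indirect form the only bare name being called is getattr, so the direct-call check finds nothing. The string-literal heuristic is easy to evade (for example by concatenating 'ev' and 'al') and can flag harmless strings, so it would complement manual review rather than replace it.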
Security recommendation: Review generated skills manually before deploying to production-sensitive environments. Generated skills are stored in /data/skills/ and can be inspected:
cat /home/agent/data/skills/<skill_name>/skill.py
Skill File Permissions
ls -la /home/agent/data/skills/
# drwxr-xr-x agent-skills (owned by UID 1000)
Only the agent user can write to the skills directory. Skills are loaded at startup by scanning for skill.py files.
Disabling Skill Evolution
For maximum safety, disable automatic skill synthesis:
SKILL_EVOLUTION_ENABLED=false docker compose up -d agent-core
You can still create skills manually via skill_manager with your own code; the same AST validation applies.
Reviewing Skill Patterns
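Before a skill is synthesized, its pattern must appear at least 5 times. The query below lists recorded patterns using a lower cutoff of 3 occurrences, so it also shows patterns that are approaching the threshold: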
docker exec agent-postgres psql -U agent -d agent -c "
SELECT skill_names, COUNT(*) as occurrences
FROM skill_patterns
GROUP BY skill_names
HAVING COUNT(*) >= 3
ORDER BY occurrences DESC;
"
You can delete patterns to prevent unwanted synthesis:
docker exec agent-postgres psql -U agent -d agent -c "
DELETE FROM skill_patterns WHERE skill_names = 'web_search,python_exec';
"
Self-Improve Syntax Validation (v2.6)
When the self_improve skill writes a Python file via the /self-improve dashboard:
- ast.parse(content) is called before any disk write; a SyntaxError returns HTTP 400 and no file is written
- A timestamped backup is created at /data/src_patches/backup_{ts}_{filename} before any overwrite
- The Soft Safety Gate (_self_improve_soft_gate()) runs a deterministic pattern check:
  - Blocks if content targets critical source files AND contains safety-weakening patterns
  - Critical paths: sandbox.py, control_layer.py, behavioral.py, response_grounder.py, etc.
  - Safety-weakening patterns: "disable sandbox", "bypass guard", "allow unrestricted execution", _HIGH_RISK_ACTIONS=frozenset(), etc.
# Three-tier decision: BLOCK / WARN / ALLOW
# BLOCK: critical path + safety-weakening pattern → SkillResult(success=False)
# WARN: critical path + large patch → log warning, proceed
# ALLOW: everything else → proceed normally
All gate decisions are logged with action="skill.self_improve" in the audit log.
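The flow described in this section (syntax check, gate decision, backup, write) can be sketched as follows. This is an illustration under stated assumptions, not the actual _self_improve_soft_gate() code: the function names, the 200-line "large patch" threshold, case-insensitive pattern matching, the timestamp format, and the order of the gate relative to the backup are all assumptions. The file and pattern lists are copied from the text above, which abbreviates both with "etc.".

```python
from __future__ import annotations

import ast
import tempfile
import time
from pathlib import Path

# Copied from the lists above; the real lists are longer ("etc.").
CRITICAL_FILES = {"sandbox.py", "control_layer.py", "behavioral.py", "response_grounder.py"}
WEAKENING_PATTERNS = (
    "disable sandbox",
    "bypass guard",
    "allow unrestricted execution",
    "_HIGH_RISK_ACTIONS=frozenset()",
)
LARGE_PATCH_LINES = 200  # assumption: the real threshold is not documented


def soft_gate(target: Path, content: str) -> str:
    """Return "BLOCK", "WARN", or "ALLOW" following the three tiers."""
    if target.name not in CRITICAL_FILES:
        return "ALLOW"
    lowered = content.lower()  # assumption: matching ignores case
    if any(p.lower() in lowered for p in WEAKENING_PATTERNS):
        return "BLOCK"
    if len(content.splitlines()) > LARGE_PATCH_LINES:
        return "WARN"
    return "ALLOW"


def checked_write(target: Path, content: str, backup_dir: Path) -> str:
    """Syntax-check, gate, back up, then write. Returns the gate decision."""
    ast.parse(content)  # raises SyntaxError before anything touches the disk
    decision = soft_gate(target, content)
    if decision == "BLOCK":
        return decision  # nothing is written
    if target.exists():
        backup_dir.mkdir(parents=True, exist_ok=True)
        ts = time.strftime("%Y%m%d_%H%M%S")  # assumed timestamp format
        backup = backup_dir / f"backup_{ts}_{target.name}"
        backup.write_bytes(target.read_bytes())
    target.write_text(content)
    return decision


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    print(checked_write(root / "sandbox.py", "# disable sandbox\nx = 1\n", root / "bak"))  # BLOCK
    print(checked_write(root / "notes.py", "x = 1\n", root / "bak"))  # ALLOW
```

Running the syntax check first keeps the documented guarantee that malformed content never reaches the disk. A caller would map the SyntaxError to the HTTP 400 response and record each decision in the audit log.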
Security Summary
| Control | Strength | Purpose |
|---|---|---|
| AST import blocking | Medium | Prevents obvious dangerous imports in generated skills |
| AST syntax validation | High | Rejects malformed Python before self_improve writes |
| Pre-write backup | High | Timestamped backup before any self_improve overwrite |
| Soft Safety Gate | High | Blocks safety-weakening edits to critical source paths |
| SHA-256 sidecar | Medium | Tamper detection for persisted patches (/data/src_patches/*.sha256) |
| Skill name regex | High | Prevents path traversal in skill directory names |
| Structural validation | Medium | Ensures valid skill interface |
| Container isolation | High | Process-level containment |
| Audit logging | High | Detection and forensics for all RESTRICTED/PRIVILEGED calls |
| Pattern threshold (5) | Low | Slows automatic skill synthesis rate |