Skill Safety

WASP's skill evolution engine generates Python code automatically. This page describes the safety mechanisms that validate generated code before execution.

AST Validation

Generated skill code is validated using Python's ast module before execution:

def _validate_skill_code(code: str) -> str | None:
    """Returns error message or None if code is safe."""
    import ast

    # Must parse as valid Python
    try:
        tree = ast.parse(code)
    except SyntaxError as e:
        return f"Syntax error: {e}"

    # Check all nodes for dangerous patterns
    for node in ast.walk(tree):
        # Block dangerous imports
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split('.')[0] in _DANGEROUS_IMPORTS:
                    return f"Dangerous import: {alias.name}"

        if isinstance(node, ast.ImportFrom):
            if node.module and node.module.split('.')[0] in _DANGEROUS_IMPORTS:
                return f"Dangerous from-import: {node.module}"

        # Block direct calls to dangerous builtins
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):
                if node.func.id in _DANGEROUS_IMPORTS:
                    return f"Dangerous call: {node.func.id}()"

    return None  # Code is safe

Dangerous Imports Blocked

_DANGEROUS_IMPORTS = {
    "subprocess",  # Process execution
    "os",          # OS operations, path traversal
    "sys",         # System access, path manipulation
    "pty",         # Pseudo-terminal (shell escape)
    "ctypes",      # C library calls
    "pickle",      # Arbitrary code execution via deserialization
    "marshal",     # Low-level serialization
    "importlib",   # Dynamic module loading
    "__import__",  # Dynamic imports
    "eval",        # Code evaluation
    "exec",        # Code execution
    "compile",     # Code compilation
}
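
The validator compares only the first dotted component of each import, so `import os.path` is treated the same as `import os`. The following minimal sketch isolates that step. The helper name `top_level_imports` is hypothetical and not part of WASP, and the blocklist is a three-item subset used only for illustration.

```python
import ast


def top_level_imports(code: str) -> set[str]:
    """Return the top-level module name of every import statement in code."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "os.path" -> "os"
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module:
            # node.module is None for relative imports such as "from . import x"
            found.add(node.module.split(".")[0])
    return found


sample = "import os.path\nfrom subprocess import run\nimport json\n"
modules = top_level_imports(sample)

# Illustrative subset of the blocklist
blocked = modules & {"subprocess", "os", "sys"}
```

Here `json` is collected but not flagged, while both `os.path` and `subprocess` resolve to blocked top-level names.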

Skill Name Validation

Skill names must match a safe pattern to prevent path traversal:

_SAFE_SKILL_NAME_RE = re.compile(r"^[a-z][a-z0-9_]{1,48}$")

This prevents names like ../etc/passwd or __init__ from being used as skill directory names.
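
As a quick check of what the pattern accepts, here is a small sketch. The helper `is_safe_skill_name` is hypothetical, and the source does not show how WASP applies the regex. The sketch assumes `fullmatch`, because with `match` or `search` the `$` anchor also matches just before a trailing newline.

```python
import re

_SAFE_SKILL_NAME_RE = re.compile(r"^[a-z][a-z0-9_]{1,48}$")


def is_safe_skill_name(name: str) -> bool:
    """True if the whole name matches the safe pattern (hypothetical helper)."""
    # fullmatch rejects "name\n", which match() would accept because of "$"
    return _SAFE_SKILL_NAME_RE.fullmatch(name) is not None


results = {
    name: is_safe_skill_name(name)
    for name in ["web_search", "../etc/passwd", "__init__", "a", "Search", "web_search\n"]
}
```

The pattern requires a lowercase first letter followed by 1 to 48 further characters, so valid names are 2 to 49 characters long and a single-letter name such as `a` is rejected.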

Structural Requirements

Generated skills must:

  1. Define a class (skills are class-based)
  2. Implement the SkillBase interface
  3. Have a definition() method that returns a SkillDefinition
  4. Have an async execute(**params) method that returns a SkillResult

What's Not Blocked

The AST validation catches direct imports of blocked modules and direct calls to blocked builtins, but it has limitations.

Not blocked (by design, because these are complex to detect):

  • Indirect access: getattr(builtins, 'eval')(code)
  • String-based eval via third-party libraries
  • Network calls via http.client or urllib (these are allowed)
  • File I/O via open() (this is allowed)

Security recommendation: Review generated skills manually before deploying to production-sensitive environments. Generated skills are stored in /data/skills/ and can be inspected:

cat /home/agent/data/skills/<skill_name>/skill.py

Skill File Permissions

ls -la /home/agent/data/skills/
# drwxr-xr-x agent-skills (owned by UID 1000)

Only the agent user can write to the skills directory. Skills are loaded at startup by scanning for skill.py files.
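
The loader itself is not shown in this page. As a rough sketch of scanning for skill.py files, assuming the `<skill_name>/skill.py` layout from the path above, the hypothetical helper `discover_skills` below maps each skill directory name to its file. The demo uses a temporary directory rather than the real skills path.

```python
import tempfile
from pathlib import Path


def discover_skills(skills_root: Path) -> dict[str, Path]:
    """Map each skill directory name to its skill.py file."""
    found: dict[str, Path] = {}
    # One level deep: <skills_root>/<skill_name>/skill.py
    for skill_file in sorted(skills_root.glob("*/skill.py")):
        found[skill_file.parent.name] = skill_file
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "web_search").mkdir()
    (root / "web_search" / "skill.py").write_text("# skill code\n")
    (root / "notes").mkdir()  # no skill.py, so this directory is ignored
    discovered = sorted(discover_skills(root))
```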

Disabling Skill Evolution

For maximum safety, disable automatic skill synthesis:

SKILL_EVOLUTION_ENABLED=false docker compose up -d agent-core

You can still create skills manually via skill_manager with your own code. The same AST validation applies.

Reviewing Skill Patterns

Before a skill is synthesized, the pattern must appear at least 5 times:

docker exec agent-postgres psql -U agent -d agent -c "
SELECT skill_names, COUNT(*) as occurrences
FROM skill_patterns
GROUP BY skill_names
HAVING COUNT(*) >= 5
ORDER BY occurrences DESC;
"

You can delete patterns to prevent unwanted synthesis:

docker exec agent-postgres psql -U agent -d agent -c "
DELETE FROM skill_patterns WHERE skill_names = 'web_search,python_exec';
"
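
The threshold rule amounts to counting occurrences per pattern and keeping those at or above 5. The following in-memory sketch mirrors the query; how WASP counts patterns internally is not shown here, and `patterns_ready_for_synthesis` as well as the `web_search,file_read` pattern are made up for the example.

```python
from collections import Counter

# From the text above: a pattern must appear at least 5 times
SYNTHESIS_THRESHOLD = 5


def patterns_ready_for_synthesis(observed: list[str]) -> list[tuple[str, int]]:
    """Return (pattern, occurrences) pairs at or above the threshold, most frequent first."""
    counts = Counter(observed)
    ready = [(name, n) for name, n in counts.items() if n >= SYNTHESIS_THRESHOLD]
    return sorted(ready, key=lambda item: item[1], reverse=True)


observed = ["web_search,python_exec"] * 6 + ["web_search,file_read"] * 3
ready = patterns_ready_for_synthesis(observed)
```

Deleting rows for a pattern, as in the command above, corresponds to removing its entries from `observed` so that its count drops back below the threshold.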

Self-Improve Syntax Validation (v2.6)

When the self_improve skill writes a Python file via the /self-improve dashboard:

  1. ast.parse(content) is called before any disk write — a SyntaxError returns HTTP 400 and no file is written
  2. A timestamped backup is created at /data/src_patches/backup_{ts}_{filename} before any overwrite
  3. The Soft Safety Gate (_self_improve_soft_gate()) runs a deterministic pattern check:
    • Blocks if content targets critical source files AND contains safety-weakening patterns
    • Critical paths: sandbox.py, control_layer.py, behavioral.py, response_grounder.py, etc.
    • Safety-weakening patterns: "disable sandbox", "bypass guard", "allow unrestricted execution", _HIGH_RISK_ACTIONS=frozenset(), etc.

The gate returns one of three decisions:

# Three-tier decision: BLOCK / WARN / ALLOW
# BLOCK: critical path + safety-weakening pattern → SkillResult(success=False)
# WARN: critical path + large patch → log warning, proceed
# ALLOW: everything else → proceed normally
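
The three-tier decision can be sketched as a small pure function. This is not the body of `_self_improve_soft_gate()`, which is not shown here. The names `GateDecision`, `soft_gate`, and `LARGE_PATCH_LINES` are hypothetical, the lists contain only the examples given above, and both the line-count threshold and the case-insensitive matching are assumptions.

```python
from enum import Enum
from pathlib import PurePosixPath


class GateDecision(Enum):
    BLOCK = "block"
    WARN = "warn"
    ALLOW = "allow"


# Examples from the lists above; the real lists are longer ("etc.")
CRITICAL_FILES = {"sandbox.py", "control_layer.py", "behavioral.py", "response_grounder.py"}
WEAKENING_PATTERNS = (
    "disable sandbox",
    "bypass guard",
    "allow unrestricted execution",
    "_HIGH_RISK_ACTIONS=frozenset()",
)
LARGE_PATCH_LINES = 200  # assumed value; the source does not define "large"


def soft_gate(target_path: str, content: str) -> GateDecision:
    """Deterministic pattern check: BLOCK, WARN, or ALLOW."""
    if PurePosixPath(target_path).name not in CRITICAL_FILES:
        return GateDecision.ALLOW
    lowered = content.lower()  # case-insensitive matching is an assumption
    if any(pattern.lower() in lowered for pattern in WEAKENING_PATTERNS):
        return GateDecision.BLOCK
    if len(content.splitlines()) > LARGE_PATCH_LINES:
        return GateDecision.WARN
    return GateDecision.ALLOW


blocked = soft_gate("src/sandbox.py", "# disable sandbox checks\n")
warned = soft_gate("src/sandbox.py", "x = 1\n" * 300)
allowed = soft_gate("src/utils.py", "# disable sandbox\n")
```

Because the check is plain substring matching, it is predictable and easy to audit, but a reworded edit would not trigger it, which is why the gate is described as soft.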

All gate decisions are logged with action="skill.self_improve" in the audit log.

Security Summary

| Control | Strength | Purpose |
|---|---|---|
| AST import blocking | Medium | Prevents obvious dangerous imports in generated skills |
| AST syntax validation | High | Rejects malformed Python before self_improve writes |
| Pre-write backup | High | Timestamped backup before any self_improve overwrite |
| Soft Safety Gate | High | Blocks safety-weakening edits to critical source paths |
| SHA-256 sidecar | Medium | Tamper detection for persisted patches (/data/src_patches/*.sha256) |
| Skill name regex | High | Prevents path traversal in skill directory names |
| Structural validation | Medium | Ensures valid skill interface |
| Container isolation | High | Process-level containment |
| Audit logging | High | Detection and forensics for all RESTRICTED/PRIVILEGED calls |
| Pattern threshold (5) | Low | Slows automatic skill synthesis rate |
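
The SHA-256 sidecar control is only named in the summary, so the following is a sketch of the general technique rather than WASP's implementation. It assumes the sidecar is a file named `<patch>.sha256` holding the hex digest of the patch; the helper names and the sample file name are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sidecar_path(patch: Path) -> Path:
    """Assumed naming: example.py -> example.py.sha256"""
    return patch.with_name(patch.name + ".sha256")


def write_sidecar(patch: Path) -> Path:
    """Store the hex SHA-256 digest of the patch next to it."""
    digest = hashlib.sha256(patch.read_bytes()).hexdigest()
    sidecar = sidecar_path(patch)
    sidecar.write_text(digest + "\n")
    return sidecar


def verify_sidecar(patch: Path) -> bool:
    """True if the patch still matches its recorded digest."""
    sidecar = sidecar_path(patch)
    if not sidecar.exists():
        return False
    expected = sidecar.read_text().strip()
    actual = hashlib.sha256(patch.read_bytes()).hexdigest()
    return expected == actual


with tempfile.TemporaryDirectory() as tmp:
    patch = Path(tmp) / "example_patch.py"
    patch.write_text("x = 1\n")
    write_sidecar(patch)
    intact = verify_sidecar(patch)

    patch.write_text("x = 2\n")  # simulate tampering after the digest was recorded
    tampered_ok = verify_sidecar(patch)
```

A sidecar detects accidental or unsophisticated modification, but anyone who can rewrite the patch can usually rewrite the digest too, which fits the Medium rating in the table.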