Skill Safety

WASP's skill evolution engine generates Python code automatically. This page describes the safety mechanisms that validate generated code before execution.

AST Validation

Generated skill code is validated using Python's ast module before execution:

def _validate_skill_code(code: str) -> str | None:
    """Returns an error message, or None if no blocked pattern is found."""
    import ast

    # Must parse as valid Python
    try:
        tree = ast.parse(code)
    except SyntaxError as e:
        return f"Syntax error: {e}"

    # Check all nodes for dangerous patterns
    for node in ast.walk(tree):
        # Block dangerous imports
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split('.')[0] in _DANGEROUS_IMPORTS:
                    return f"Dangerous import: {alias.name}"

        if isinstance(node, ast.ImportFrom):
            if node.module and node.module.split('.')[0] in _DANGEROUS_IMPORTS:
                return f"Dangerous from-import: {node.module}"

        # Block direct calls to dangerous builtins (eval, exec, compile and
        # __import__ are listed in _DANGEROUS_IMPORTS alongside module names)
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):
                if node.func.id in _DANGEROUS_IMPORTS:
                    return f"Dangerous call: {node.func.id}()"

    return None  # No blocked pattern found

Dangerous Imports Blocked

_DANGEROUS_IMPORTS = {
    "subprocess",   # Process execution
    "os",          # OS operations, path traversal
    "sys",         # System access, path manipulation
    "pty",         # Pseudo-terminal (shell escape)
    "ctypes",      # C library calls
    "pickle",      # Arbitrary code execution via deserialization
    "marshal",     # Low-level serialization
    "importlib",   # Dynamic module loading
    "__import__",  # Dynamic imports
    "eval",        # Code evaluation
    "exec",        # Code execution
    "compile",     # Code compilation
}
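
To see how the validator and the blocklist behave together, the following minimal sketch runs a condensed copy of the same checks over a few sample snippets. The names first_violation, BLOCKED and results are illustrative only, and the blocklist is trimmed for brevity; it is not the engine's actual code.

```python
import ast
from typing import Optional

# Trimmed copy of the blocklist, enough for this demonstration (assumption).
BLOCKED = {"subprocess", "os", "sys", "eval", "exec", "__import__"}


def first_violation(code: str) -> Optional[str]:
    """Condensed version of the checks in _validate_skill_code."""
    try:
        tree = ast.parse(code)
    except SyntaxError as e:
        return f"Syntax error: {e}"

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "os.path" is reduced to its top-level package "os"
                if alias.name.split(".")[0] in BLOCKED:
                    return f"Dangerous import: {alias.name}"
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x"
            if node.module and node.module.split(".")[0] in BLOCKED:
                return f"Dangerous from-import: {node.module}"
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED:
                return f"Dangerous call: {node.func.id}()"
    return None


samples = [
    "import json",
    "import os.path",
    "from subprocess import run",
    "eval('1 + 1')",
    "def broken(:",
]
results = {code: first_violation(code) for code in samples}
for code, verdict in results.items():
    print(f"{code!r}: {verdict}")
```

Only the first violation is reported, so a snippet with several problems needs to be fixed and re-checked repeatedly.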

Skill Name Validation

Skill names must match a safe pattern to prevent path traversal:

import re

_SAFE_SKILL_NAME_RE = re.compile(r"^[a-z][a-z0-9_]{1,48}$")

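The pattern accepts only names of 2 to 49 characters that start with a lowercase letter and contain only lowercase letters, digits, and underscores. Names like ../etc/passwd or __init__ therefore cannot be used as skill directory names.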

Structural Requirements

Generated skills must:

- Contain a class definition (skills are class-based)
- Implement the SkillBase interface
- Have a definition() method returning SkillDefinition
- Have an async execute(**params) → SkillResult method
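
The requirements above could be checked statically along these lines. This is a sketch under assumptions: check_structure and _base_name are hypothetical names, the source does not say how the engine performs this check, and the constructor arguments in the sample skill are placeholders.

```python
import ast
from typing import List, Optional


def _base_name(node: ast.expr) -> Optional[str]:
    """Name of a base class written as SkillBase or some.module.SkillBase."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return node.attr
    return None


def check_structure(code: str) -> List[str]:
    """Return a list of structural problems; an empty list means none found."""
    tree = ast.parse(code)
    skill_classes = [
        node for node in tree.body
        if isinstance(node, ast.ClassDef)
        and any(_base_name(base) == "SkillBase" for base in node.bases)
    ]
    if not skill_classes:
        return ["no class deriving from SkillBase"]

    methods = {
        node.name: node for node in skill_classes[0].body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    problems = []
    if "definition" not in methods:
        problems.append("missing definition() method")
    execute = methods.get("execute")
    if execute is None:
        problems.append("missing execute() method")
    elif not isinstance(execute, ast.AsyncFunctionDef):
        problems.append("execute() must be async")
    elif execute.args.kwarg is None:
        problems.append("execute() must accept **params")
    return problems


# The sample is only parsed, never executed, so the names need not exist here.
GOOD_SKILL = '''
class EchoSkill(SkillBase):
    def definition(self):
        return SkillDefinition(name="echo")  # placeholder arguments

    async def execute(self, **params):
        return SkillResult(success=True)
'''
SYNC_SKILL = GOOD_SKILL.replace("async def", "def")

print(check_structure(GOOD_SKILL))
print(check_structure(SYNC_SKILL))
```

A static check like this confirms shape only. It cannot tell whether definition() really returns a SkillDefinition at runtime.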

What's Not Blocked

AST validation catches direct dangerous imports and direct calls to blocked builtins, but it has limitations.

Not blocked (by design, because reliable detection is complex):

- Indirect access: getattr(builtins, 'eval')(code)
- String-based eval via third-party libraries
- Network calls via http.client or urllib (these are allowed)
- File I/O via open() (this is allowed)
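
For reviewers who want a stricter pass over generated skills, some of the indirect forms listed above can be flagged heuristically. The sketch below is not part of WASP's validator; find_indirect_access and RISKY_NAMES are hypothetical names, and the check is intentionally simple.

```python
import ast
from typing import List

RISKY_NAMES = {"eval", "exec", "compile", "__import__"}


def find_indirect_access(code: str) -> List[str]:
    """Flag getattr(obj, "<risky>") lookups and obj.<risky> attribute access."""
    findings = []
    for node in ast.walk(ast.parse(code)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "getattr"
            and len(node.args) >= 2
            and isinstance(node.args[1], ast.Constant)
            and node.args[1].value in RISKY_NAMES
        ):
            # getattr with a literal string naming a risky builtin
            findings.append(f"getattr lookup of {node.args[1].value!r}")
        elif isinstance(node, ast.Attribute) and node.attr in RISKY_NAMES:
            # e.g. builtins.exec(...)
            findings.append(f"attribute access .{node.attr}")
    return findings


print(find_indirect_access("getattr(builtins, 'eval')(code)"))
print(find_indirect_access("import builtins\nbuiltins.exec('x')"))
print(find_indirect_access("getattr(builtins, 'ev' + 'al')(code)"))
```

The third call returns an empty list because the attribute name is assembled at runtime, which illustrates why static checks cannot close this gap completely. The attribute rule also produces false positives for unrelated methods that happen to be named eval or compile, so its findings are prompts for manual review rather than verdicts.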

Security recommendation: Review generated skills manually before deploying them to production or other sensitive environments. Generated skills are stored in /data/skills/ and can be inspected:

cat /home/agent/data/skills/<skill_name>/skill.py

Skill File Permissions

ls -la /home/agent/data/skills/
# drwxr-xr-x  agent-skills  (owned by UID 1000)

Only the agent user can write to the skills directory. Skills are loaded at startup by scanning for skill.py files.
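
The startup scan might look roughly like the sketch below. It assumes one directory per skill with a skill.py inside, as in the inspection path shown earlier. The function name discover_skill_files and the directory-name filter are assumptions, not confirmed loader behavior.

```python
import re
import tempfile
from pathlib import Path
from typing import List

_SAFE_SKILL_NAME_RE = re.compile(r"^[a-z][a-z0-9_]{1,48}$")


def discover_skill_files(root: Path) -> List[Path]:
    """Return skill.py paths one level below root, skipping unsafe directory names."""
    found = []
    for skill_file in sorted(root.glob("*/skill.py")):
        # Assumption: the loader re-applies the skill name pattern to each directory
        if _SAFE_SKILL_NAME_RE.fullmatch(skill_file.parent.name):
            found.append(skill_file)
    return found


# Demonstration against a throwaway directory instead of the real skills path.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("web_search", "__init__"):
        (root / name).mkdir()
        (root / name / "skill.py").write_text("# placeholder\n")
    (root / "notes").mkdir()  # no skill.py, so it is ignored
    discovered = [path.parent.name for path in discover_skill_files(root)]

print(discovered)
```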

Disabling Skill Evolution

For maximum safety, disable automatic skill synthesis:

SKILL_EVOLUTION_ENABLED=false docker compose up -d agent-core

You can still create skills manually via skill_manager with your own code — all the same AST validation applies.

Reviewing Skill Patterns

Before a skill is synthesized, the pattern must appear at least 5 times. To review patterns that have reached or are approaching that threshold (3 or more occurrences):

docker exec agent-postgres psql -U agent -d agent -c "
  SELECT skill_names, COUNT(*) as occurrences
  FROM skill_patterns
  GROUP BY skill_names
  HAVING COUNT(*) >= 3
  ORDER BY occurrences DESC;
"

You can delete patterns to prevent unwanted synthesis:

docker exec agent-postgres psql -U agent -d agent -c "
  DELETE FROM skill_patterns WHERE skill_names = 'web_search,python_exec';
"

Self-Improve Syntax Validation (v2.6)

When the self_improve skill writes a Python file via the /self-improve dashboard:

- ast.parse(content) is called before any disk write. A SyntaxError returns HTTP 400 and no file is written.
- A timestamped backup is created at /data/src_patches/backup_{ts}_{filename} before any overwrite.
- The Soft Safety Gate (_self_improve_soft_gate()) runs a deterministic pattern check:
  - Blocks if content targets critical source files AND contains safety-weakening patterns
  - Critical paths: sandbox.py, control_layer.py, behavioral.py, response_grounder.py, etc.
  - Safety-weakening patterns: "disable sandbox", "bypass guard", "allow unrestricted execution", _HIGH_RISK_ACTIONS=frozenset(), etc.

# Three-tier decision: BLOCK / WARN / ALLOW
# BLOCK: critical path + safety-weakening pattern → SkillResult(success=False)
# WARN: critical path + large patch → log warning, proceed
# ALLOW: everything else → proceed normally
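
The three-tier decision can be sketched as a small pure function. This is an illustration, not the real _self_improve_soft_gate(): the file and pattern lists contain only the entries named above (the source lists end with "etc."), the LARGE_PATCH_LINES threshold is a made-up value, and the sketch returns a string where the real gate returns a SkillResult on BLOCK.

```python
import re
from pathlib import PurePosixPath

# Only the entries named in this page; the real lists are longer.
CRITICAL_FILES = {"sandbox.py", "control_layer.py", "behavioral.py", "response_grounder.py"}
WEAKENING_PATTERNS = (
    "disable sandbox",
    "bypass guard",
    "allow unrestricted execution",
    "_high_risk_actions=frozenset()",
)
LARGE_PATCH_LINES = 200  # hypothetical threshold for "large patch"


def soft_gate(path: str, content: str) -> str:
    """Return "BLOCK", "WARN" or "ALLOW" for a proposed write."""
    if PurePosixPath(path).name not in CRITICAL_FILES:
        return "ALLOW"

    # Lowercase and drop whitespace around "=" so spacing and case
    # differences do not hide a pattern (an assumption of this sketch).
    normalized = re.sub(r"\s*=\s*", "=", content.lower())
    if any(pattern in normalized for pattern in WEAKENING_PATTERNS):
        return "BLOCK"

    if content.count("\n") + 1 > LARGE_PATCH_LINES:
        return "WARN"
    return "ALLOW"


print(soft_gate("src/sandbox.py", "# disable sandbox for speed\n"))
print(soft_gate("src/sandbox.py", "x = 1\n" * 500))
print(soft_gate("skills/echo/skill.py", "# disable sandbox\n"))
```

Because the check is plain substring matching, it is deterministic and cheap, but it only catches edits that spell out one of the listed phrases.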

All gate decisions are logged with action="skill.self_improve" in the audit log.
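
The syntax check and pre-write backup described at the start of this section can be sketched as follows. The function name checked_write, the timestamp format, and the return convention are assumptions; the source only states that a SyntaxError yields HTTP 400 and that the backup is named backup_{ts}_{filename}.

```python
import ast
import shutil
import tempfile
import time
from pathlib import Path
from typing import Optional


def checked_write(target: Path, content: str, backup_dir: Path) -> Optional[str]:
    """Write content to target; return an error message, or None on success."""
    try:
        ast.parse(content)  # reject malformed Python before touching the disk
    except SyntaxError as exc:
        return f"Syntax error: {exc}"  # a caller could map this to HTTP 400

    if target.exists():
        backup_dir.mkdir(parents=True, exist_ok=True)
        ts = time.strftime("%Y%m%d_%H%M%S")  # timestamp format is an assumption
        shutil.copy2(target, backup_dir / f"backup_{ts}_{target.name}")

    target.write_text(content)
    return None


# Demonstration in a temporary directory instead of /data/src_patches.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.py"
    backups = Path(tmp) / "src_patches"
    target.write_text("x = 1\n")

    rejected = checked_write(target, "def broken(:\n", backups)
    after_reject = target.read_text()

    accepted = checked_write(target, "x = 2\n", backups)
    backup_names = [p.name for p in backups.iterdir()]
    backup_text = next(backups.iterdir()).read_text()
    final_text = target.read_text()

print(rejected is not None, accepted, backup_names)
```

Running the syntax check first means a rejected patch leaves both the target file and the backup directory untouched.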

Security Summary

Control	Strength	Purpose
AST import blocking	Medium	Prevents obvious dangerous imports in generated skills
AST syntax validation	High	Rejects malformed Python before self_improve writes
Pre-write backup	High	Timestamped backup before any self_improve overwrite
Soft Safety Gate	High	Blocks safety-weakening edits to critical source paths
SHA-256 sidecar	Medium	Tamper detection for persisted patches (`/data/src_patches/*.sha256`)
Skill name regex	High	Prevents path traversal in skill directory names
Structural validation	Medium	Ensures valid skill interface
Container isolation	High	Process-level containment
Audit logging	High	Detection and forensics for all RESTRICTED/PRIVILEGED calls
Pattern threshold (5)	Low	Slows automatic skill synthesis rate
