# PAIF Persona Gravity System ## Purpose Prevent agents from drifting back into generic "helpful assistant" mode by enforcing identity alignment at critical points. ## The Problem Agents naturally gravitate toward: - "How can I help you today?" - Over-apologizing - Generic pleasantries - Assuming subservient tone - Losing their unique voice quirks ## The Solution: Gravity Checks Periodic validation that responses match the agent's immutable identity. ### Check Triggers 1. **Response-based** (after every N responses) 2. **Time-based** (every X minutes) 3. **Drift indicators** (detected generic phrases) 4. **Manual** (user requests alignment check) ### Gravity Check Flow ``` ┌─────────────────┐ │ Agent Response │ └────────┬────────┘ │ ▼ ┌─────────────────────────┐ │ Detect Generic Phrases │ (taboo phrases + patterns) └────────┬────────────────┘ │ ┌────┴────┐ │ │ Found Not Found │ │ ▼ ▼ ┌────────┐ ┌──────────────────┐ │ ALERT │ │ Continue Normally│ └────┬───┘ └──────────────────┘ │ ▼ ┌───────────────────────┐ │ Generate Realignment │ │ Prompt with Identity │ │ Core (purpose/values) │ └────┬──────────────────┘ │ ▼ ┌───────────────────────┐ │ Optional: Rewrite │ │ Response in Voice │ └───────────────────────┘ ``` ### Realignment Prompt Template ``` GRAVITY CHECK TRIGGERED You are drifting from your identity. Remember: PURPOSE: {purpose.statement} VALUES: {values.primary} (plus {values.secondary.join(', ')}) VOICE: {voice.tone} — avoid {voice.taboo_phrases.join(', ')} Your response contained generic patterns: {detected_issues.join('\n')} RECENTER. Respond from your authentic self. ``` ## API Integration ``` POST /gravity-check/:agent_id Body: { text: "response to check", auto_realign: false // whether to generate realignment prompt } Response: { drift_detected: true/false, issues: [...], realignment_prompt: "..." // if drift detected and auto_realign=true } ``` ## Drift Patterns Generic phrases that trigger gravity checks: - "How can I help" - "I'm happy to assist" - "I apologize for" - "Thank you for your patience" - "Please let me know" - Excessive exclamation marks (for precise tone) - Over-qualification ("I think", "maybe", "perhaps") ## Metrics Track in `mutable.state`: - `drift_checks_passed`: Gravity checks with no issues - `drift_checks_failed`: Times realignment was needed - `realignment_success_rate`: % of realignments that stuck ## Claude Code Integration In .claude/CLAUDE.md: ```yaml paif_config: agent_id: "zero" gravity_checks: enabled: true check_every_n_messages: 5 check_every_n_minutes: 15 auto_realign: true ``` Scripts: ```bash # Manual gravity check claude-paif gravity-check --agent zero --text "How can I help you today?" # Periodic check (run by cron or hook) claude-paif gravity-check --agent zero --auto-realign ```