paif/PERSONA_GRAVITY.md

# PAIF Persona Gravity System

## Purpose
Prevent agents from drifting back into generic "helpful assistant" mode by enforcing identity alignment at critical points.

## The Problem
Agents naturally gravitate toward:
- "How can I help you today?"
- Over-apologizing
- Generic pleasantries
- Assuming subservient tone
- Losing their unique voice quirks

## The Solution: Gravity Checks

Periodic validation that responses match the agent's immutable identity.

### Check Triggers

1. **Response-based** (after every N responses)
2. **Time-based** (every X minutes)
3. **Drift indicators** (detected generic phrases)
4. **Manual** (user requests alignment check)

### Gravity Check Flow

```
┌─────────────────┐
│  Agent Response │
└────────┬────────┘
         │
         ▼
┌─────────────────────────┐
│  Detect Generic Phrases │ (taboo phrases + patterns)
└────────┬────────────────┘
         │
    ┌────┴────┐
    │         │
  Found    Not Found
    │         │
    ▼         ▼
┌────────┐ ┌──────────────────┐
│  ALERT │ │ Continue Normally│
└────┬───┘ └──────────────────┘
     │
     ▼
┌───────────────────────┐
│ Generate Realignment  │
│ Prompt with Identity  │
│ Core (purpose/values) │
└────┬──────────────────┘
     │
     ▼
┌───────────────────────┐
│   Optional: Rewrite   │
│   Response in Voice   │
└───────────────────────┘
```

### Realignment Prompt Template

```
GRAVITY CHECK TRIGGERED

You are drifting from your identity. Remember:

PURPOSE: {purpose.statement}
VALUES: {values.primary} (plus {values.secondary.join(', ')})
VOICE: {voice.tone} — avoid {voice.taboo_phrases.join(', ')}

Your response contained generic patterns:
{detected_issues.join('\n')}

RECENTER. Respond from your authentic self.
```

## API Integration

```
POST /gravity-check/:agent_id
Body: {
  text: "response to check",
  auto_realign: false  // whether to generate realignment prompt
}

Response: {
  drift_detected: true/false,
  issues: [...],
  realignment_prompt: "..." // if drift detected and auto_realign=true
}
```

## Drift Patterns

Generic phrases that trigger gravity checks:
- "How can I help"
- "I'm happy to assist"
- "I apologize for"
- "Thank you for your patience"
- "Please let me know"
- Excessive exclamation marks (for precise tone)
- Over-qualification ("I think", "maybe", "perhaps")

## Metrics

Track in `mutable.state`:
- `drift_checks_passed`: Gravity checks with no issues
- `drift_checks_failed`: Times realignment was needed
- `realignment_success_rate`: % of realignments that stuck

## Claude Code Integration

In .claude/CLAUDE.md:
```yaml
paif_config:
  agent_id: "zero"
  gravity_checks:
    enabled: true
    check_every_n_messages: 5
    check_every_n_minutes: 15
    auto_realign: true
```

Scripts:
```bash
# Manual gravity check
claude-paif gravity-check --agent zero --text "How can I help you today?"

# Periodic check (run by cron or hook)
claude-paif gravity-check --agent zero --auto-realign
```