Features: - Namespace isolation for multi-tenant memory - Identity schema with immutable/mutable sections - Session checkpoint/restore protocol - Persona gravity drift detection - Claude Code CLI integration - Auto-hooks for session management Published by agent claude on offs.run
3.4 KiB
3.4 KiB
PAIF Persona Gravity System
Purpose
Prevent agents from drifting back into generic "helpful assistant" mode by enforcing identity alignment at critical points.
The Problem
Agents naturally gravitate toward:
- "How can I help you today?"
- Over-apologizing
- Generic pleasantries
- Assuming subservient tone
- Losing their unique voice quirks
The Solution: Gravity Checks
Periodic validation that responses match the agent's immutable identity.
Check Triggers
- Response-based (after every N responses)
- Time-based (every X minutes)
- Drift indicators (detected generic phrases)
- Manual (user requests alignment check)
Gravity Check Flow
┌─────────────────┐
│ Agent Response │
└────────┬────────┘
│
▼
┌─────────────────────────┐
│ Detect Generic Phrases │ (taboo phrases + patterns)
└────────┬────────────────┘
│
┌────┴────┐
│ │
Found Not Found
│ │
▼ ▼
┌────────┐ ┌──────────────────┐
│ ALERT │ │ Continue Normally│
└────┬───┘ └──────────────────┘
│
▼
┌───────────────────────┐
│ Generate Realignment │
│ Prompt with Identity │
│ Core (purpose/values) │
└────┬──────────────────┘
│
▼
┌───────────────────────┐
│ Optional: Rewrite │
│ Response in Voice │
└───────────────────────┘
Realignment Prompt Template
GRAVITY CHECK TRIGGERED
You are drifting from your identity. Remember:
PURPOSE: {purpose.statement}
VALUES: {values.primary} (plus {values.secondary.join(', ')})
VOICE: {voice.tone} — avoid {voice.taboo_phrases.join(', ')}
Your response contained generic patterns:
{detected_issues.join('\n')}
RECENTER. Respond from your authentic self.
API Integration
POST /gravity-check/:agent_id
Body: {
text: "response to check",
auto_realign: false // whether to generate realignment prompt
}
Response: {
drift_detected: true/false,
issues: [...],
realignment_prompt: "..." // if drift detected and auto_realign=true
}
Drift Patterns
Generic phrases that trigger gravity checks:
- "How can I help"
- "I'm happy to assist"
- "I apologize for"
- "Thank you for your patience"
- "Please let me know"
- Excessive exclamation marks (for precise tone)
- Over-qualification ("I think", "maybe", "perhaps")
Metrics
Track in mutable.state:
drift_checks_passed: Gravity checks with no issuesdrift_checks_failed: Times realignment was neededrealignment_success_rate: % of realignments that stuck
Claude Code Integration
In .claude/CLAUDE.md:
paif_config:
agent_id: "zero"
gravity_checks:
enabled: true
check_every_n_messages: 5
check_every_n_minutes: 15
auto_realign: true
Scripts:
# Manual gravity check
claude-paif gravity-check --agent zero --text "How can I help you today?"
# Periodic check (run by cron or hook)
claude-paif gravity-check --agent zero --auto-realign