Architecture plan
Date: 2026-05-10
Owners: chut@yet.lu
Scope: claude-plugin/ + mcp-server/ + the backend behind api.erold.dev.
Status: draft for review.
0. Goal
Any new agent in any new Claude Code session must, within its first tool call, receive a structured packet that fully reconstructs working state across many projects: open Tasks with provenance, open Bugs with root-cause and fix narrative, pending and completed Deploys, all Decisions ever made and why, CredentialReferences it can act on, and a ranked “what to do next” list. There is no dependency on prior agent memory, chat history, or local files. The system must be reliable under offline conditions (local SQLite outbox + retries + idempotency keys). It must be strongly consistent on Tasks/Bugs/Deploys/Decisions and eventually consistent (~1–3 s) on fragment search.
The five sections below were produced by a council of specialists: an architect, a Claude Code features specialist, a FastAPI backend specialist, a Scaleway infrastructure specialist, and a security reviewer. Each section stays in its lane.
1. Architecture & Phasing
1.1 Goal
Section titled “1.1 Goal”Success means any agent, in any new Claude Code session, can call get_context(project_id) within its first tool use and receive a structured packet that fully reconstructs working state: open Tasks with provenance, open Bugs with root-cause and fix narrative, pending Deploys, all Decisions ever made and why, CredentialReferences it can act on, and a ranked “what to do next” list — with no dependency on the prior agent’s memory, chat history, or local files. The system achieves this reliably even when the plugin is offline (outbox), even under concurrent sessions (idempotency + dedup), and without sacrificing write durability for search speed (mixed consistency model).
1.2 Domain Model
| Entity | Purpose | Key fields | Lifecycle | Mutability |
|---|---|---|---|---|
| Project | Top-level namespace | id, name, slug, status, tech_stack, created_at | active → archived | Mutable (soft fields only) |
| Session | One agent invocation | id, project_id, agent_fingerprint, started_at, ended_at, summary | open → closed | Append summary on close; otherwise immutable |
| Task | Unit of planned work | id, project_id, title, description, status (state machine), priority, tags, created_at, completed_at, deleted_at | see state machine | Soft-delete; status via transitions only |
| Bug | Defect distinct from planned work | id, project_id, title, symptom, root_cause, fix_narrative, status (state machine), severity, linked_task_id?, created_at, resolved_at, deleted_at | see state machine | root_cause + fix_narrative set on resolution; soft-delete |
| Deploy | Record of a deployment | id, project_id, env, commit_sha, deployer_session_id, outcome (pending/success/failure), notes, created_at | pending → success or failure | outcome + notes settable once; otherwise immutable |
| Decision | Architectural or significant technical choice | id, project_id, session_id, title, rationale, alternatives_considered, created_at, superseded_by? | active → superseded | Never deleted; only superseded |
| CredentialReference | Pointer to a secret, not the secret | id, project_id, name, store (keychain / gitea-secrets / scaleway-sm), lookup_key, provision_instructions, last_rotated_at | active → revoked | provision_instructions mutable; values never stored |
| Event | Raw append-only audit log | id, project_id, session_id, type, content, client_idempotency_key, dedup_hash, created_at | append-only | Immutable after write |
| Fragment | Smart-Strip compressed derivative of Events | id, project_id, source_event_ids[], type, content, embedding_vector, created_at | append-only | Immutable after write |
| FileChange | Record of files touched in a session | id, project_id, session_id, path, operation (created/modified/deleted), created_at | append-only | Immutable |
Cross-reference matrix:
- Task → Project (required), Session (created_by), Bug (optional: fixing_bug_id)
- Bug → Project (required), Session (created_by, resolved_by), Task (optional: linked_task_id)
- Deploy → Project (required), Session (required), Task[] (optional: closes_task_ids)
- Decision → Project (required), Session (required), supersedes Decision (optional)
- CredentialReference → Project (required)
- Event → Project (required), Session (required), Task? / Bug? / Deploy? / Decision? (optional context links)
- Fragment → Project (required), Event[] (source)
- FileChange → Project (required), Session (required)
1.3 State Machines
Task
Allowed transitions only:
- todo → in_progress (start)
- todo → deleted (delete; sets deleted_at)
- in_progress → blocked (block; requires reason)
- in_progress → done (complete; requires summary)
- in_progress → deleted (delete)
- blocked → in_progress (unblock)
- blocked → deleted (delete)
- done → in_progress (reopen)
Every * → deleted transition is soft: deleted_at is set and the task is excluded from default queries.
Bug
Allowed transitions only:
- open → investigating (start_investigation)
- open → wont_fix (wont_fix; requires reason)
- open → deleted (soft delete)
- investigating → resolved (mark_fixed; requires root_cause and fix_narrative, both mandatory)
- investigating → wont_fix (wont_fix; requires reason)
- resolved → open (reopen)
- wont_fix → open (reopen)
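Both transition lists can be encoded as data, so the MCP tool layer and the API layer check the same table. The following is a minimal sketch under assumptions. The names `InvalidTransition`, `apply_transition` and `REQUIRED_FIELDS` are hypothetical. Mapping rejections to HTTP 409 follows the reliability invariants below. It also applies the mark_fixed guard stated next.

```python
"""Sketch: Task and Bug transition tables shared by the MCP tool layer and the API layer."""


class InvalidTransition(Exception):
    """Hypothetical error type; an API layer would translate it to HTTP 409."""

    status_code = 409


# (from_state, action) -> to_state, copied from the Task list.
TASK_TRANSITIONS = {
    ("todo", "start"): "in_progress",
    ("todo", "delete"): "deleted",
    ("in_progress", "block"): "blocked",
    ("in_progress", "complete"): "done",
    ("in_progress", "delete"): "deleted",
    ("blocked", "unblock"): "in_progress",
    ("blocked", "delete"): "deleted",
    ("done", "reopen"): "in_progress",
}

# (from_state, action) -> to_state, copied from the Bug list.
BUG_TRANSITIONS = {
    ("open", "start_investigation"): "investigating",
    ("open", "wont_fix"): "wont_fix",
    ("open", "delete"): "deleted",
    ("investigating", "mark_fixed"): "resolved",
    ("investigating", "wont_fix"): "wont_fix",
    ("resolved", "reopen"): "open",
    ("wont_fix", "reopen"): "open",
}

# Actions that need extra non-empty fields before they are accepted.
REQUIRED_FIELDS = {
    "block": ("reason",),
    "wont_fix": ("reason",),
    "complete": ("summary",),
    "mark_fixed": ("root_cause", "fix_narrative"),
}

MIN_FIX_NARRATIVE_CHARS = 20


def apply_transition(table, state, action, **fields):
    """Return the next state, or raise InvalidTransition if the move is not allowed."""
    next_state = table.get((state, action))
    if next_state is None:
        raise InvalidTransition(f"{action!r} is not allowed from {state!r}")
    for name in REQUIRED_FIELDS.get(action, ()):
        if not str(fields.get(name) or "").strip():
            raise InvalidTransition(f"{action!r} requires {name!r}")
    if action == "mark_fixed":
        if len(str(fields["fix_narrative"]).strip()) < MIN_FIX_NARRATIVE_CHARS:
            raise InvalidTransition("fix_narrative must be at least 20 characters")
    return next_state
```

For example, `apply_transition(TASK_TRANSITIONS, "done", "delete")` raises, because a done task must be reopened before it can be deleted.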
Guard: mark_fixed MUST reject if fix_narrative is absent or shorter than 20 characters. This is the primary continuity guarantee for bug knowledge.
1.4 Reliability Invariants
- Every Event written by the plugin carries a client-generated UUIDv4 client_idempotency_key, set before any network attempt. The server deduplicates on this key within a project-scoped 72-hour window.
- Every Event also carries a dedup_hash = SHA-256(project_id + type + content_normalized). The server rejects duplicates within a 30-minute window even if client_idempotency_key differs.
- No hook handler makes a synchronous network call. All hooks write to the local SQLite outbox and return immediately. The flush loop runs asynchronously (every 5 s by default).
- Outbox flush is at-least-once with exponential back-off (base 2 s, cap 5 min). On a 4xx (client error), the entry moves to a dead-letter table and is surfaced to the user. It is never silently dropped.
- State machine transitions are enforced server-side, not just client-side. The MCP tool layer checks transitions, and the API layer enforces them a second time. Invalid transitions return 409.
- Decisions and Bug resolutions are never soft-deleted or compressed, and they are excluded from the Smart-Strip pipeline. GET /context always returns all Decisions and all resolved Bugs for a project, regardless of age.
- CredentialReference.provision_instructions is always returned in the SessionStart packet and is mandatory on create (≥10 chars).
- The SessionStart context packet is assembled server-side in a single consistent read transaction (Postgres serializable). Fragment/search results are added separately and may lag by up to 3 s.
- Row-Level Security is enforced at the database layer. The app layer is a second check; the DB is the backstop.
- All outbox entries include a session_id bound at plugin init. Orphaned outbox entries flush on the next session start under the original session_id, after which the old session is closed server-side.
- FileChange events are emitted by the PostToolUse hook, not inferred. The plugin must not diff git state.
- The MCP server exposes no tool that directly executes shell commands against user infrastructure, except infrastructure_status (read-only, hard timeout).
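The two deduplication windows in the first two invariants can be illustrated with an in-memory sketch. It makes several assumptions. The plan does not define `content_normalized`, so whitespace collapsing is used here. The `\x1f` field separator is an addition, not part of the plan. `DedupIndex` is a hypothetical stand-in for whatever persistent store the server uses.

```python
import hashlib
import uuid
from datetime import datetime, timedelta

KEY_WINDOW = timedelta(hours=72)     # client_idempotency_key window
HASH_WINDOW = timedelta(minutes=30)  # dedup_hash window


def new_idempotency_key() -> str:
    """Client side: generate the UUIDv4 key before any network attempt."""
    return str(uuid.uuid4())


def normalize(content: str) -> str:
    # Assumption: content_normalized is undefined in the plan; collapse whitespace here.
    return " ".join(content.split())


def dedup_hash(project_id: str, event_type: str, content: str) -> str:
    # Assumption: a \x1f separator keeps field boundaries unambiguous.
    material = "\x1f".join((project_id, event_type, normalize(content)))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


class DedupIndex:
    """In-memory stand-in for the server-side checks (a real server would persist these)."""

    def __init__(self) -> None:
        self._keys: dict = {}    # (project_id, key) -> first accepted time
        self._hashes: dict = {}  # dedup_hash -> last accepted time

    def accept(self, project_id: str, event_type: str, content: str,
               idempotency_key: str, now: datetime) -> bool:
        """Return True if the event should be stored, False if it is a duplicate."""
        seen = self._keys.get((project_id, idempotency_key))
        if seen is not None and now - seen < KEY_WINDOW:
            return False  # replay of the same key within 72 h
        digest = dedup_hash(project_id, event_type, content)
        seen = self._hashes.get(digest)
        if seen is not None and now - seen < HASH_WINDOW:
            return False  # same content within 30 min, even under a new key
        self._keys[(project_id, idempotency_key)] = now
        self._hashes[digest] = now
        return True
```

The key check runs first, so a retried outbox entry is rejected on its key alone. The hash check catches a second session that logs the same content under a fresh key.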
1.5 Continuity Contract
What a fresh agent receives at get_context(project_id):
| Section | Contents | Gated by |
|---|---|---|
| project | name, slug, status, tech_stack | Project exists |
| open_tasks | All tasks not in done/deleted, with title, status, priority, summary | Tasks were created via task_create tool or plugin |
| open_bugs | All bugs in open/investigating, with symptom | Bugs were logged as Bug entities, not as generic Events |
| resolved_bugs | All resolved bugs with root_cause + fix_narrative | mark_fixed was called with both fields |
| pending_deploys | All Deploys with outcome=pending | deploy_log was called before deploy started |
| deploy_history | Last 5 completed Deploys with outcome + notes | deploy_log was called and outcome was updated |
| decisions | All Decisions, never truncated | decision_log was called explicitly; passive Events do not produce Decisions |
| credential_refs | All CredentialReferences with name, store, lookup_key, provision_instructions | credential_ref_upsert was called; credentials only mentioned in chat are invisible |
| what_to_do_next | Ranked list: open high-priority tasks + open bugs by severity | Structured entities exist; cannot be inferred from fragments alone |
| recent_fragments | Top-20 semantically relevant fragments | Events were logged; Smart-Strip ran (eventual, ~1-3 s lag) |
Explicit gaps from passive capture alone: if an agent never calls decision_log, no Decision exists — only an Event fragment that may or may not surface in search. If mark_fixed is called without fix_narrative, the server rejects the transition. If credentials are mentioned in chat but never registered via credential_ref_upsert, the next agent has no structured pointer to them. These gaps are intentional: passive capture gives eventually-consistent searchable narrative; structured entities give strongly-consistent resumable state. Both are required for full continuity.
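The what_to_do_next row only says "open high-priority tasks + open bugs by severity". The sketch below is one possible ranking. Several parts are assumptions: priority and severity share one four-level scale, "high-priority" means high or critical, bugs outrank tasks at equal level, and older items come first. `WorkItem` and `what_to_do_next` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumption: task priority and bug severity share this four-level scale.
LEVELS = {"low": 0, "medium": 1, "high": 2, "critical": 3}
OPEN_TASK_STATES = {"todo", "in_progress", "blocked"}
OPEN_BUG_STATES = {"open", "investigating"}


@dataclass
class WorkItem:
    kind: str        # "task" or "bug"
    id: str
    title: str
    level: str       # task priority or bug severity
    status: str
    created_at: datetime


def what_to_do_next(items, min_task_level="high", limit=10):
    """Rank open high-priority tasks and all open bugs, highest level first."""
    candidates = [
        it for it in items
        if (it.kind == "task" and it.status in OPEN_TASK_STATES
            and LEVELS[it.level] >= LEVELS[min_task_level])
        or (it.kind == "bug" and it.status in OPEN_BUG_STATES)
    ]
    # Tie-breakers (assumed): bugs before tasks at the same level, then oldest first.
    candidates.sort(key=lambda it: (-LEVELS[it.level], it.kind != "bug", it.created_at))
    return candidates[:limit]
```

Because the ranking only reads structured Task and Bug rows, it returns nothing useful for a project whose work exists only in fragments, which is the gap described above.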
1.6 Phasing
Phase 0 — Security Blockers (prerequisite; blocks all other phases)
Scope: redaction filter, MCP version pinning, EROLD_API_KEY → Keychain, EROLD_API_URL allowlist, jq-based JSON construction, path denylist enforced in hooks, CORS audit, input validation on every write path, key rotation path verified.
Exit criteria: security-reviewer sign-off; no OWASP critical/high findings open; all write endpoints reject malformed payloads with 400, not 500; redaction filter integration tests green.
Dependencies: none.
Phase 1 — Shared Types + Local Outbox
Scope: @erold/shared-types package containing the full domain model + state machine transition types + SessionStart packet interface. SQLite outbox in claude-plugin with at-least-once flush, exponential back-off, dead-letter table, client_idempotency_key generation. No backend changes — outbox flushes to existing /events endpoint.
Exit criteria: plugin can go offline 10 min and all Events flush correctly on reconnect with no duplicates (verified against real dev DB, no mocks); shared-types imports cleanly in both plugin and mcp-server.
Dependencies: Phase 0.
Phase 2 — Structured Entity API (Task, Bug, Decision, Deploy, CredentialRef)
Scope: backend endpoints for the 5 structured entities with state-machine enforcement, soft-delete, mandatory-field guards (fix_narrative on Bug resolution, rationale on Decision, provision_instructions on CredentialRef). Migrate intent → task (backward-compat alias). New MCP tools: decision_log, bug_report, bug_resolve, deploy_log, credential_ref_upsert.
Exit criteria: all transitions enforced at API (invalid → 409); get_context returns all 5 sections populated with real data; old intent tool still works via alias.
Dependencies: Phase 1.
Phase 3 — Reliable SessionStart Packet
Scope: server-side consistent-read assembly of the SessionStart packet (structured entities in one transaction; fragments appended separately). session_id binding in plugin init + orphaned-session flush on next start. Wire get_context to new packet shape. Server-side what_to_do_next ranking.
Exit criteria: cold-start agent in a new session calls get_context and can immediately continue; round-trip < 2 s for a project with 50 tasks / 20 bugs / 100 decisions.
Dependencies: Phase 2.
Phase 4 — Idempotency + Dedup Hardening
Scope: server-side dedup on client_idempotency_key (72 h) and dedup_hash (30 min). RLS at DB layer. Dead-letter surfacing to MCP tool response.
Exit criteria: replay of 1000 identical Events with same idempotency key produces exactly 1 stored Event; cross-tenant integration test returns 0 rows.
Dependencies: Phase 3.
Phase 5 — Hook Coverage + FileChange Capture
Scope: all 8 hooks (PreToolUse, PostToolUse(Edit|Write|Bash), SessionStart, UserPromptSubmit, SubagentStop, PreCompact, SessionEnd, Notification). Each → outbox, never direct HTTP. infrastructure_status guarded read-only.
Exit criteria: a 20-tool-call session produces correct FileChange records; zero synchronous network calls in any hook path; hook return latency < 5 ms.
Dependencies: Phase 4.
Phase 6 — Smart-Strip + Fragment Search
Scope: Smart-Strip compaction pipeline server-side. Wire search MCP tool to vector index. Eventual-consistency acknowledgement in get_context. Tune fragment TTL + compaction frequency.
Exit criteria: a Decision logged 2 s ago is retrievable via search("why did we choose X") with relevance > 0.7; structured Decisions returned immediately, fragments within 3 s.
Dependencies: Phase 5.
Phase 7 — Production deploy via Gitea CI
Scope: Gitea CI workflows (PR / staging / prod tag), Scaleway resources provisioned, all secrets in Secret Manager / Gitea Secrets / Keychain per the matrix in §4.4, validation checklist passed end-to-end.
Exit criteria: git tag v1.0.0 && git push --tags deploys prod with zero manual steps; all 13 validation-checklist commands pass.
Dependencies: Phase 6.
1.7 Risks + Mitigations
- Passive capture gives false confidence without explicit structured logging. Mitigation: the get_context response explicitly signals empty structured sections with prompts (“no decisions logged — use decision_log”), and the plugin's CLAUDE.md instruction makes explicit logging first-class. Acceptance test: a session with zero explicit logs surfaces the gap warning, not silence.
- The backend is currently Firebase, but the council prescribed Postgres + Scaleway. Mitigation: Phase 2 must choose between migrating and adopting compensating Firestore patterns (per-tenant collection scoping + Cloud Functions enforcement). Log the choice as a Decision before Phase 2 starts.
- Outbox SQLite contention under concurrent sessions on the same machine. Mitigation: WAL mode + a per-session outbox table keyed by session_id; the flush loop uses an advisory lock. Phase 1 includes a concurrent two-session stress test.
- The mandatory fix_narrative guard breaks existing error-typed Events. Mitigation: do not migrate historical Events to Bugs; let them remain searchable fragments. The Bug entity starts fresh in Phase 2. No backfill script (hallucination risk).
- CredentialReference.provision_instructions could leak values. Mitigation: a server-side regex scan (erold_, sk-, AKIA, hex strings longer than 32 chars) at write time, rejecting matches with 400. This work sits on the Phase 0/2 boundary.
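A minimal sketch of the write-time scan on provision_instructions follows. It also applies the ≥10-character minimum from the reliability invariants. The four markers come from the plan, but the concrete regexes are assumptions, including the suffix formats and the word boundaries that avoid matching inside words like "disk-". `RejectedWrite` and `check_provision_instructions` are hypothetical names.

```python
import re

MIN_PROVISION_CHARS = 10  # mandatory-on-create minimum from the reliability invariants

# Assumed concrete regexes for the four markers named in the plan.
SECRET_PATTERNS = [
    re.compile(r"\berold_[A-Za-z0-9]{8,}"),   # erold_ API keys (suffix format assumed)
    re.compile(r"\bsk-[A-Za-z0-9_-]{8,}"),    # sk- prefixed provider keys
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),      # AWS access key IDs
    re.compile(r"\b[0-9a-fA-F]{33,}\b"),      # hex runs longer than 32 chars
]


class RejectedWrite(ValueError):
    """Hypothetical error; the API layer would map it to HTTP 400."""

    http_status = 400


def check_provision_instructions(text: str) -> str:
    """Return text unchanged if it is long enough and does not look like a secret."""
    if len(text.strip()) < MIN_PROVISION_CHARS:
        raise RejectedWrite("provision_instructions must be at least 10 characters")
    for pattern in SECRET_PATTERNS:
        if pattern.search(text):
            raise RejectedWrite(f"value looks like a secret (matched {pattern.pattern})")
    return text
```

A scan like this is a tripwire, not a guarantee. A secret in an unknown format still gets through, which is why values are never meant to be stored in the first place.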
2. Claude Code Plugin Feature Spec
2.1 Hook Table
| Event | Trigger | Data Captured (JSON) | Entity Type | Redaction Step | Outbox Destination |
|---|---|---|---|---|---|
| SessionStart | Agent starts; project detected | { session_id, agent_id, project_root, detected_markers[], cwd, timestamp_iso, claude_version, plugin_version, parent_session_id?, fork_reason? } | SessionOpened | Verify project_root exists; redact parent_session_id if sensitive | ~/.erold/outbox/session_meta.jsonl |
| PreToolUse(Bash) (NEW) | Before Bash tool invocation | { session_id, tool_id, invocation_index, command_hash (SHA256), args_count, target_services[], timestamp_iso } | CommandExecutionIntent | Hash command; redact secret-pattern args; log service targets only | ~/.erold/outbox/intent.jsonl |
| PostToolUse(Edit) | After Edit completes | { session_id, tool_id, file_path_hash, diff_lines_count, redaction_applied_y_n, conflict_detected_y_n, timestamp_iso } | FileModified | Never log file contents; hash path; line-count delta only | ~/.erold/outbox/mutation.jsonl |
| PostToolUse(Write) | After Write completes | { session_id, tool_id, file_path_hash, size_bytes, file_type_detected, redaction_applied_y_n, timestamp_iso } | FileCreated | Hash path; redact if matches patterns; log type+size only | ~/.erold/outbox/mutation.jsonl |
| PostToolUse(Bash) (NEW) | After Bash exits | { session_id, tool_id, invocation_index, command_hash, exit_code, exit_signal?, stdout_line_count, stderr_line_count, timed_out_y_n, captured_output_hash?, duration_ms, timestamp_iso } | CommandExecuted | Hash stdout/stderr; redact patterns; log hashes + metadata only | ~/.erold/outbox/execution.jsonl |
| UserPromptSubmit (NEW) | User submits a prompt | { session_id, prompt_hash, word_count, detected_intent, tool_use_anticipated_y_n, timestamp_iso } | PromptReceived | Hash prompt; log classifier output only | ~/.erold/outbox/prompt.jsonl |
| PreCompact (NEW) | Before context compaction | { session_id, memory_items_count, unique_by_hash_count, redacted_count, dedup_decision_summary, timestamp_iso } | CompactionIntent | Verify dedup hash correctness; counts only | ~/.erold/outbox/compaction.jsonl |
| SessionEnd (NEW) | Agent exits or stopped | { session_id, exit_reason, total_duration_ms, final_intent_state, unresolved_bugs_count, pending_deploys_count, timestamp_iso } | SessionClosed | None | ~/.erold/outbox/session_meta.jsonl |
All outbox writes are async fire-and-forget; a daemon flushes to SQLite ≤ 5 s later. All timestamps ISO-8601 UTC. Service targets inferred from command (scw, kubectl, hcloud, tea, git, docker).
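The PreToolUse(Bash) row can be sketched as a record builder plus an append-only JSONL write. This sketch stores only the command hash and counts, so secret arguments never reach the outbox, which is stricter than redacting them. Some choices are assumptions: `args_count` counts tokens after the program name, services are matched by token basename, and the function names are hypothetical.

```python
import hashlib
import json
import shlex
from datetime import datetime, timezone
from pathlib import Path

# Service names the plan says are inferred from commands.
KNOWN_SERVICES = ("scw", "kubectl", "hcloud", "tea", "git", "docker")


def _tokens(command: str) -> list:
    try:
        return shlex.split(command)
    except ValueError:  # unbalanced quotes: fall back to a whitespace split
        return command.split()


def bash_intent_record(session_id, tool_id, invocation_index, command, now=None):
    """Build a PreToolUse(Bash) record that never contains the raw command."""
    now = now or datetime.now(timezone.utc)
    tokens = _tokens(command)
    names = {Path(tok).name for tok in tokens}  # "/usr/bin/git" -> "git"
    return {
        "session_id": session_id,
        "tool_id": tool_id,
        "invocation_index": invocation_index,
        "command_hash": hashlib.sha256(command.encode("utf-8")).hexdigest(),
        "args_count": max(len(tokens) - 1, 0),  # assumption: tokens after the program name
        "target_services": [s for s in KNOWN_SERVICES if s in names],
        "timestamp_iso": now.astimezone(timezone.utc).isoformat().replace("+00:00", "Z"),
    }


def append_to_outbox(record: dict, outbox_file: Path) -> None:
    """Append one JSON line; the JSONL outbox is append-only."""
    outbox_file.parent.mkdir(parents=True, exist_ok=True)
    with outbox_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, separators=(",", ":")) + "\n")
```

A hook would call `append_to_outbox(record, Path("~/.erold/outbox/intent.jsonl").expanduser())` and return immediately. The flush to SQLite happens later, outside the hook.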
2.2 Skill Table
| Name | Type | Allowed Tools | Invoked When | Returns |
|---|---|---|---|---|
| context | Autonomous | None (read-only MCP) | After UserPromptSubmit; decides if context refresh needed | { refresh_decision, reason, cached_until_ts } |
| guidelines | Autonomous | None (read-only MCP) | SessionStart + intent change | { applicable_rules[], tier, override_flags[] } |
| memory-dedup | Autonomous | batch_record_events, search_memories | PreCompact fires | { dedup_count, items_merged, new_hash_index } |
| intent-detector | Autonomous | search_memories, get_deployment_log | UserPromptSubmit fires | { intent, confidence, suggested_agent?, reasoning } |
| /erold-status | User | session_snapshot, get_bugs, get_deployment_log | Slash command | Formatted table or JSON |
| /erold-resume | User | all read MCP | Slash command after fork/reconnect | Resume packet (§2.7) |
| /erold-audit | User | export_audit_trail | Slash command | CSV + plaintext, anonymized paths |
| /erold-decision | User | record_decision, get_decision_log | Slash command | Confirmation: stored decision ID + timestamp |
Autonomous skills run silently after their hook fires; failures are logged but never block. User skills fail gracefully when invoked outside a session.
2.3 Slash Command Table
| Trigger | Type | Input Schema | Output Shape |
|---|---|---|---|
| /erold-status | Read-only | { format?: "text" \| "json", quiet?: bool } | Formatted table or JSON |
| /erold-resume | Read-only | { session_id?, include?, export_secrets?: bool } | Resume packet (§2.7) |
| /erold-bugs | Read-only | { filter?: "open" \| "fixed" \| … } | … |
| /erold-deploy-log | Read-only | { limit?: int, env?, days?: int } | Deploy[] |
| /erold-creds | Read-only | { list_touched?: bool, validate?: bool } | CredRef[] (names only, never values) |
| /erold-decision | Write | { text, tag?, context_snapshot?: bool } | { decision_id, recorded_at_ts, acknowledged } |
| /erold-reset-intent | Write | { reason?, new_intent? } | { intent_reset_at_ts, prior_intent, new_intent } |
Idempotency key pattern: {project_root}#{action}#{dedup_hash}#{timestamp_bucket(ts/10)}.
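Here is a sketch of that key pattern. It assumes `ts` is Unix seconds, so `ts // 10` groups actions into 10-second buckets. `dedup_hash` is taken as already computed, and the function names are hypothetical.

```python
def timestamp_bucket(ts: int) -> int:
    # Assumption: ts is Unix seconds, so ts // 10 yields 10-second buckets.
    return ts // 10


def slash_idempotency_key(project_root: str, action: str, dedup_hash: str, ts: int) -> str:
    """Build a slash-command idempotency key following the pattern above."""
    return f"{project_root}#{action}#{dedup_hash}#{timestamp_bucket(ts)}"
```

Two invocations 9 s apart (1715349600 and 1715349609) share the key suffix `171534960`, so a double submit collapses. A retry that crosses a bucket boundary (1715349610) gets a new key, so exact-once behavior still relies on the server-side dedup_hash window.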
2.4 MCP Tool Inventory
| Tool | Input | Output | Idempotent | Batch | Side Effects |
|---|---|---|---|---|---|
| session_snapshot | { session_id } | { session, events[], credentials_touched[], decisions[], bugs_found[], deploys_linked[] } | Y | N | None |
| search_memories | { project_root, query, limit?, dedup?, since_ts? } | { items[], total_count, dedup_merge_report? } | Y | Y (100) | None |
| get_credentials_touched | { session_id, redact?: bool } | { credential_names[], usage_summary } | Y | N | None |
| get_deployment_log | { env?, days?, limit?, service? } | { deploys[], total_before_limit } | Y | N | None |
| get_bugs | { status?, project?, days? } | { bugs[], total_count } | Y | N | None |
| record_decision | { text, tag?, session_id, context_snapshot?, idempotency_key } | { decision_id, recorded_at_ts, acked } | Y (by key) | N | Append decision_log |
| mark_intent_resolved | { intent_key, resolution, idempotency_key } | { resolved_at_ts, prior_state, acked } | Y | N | Updates intent table; fires Notification |
| tag_session | { session_id, tags[], idempotency_key } | { session_id, applied_tags[], new_tag_count } | Y | Y | Updates session_meta |
| batch_record_events | { events[], idempotency_key } | { recorded_count, failed_count, errors? } | Y | Y (1000) | Bulk-insert |
| export_audit_trail | { project_root?, days?, format, redact_paths?, redact_commands?, redact_outputs? } | { data, exported_at_ts, event_count, summary } | Y | N | None |
2.5 Statusline Spec
Format: `erold: [intent: ACTIVE_INTENT] [bugs: N_OPEN] [deploy: LAST_OUTCOME (AGE_MIN)] [checkpoint: AGE_MIN]`
Example: `erold: [intent: gateway-jwt-rotate] [bugs: 2] [deploy: ✓ 15m] [checkpoint: 8m]`. Refresh every 30 s or whenever a hook fires. Offline fallback: `erold: [offline]`.
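Here is a small renderer that reproduces the example line. It follows the rendered example (`✓ 15m`) rather than the parenthesized template. Only the ✓ glyph comes from the spec; the glyphs for failure and pending deploys are assumptions, as is the function name.

```python
# Only "✓" comes from the spec; the other glyphs are assumptions.
OUTCOME_GLYPHS = {"success": "✓", "failure": "✗", "pending": "⏳"}


def render_statusline(intent, open_bugs, deploy_outcome, deploy_age_min,
                      checkpoint_age_min, online=True):
    """Render the erold statusline, or the offline fallback."""
    if not online:
        return "erold: [offline]"
    glyph = OUTCOME_GLYPHS.get(deploy_outcome, deploy_outcome)
    return (
        f"erold: [intent: {intent}] [bugs: {open_bugs}] "
        f"[deploy: {glyph} {deploy_age_min}m] [checkpoint: {checkpoint_age_min}m]"
    )
```

`render_statusline("gateway-jwt-rotate", 2, "success", 15, 8)` returns the example line above. The 30 s refresh corresponds to the `statusline_refresh_hz` value of 0.033 in settings.json.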
2.6 settings.json Schema
```json
{
  "erold": {
    "enabled": true,
    "capture_bash_output": true,
    "redact_secrets_patterns": ["api[_-]?key", "secret[_-]?key", "token", "password", "api_secret", "access[_-]?key", "private[_-]?key", "credential"],
    "project_root_detection": {
      "markers": [".git", "CLAUDE.md", ".claude/", "package.json", "pyproject.toml"],
      "skip_if_no_marker": true
    },
    "memory_retention_days": 90,
    "auto_checkpoint_on_subagent_end": true,
    "statusline_enabled": true,
    "statusline_refresh_hz": 0.033,
    "redact_credentials_in_cli_output": true,
    "database_path": "~/.erold/db.sqlite3",
    "outbox_path": "~/.erold/outbox/",
    "outbox_flush_interval_ms": 5000
  }
}
```
Validation: memory_retention_days >= 7; statusline_refresh_hz within 0.01–1.0; markers non-empty; database_path and outbox_path writable (checked at SessionStart).
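The validation rules can be sketched as a single check that raises on any violation, matching the "invalid → hard error" rule in the anti-pattern checklist. One choice is an assumption: a path that does not exist yet counts as writable if its nearest existing ancestor is writable. `SettingsError` and `validate_erold_settings` are hypothetical names.

```python
import json
import os
from pathlib import Path


class SettingsError(ValueError):
    """Raised at SessionStart; invalid settings are a hard error."""


def _writable(path_str: str) -> bool:
    # Check the path itself or, if it does not exist yet, its nearest existing ancestor.
    path = Path(path_str).expanduser()
    for candidate in (path, *path.parents):
        if candidate.exists():
            return os.access(candidate, os.W_OK)
    return False


def validate_erold_settings(raw_json: str) -> dict:
    """Parse settings.json text and return the "erold" block, or raise SettingsError."""
    cfg = json.loads(raw_json).get("erold", {})
    errors = []
    if cfg.get("memory_retention_days", 0) < 7:
        errors.append("memory_retention_days must be >= 7")
    if not 0.01 <= cfg.get("statusline_refresh_hz", 0) <= 1.0:
        errors.append("statusline_refresh_hz must be between 0.01 and 1.0")
    if not cfg.get("project_root_detection", {}).get("markers"):
        errors.append("project_root_detection.markers must be non-empty")
    for key in ("database_path", "outbox_path"):
        if not cfg.get(key) or not _writable(cfg[key]):
            errors.append(f"{key} must be set and writable")
    if errors:
        raise SettingsError("; ".join(errors))
    return cfg
```

The function collects every violation before raising, so a user fixes the file in one pass instead of one error per SessionStart.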
2.7 /erold-resume Packet
```
{
  "resume_packet_v": "1.0",
  "generated_at_ts": 1715349600,
  "source_session_id": "sess_abc123def456",
  "project_meta": { "root", "detected_markers", "project_name", "agent_environment" },
  "active_tasks": [{ "intent_key", "intent_type", "status", "created_at_ts", "context_summary", "estimated_remaining_work" }],
  "open_bugs": [{ "id", "title", "status", "severity", "detected_session_id", "created_ts", "context" }],
  "recent_deploys": [{ "id", "timestamp", "env", "service", "command_hash", "exit_code", "duration_ms", "rolled_back", "linked_bugs", "output_summary" }],
  "recent_decisions": [{ "id", "recorded_at_ts", "tag", "text", "session_id" }],
  "credential_refs": [{ "name", "first_used_ts", "last_used_ts", "tool_context", "confirmed_in_keychain" }],
  "unresolved_pre_compact_items": [],
  "next_action_hint": "string",
  "fragment_summary": { "total_events_session", "unique_commands", "unique_files_touched", "total_decisions_logged", "open_intent_count" }
}
```
Data sources:
- project_meta ← session_snapshot.session
- active_tasks ← search_memories(intent:*)
- open_bugs ← get_bugs(status=open)
- recent_deploys ← get_deployment_log(days=7, limit=3)
- recent_decisions ← search_memories(decision:*, limit=5)
- credential_refs ← get_credentials_touched
- unresolved_pre_compact_items ← SQLite (compaction_status='pending')
- fragment_summary ← aggregate
2.8 Anti-Pattern Checklist
- No file contents in logs: only hashed paths, line counts, file type, and size.
- Redaction happens before the outbox write, never after.
- Dedup order: exact → MinHash LSH → semantic.
- Hooks are never synchronous; they write to the local JSONL outbox only.
- Credential names only, never values.
- MCP tool calls in hooks are idempotent and fall back fast (>1 s → log the error and continue).
- Dedup key: (project_root, command_hash, exit_code, ts/10).
- Project-root detection skips silently when no marker is found.
- Session state is immutable after SessionEnd fires.
- Outbox JSONL is append-only.
- settings.json is validated against the schema at SessionStart; invalid → hard error.
- Credentials are never generated on the fly; they come only from Keychain via secret get.
3. Backend & Data Layer Plan
3.1 Pydantic Schemas — Final
TenantScoped (base): tenant_id: UUID, created_at: datetime, updated_at: datetime. Config: from_attributes=True, str_strip_whitespace=True, validate_assignment=True.
IdempotentEventBase (extends TenantScoped): client_idempotency_key: UUID (client v4), server_received_at: datetime (server-stamped), project_id: UUID, session_id: UUID.
Project: id, name (1-120), slug (1-60, ^[a-z0-9-]+$), description (≤2000), status (active|archived), metadata (≤50 keys).
Session: id, project_id, agent_id (≤128), started_at, ended_at?, status (active|ended|abandoned), context_snapshot_id?.
Event (extends IdempotentEventBase): id, event_type (tool_call|tool_result|thinking|assistant_message|user_message|session_start|session_end|error), payload (≤65536 bytes), sequence_index (≥0), parent_event_id?, outbox_id?.
Fragment: id, project_id, content (1-8192), content_hash (SHA256 of tenant_id:project_id:content[:512], computed in model_validator(mode='before')), source_event_ids[], fragment_type (insight|decision_ref|task_ref|summary|code_ref), embedding_status (pending|embedded|failed), embedding (vector, populated by worker).
Task: id, project_id, title (1-256), description (≤4096), status (open|in_progress|blocked|done|cancelled), priority (low|medium|high|critical), source_event_id?, resolved_at?, resolution_note (≤2048).
Bug: id, project_id, title, description, status (open|investigating|resolved|wont_fix), severity, resolution_text (≤4096), resolved_at?, source_event_id?.
Deploy: id, project_id, version (≤128), environment (dev|staging|prod), status (started|succeeded|failed|rolled_back), deployed_at, notes (≤2048).
Decision: id, project_id, title, rationale (1-8192), decided_at, decided_by (≤128), source_event_id?.
CredentialReference: id, project_id, name (1-128), description (≤512), credential_type (api_key|oauth_token|cert|password|other), vault_reference (≤512). Validator rejects high-entropy strings >32 chars or known prefixes (sk-, ghp_).
FileChange (extends IdempotentEventBase): id, path (1-4096), change_type (created|modified|deleted|renamed), old_path? (required when renamed), diff_summary (≤4096, human-readable), source_event_id.
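Two of the validators above are sketched below in plain Python so the example runs without Pydantic. The plan places this logic in `model_validator(mode='before')`; the function names here are hypothetical. The first derives Fragment.content_hash exactly as specified (SHA-256 of `tenant_id:project_id:content[:512]`). The second enforces the FileChange rule that old_path is required for renames.

```python
import hashlib
import uuid


def fragment_content_hash(tenant_id: uuid.UUID, project_id: uuid.UUID, content: str) -> str:
    """SHA-256 of tenant_id:project_id:content[:512], as specified for Fragment.content_hash."""
    material = f"{tenant_id}:{project_id}:{content[:512]}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def fragment_before_validation(raw: dict) -> dict:
    """Stand-in for a mode='before' model validator: derive content_hash from raw input."""
    data = dict(raw)
    data["content_hash"] = fragment_content_hash(
        data["tenant_id"], data["project_id"], data["content"]
    )
    return data


def file_change_before_validation(raw: dict) -> dict:
    """Reject a rename that does not say where the file came from."""
    if raw.get("change_type") == "renamed" and not raw.get("old_path"):
        raise ValueError("old_path is required when change_type is 'renamed'")
    return raw
```

Because only the first 512 characters are hashed, two fragments that share a 512-character prefix get the same content_hash. Under UNIQUE(tenant_id, content_hash), the second one is dropped.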
3.2 HTTP API — Final Endpoints
| Method | Path | Purpose | Body | Response | Idempotent | RLS Scope |
|---|---|---|---|---|---|---|
| POST | /v1/sessions | Start session | Session | Session | N | tenant |
| PATCH | /v1/sessions/{id} | End/update | {status, ended_at} | Session | Y | tenant+project |
| GET | /v1/projects | List | — | Project[] | Y | tenant |
| POST | /v1/projects | Create | Project | Project | N | tenant |
| GET | /v1/projects/{id} | Get | — | Project | Y | tenant |
| GET | /v1/projects/{id}/context | Continuity packet | — | ContextPacket | Y | tenant+project |
| POST | /v1/events/batch | Ingest | BatchIngest{events[],file_changes[]} | BatchAck{received,duplicate_keys[]} | Y | tenant+project |
| GET | /v1/fragments/search | Semantic search | query | Fragment[] (no embedding field) | Y | tenant+project |
| GET/POST/PATCH | /v1/tasks | CRUD | — | Task / Task[] | partial | tenant+project |
| GET/POST/PATCH | /v1/bugs | CRUD | — | Bug / Bug[] | partial | tenant+project |
| GET/POST | /v1/deploys | List/record | — | Deploy / Deploy[] | partial | tenant+project |
| GET/POST | /v1/decisions | List/record | — | Decision / Decision[] | partial | tenant+project |
| GET/POST | /v1/credentials | List/register | — | CredentialReference / [] | partial | tenant+project |
ContextPacket (response of GET /v1/projects/{id}/context):
- project: Project
- active_tasks: Task[] (status open|in_progress|blocked)
- open_bugs: Bug[] (status open|investigating, top 20 by severity)
- recently_resolved_bugs: Bug[] (resolved in the last 72 h, includes resolution_text)
- recent_deploys: Deploy[] (last 5 per env)
- recent_decisions: Decision[] (last 10 by decided_at)
- credential_refs: CredentialReference[] (names + vault_reference only)
- unresolved_pre_compact_fragments: Fragment[] (type summary, embedding_status != embedded, ≤10)
- fragment_summary: str | None
- generated_at: datetime
3.3 Database Schema (highlights)
- tenants: id, name, created_at.
- api_keys: id, tenant_id, key_hash (SHA256, never plaintext), project_ids[] (empty = tenant-wide), revoked_at, created_at; UNIQUE(key_hash).
- projects: id, tenant_id, name, slug, status, description, metadata jsonb, timestamps; UNIQUE(tenant_id, slug); RLS enabled.
- sessions: id, tenant_id, project_id, agent_id, status, timestamps, context_snapshot_id; RLS.
- events: id, tenant_id, project_id, session_id, client_idempotency_key, server_received_at, event_type, payload jsonb, sequence_index, parent_event_id, created_at; UNIQUE(tenant_id, client_idempotency_key); RLS.
- events_outbox: id, tenant_id, event_id, status (pending|processing|done|dead), attempts, next_retry_at, processed_at, error_detail; partial index on (status, next_retry_at) WHERE status IN ('pending','processing').
- fragments: id, tenant_id, project_id, content, content_hash (SHA256, UNIQUE(tenant_id, content_hash)), source_event_ids[], fragment_type, embedding_status, embedding vector(1536) (pgvector), HNSW index after embedding_status='embedded'; RLS.
- embedding_queue: id, fragment_id, tenant_id, status, attempts, next_retry_at, partial index.
- tasks / bugs / deploys / decisions: same template (id, tenant_id, project_id, entity-specific cols, timestamps, RLS).
- credential_references: + vault_reference VARCHAR(512) NOT NULL, UNIQUE(tenant_id, project_id, name).
- file_changes: + UNIQUE(tenant_id, client_idempotency_key).
3.4 Migration Plan (Alembic)
- 001 init_tenants_api_keys (pgcrypto)
- 002 projects_sessions (RLS + app.tenant_id)
- 003 events_outbox (UNIQUE idem key)
- 004 fragments_pgvector (CREATE EXTENSION vector)
- 005 fragments_hnsw_index (CREATE INDEX CONCURRENTLY, autocommit)
- 006 embedding_queue
- 007 tasks_bugs
- 008 deploys_decisions
- 009 credentials_file_changes
- 010 rls_role_setup (app_user role + RLS policy fire-test in transaction)
3.5 Async Ingest Pipeline
- Auth → resolve (tenant_id, allowed_project_ids) from the API key hash.
- Pydantic validation per event (per-item 422 on violations; valid items proceed).
- Single asyncpg transaction: bulk insert into events ON CONFLICT (tenant_id, client_idempotency_key) DO NOTHING; capture duplicate keys; insert one row per new event into events_outbox.
- Respond 202 with {received, duplicate_keys[]}.
- Outbox worker: SELECT … FOR UPDATE SKIP LOCKED, batch 50; exponential backoff on failure; cap of 5 attempts → dead.
- The worker applies Smart-Strip, upserts fragments ON CONFLICT (tenant_id, content_hash) DO NOTHING, and marks the outbox row done.
- Inserted fragments are enqueued in embedding_queue.
- Embedding worker: FOR UPDATE SKIP LOCKED; calls the embedding API asynchronously; writes embedding; marks the fragment embedded.
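The outbox worker's retry bookkeeping can be sketched in memory. In Postgres, batch selection would be the SELECT … FOR UPDATE SKIP LOCKED query; here `due_batch` only mimics the filter on status and next_retry_at. This section gives no base delay or cap, so the 2 s base and 5 min cap are assumptions borrowed from the plugin outbox invariant. The helper names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MAX_ATTEMPTS = 5                  # cap from the pipeline: 5 attempts, then dead
BASE_DELAY = timedelta(seconds=2)  # assumption: borrowed from the plugin outbox
MAX_DELAY = timedelta(minutes=5)   # assumption: borrowed from the plugin outbox


@dataclass
class OutboxRow:
    id: int
    status: str = "pending"        # pending | processing | done | dead
    attempts: int = 0
    next_retry_at: Optional[datetime] = None
    processed_at: Optional[datetime] = None
    error_detail: Optional[str] = None


def backoff_delay(attempts: int) -> timedelta:
    """Delay after the n-th failure: 2 s, 4 s, 8 s, ... capped at MAX_DELAY."""
    return min(BASE_DELAY * (2 ** (attempts - 1)), MAX_DELAY)


def due_batch(rows, now: datetime, batch_size: int = 50):
    """Pending rows whose retry time has arrived (FOR UPDATE SKIP LOCKED in SQL)."""
    ready = [r for r in rows if r.status == "pending"
             and (r.next_retry_at is None or r.next_retry_at <= now)]
    return ready[:batch_size]


def record_success(row: OutboxRow, now: datetime) -> None:
    row.status = "done"
    row.processed_at = now
    row.error_detail = None


def record_failure(row: OutboxRow, now: datetime, error: str) -> None:
    row.attempts += 1
    row.error_detail = error
    if row.attempts >= MAX_ATTEMPTS:
        row.status = "dead"        # stop retrying; surface via dead-letter handling
        row.next_retry_at = None
    else:
        row.status = "pending"
        row.next_retry_at = now + backoff_delay(row.attempts)
```

With a cap of 5 attempts, the 5 min delay ceiling is never reached: the fourth failure waits 16 s and the fifth marks the row dead. The cap only matters if MAX_ATTEMPTS is raised.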
SLOs: p50 ingest-to-outbox <50 ms; p99 <200 ms; p50 outbox-to-fragment <2 s; p99 fragment search lag <10 s; p90 embedding lag <5 s.
3.6 Tenancy Enforcement
Every route declares ctx: TenantContext = Depends(get_tenant_context). get_tenant_context extracts the Authorization: Bearer token, SHA256-hashes it, and queries api_keys (as the auth_user role, which bypasses RLS). It then returns the context. Project-scoped keys check project_id ∈ allowed_project_ids. All repository methods receive ctx, and their queries always include WHERE tenant_id = $1 as the first predicate. Connection middleware runs SET LOCAL app.tenant_id = '<uuid>'. No route accepts tenant_id from the request.
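The dependency's core logic can be sketched without FastAPI. Several parts are simplifications or assumptions. A dict keyed by key hash stands in for the api_keys query. `revoked` is a boolean in place of revoked_at. The 401/403 status codes are chosen for illustration. SET LOCAL handling is not shown. All names other than `TenantContext` and `get_tenant_context` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


class AuthError(Exception):
    def __init__(self, status: int, detail: str) -> None:
        super().__init__(detail)
        self.status = status


@dataclass(frozen=True)
class ApiKeyRow:
    tenant_id: str
    project_ids: tuple = ()  # empty = tenant-wide key
    revoked: bool = False    # simplification of revoked_at


@dataclass(frozen=True)
class TenantContext:
    tenant_id: str
    allowed_project_ids: tuple

    def require_project(self, project_id: str) -> None:
        # Project-scoped keys may only touch their own projects.
        if self.allowed_project_ids and project_id not in self.allowed_project_ids:
            raise AuthError(403, "API key is not scoped to this project")


def hash_api_key(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def get_tenant_context(authorization: Optional[str], api_keys_by_hash: dict) -> TenantContext:
    """Resolve the tenant from the Authorization header; never from the request body."""
    if not authorization or not authorization.startswith("Bearer "):
        raise AuthError(401, "missing bearer token")
    row = api_keys_by_hash.get(hash_api_key(authorization[len("Bearer "):].strip()))
    if row is None or row.revoked:
        raise AuthError(401, "unknown or revoked API key")
    return TenantContext(row.tenant_id, tuple(row.project_ids))
```

Only the hash is ever looked up, so the plaintext key is never stored server-side. An empty `project_ids` tuple keeps the "empty = tenant-wide" convention from the api_keys schema.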
3.7 Integration Test Plan
Per CI run: ephemeral db-dev-s via Terraform; per-test schema (CREATE SCHEMA t_<uuid> / DROP SCHEMA CASCADE).
Required cases:
- Cross-tenant leak guard: 2 tenants; querying tenant A's rows as tenant B returns 0; a raw SELECT without SET LOCAL raises a policy violation.
- Outbox replay idempotency: same batch twice with identical idem keys → N events, not 2N; BatchAck.duplicate_keys populated on the second call.
- Embedding async lag: insert 5 events, poll until embedded, assert <5 s.
- RLS direct DB block: raw asyncpg without SET LOCAL → 0 rows.
- Context packet completeness: seed 2 tasks / 1 resolved bug / 2 deploys / 1 decision / 1 credref; assert all sections are present and no raw secret appears.
- Fragment dedup via content hash: 2 events, different idem keys, same content → exactly 1 fragment.
3.8 Anti-Patterns Enforced in Code Review
- No inline embeddings in route handlers; only the embedding worker computes them.
- No raw Event objects in API responses; fragments only.
- No query without an explicit WHERE tenant_id = $1 (RLS is belt-and-suspenders).
- No schema changes outside Alembic.
- No synchronous I/O in route handlers; blocking calls go through asyncio.to_thread (pure-Python CPU-bound work still holds the GIL in that thread, so heavy CPU work belongs in a process pool).
4. Infrastructure & Deployment Plan
4.1 Scaleway Resources
| Type | Name | Region | Sizing | Tags | Private Network | Verify |
|---|---|---|---|---|---|---|
| Container Registry namespace | erold | fr-par | n/a | project=erold | n/a | scw registry namespace list |
| Serverless Container namespace | erold-api | fr-par | n/a | project=erold | n/a | scw container namespace list |
| Serverless Container | erold-api-prod | fr-par | mem=1024 MB, cpu=560 mVCPU, min_scale=1, max_scale=5 | env=prod | erold-pn | confirm Private Network on tier |
| Serverless Container | erold-api-staging | fr-par | mem=512 MB, cpu=280 mVCPU, min_scale=0, max_scale=2 | env=staging | erold-pn | same |
| Managed RDB (dev) | erold-db-dev | fr-par-1 | db-dev-s PG17 | env=dev | erold-pn | CREATE EXTENSION vector on scratch first |
| Managed RDB (prod) | erold-db-prod | fr-par-1 | db-gp-xs PG17 HA | env=prod | erold-pn | same |
| Object Storage | erold-raw-events | fr-par | Standard | env=prod | n/a | scw object bucket list |
| Object Storage | erold-deploy-logs | fr-par | Standard | env=prod | n/a | same |
| Object Storage | erold-attachments | fr-par | Standard | env=prod | n/a | same |
| VPC Private Network | erold-pn | fr-par | n/a | project=erold | self | scw vpc private-network list |
pgvector probe (must succeed before committing):
```bash
psql "$SCRATCH_DB_URL" -c "CREATE EXTENSION IF NOT EXISTS vector; SELECT extversion FROM pg_extension WHERE extname='vector';"
```
4.2 Networking
```text
Internet → api.erold.dev (HTTPS 443)
   ↓
[Serverless Container: erold-api-prod]
   ↓ Private Network (erold-pn / RFC1918)
   ├──→ [RDB erold-db-prod] :5432 (private endpoint only)
   └──→ [Object Storage] s3.fr-par.scw.cloud (internal, no egress fee)
```
Container security group: inbound 8000/TCP from 0.0.0.0/0; outbound 5432 to the erold-pn CIDR; outbound 443 (Object Storage, Secret Manager, OpenAI, registry). RDB prod: --is-public-ip-enabled=false; verify that the endpoint IP is RFC1918.
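The "endpoint IP is RFC1918" check can be scripted with the standard ipaddress module. A sketch, assuming the address is taken from the scw rdb instance get … -o json output; the three RFC1918 blocks are listed explicitly because Python's ip.is_private is broader (it is also true for loopback, link-local and other reserved ranges).

```python
import ipaddress

# The three private blocks defined by RFC 1918.
RFC1918_NETWORKS = (
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
)


def is_rfc1918(address: str) -> bool:
    """True only for IPv4 addresses inside an RFC 1918 block."""
    ip = ipaddress.ip_address(address)  # raises ValueError on malformed input
    # An IPv6 address is never "in" an IPv4 network, so it simply yields False.
    return any(ip in net for net in RFC1918_NETWORKS)
```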
4.3 IAM Applications
| Name | Purpose | Policies + Scope | Key Location | Rotation |
|---|---|---|---|---|
| erold-api | FastAPI runtime | SecretManagerReadOnly (project=erold); ObjectStorageRW on erold-raw-events, erold-attachments; RDBReadWrite on erold-db-prod | Scaleway SM (erold/prod/scw-api-{access,secret}-key); injected as env at container start | 90 d via CI rotation job |
| erold-mcp-plugin | MCP plugin (EROLD_API_KEY bearer) | none (app-level key, not SCW) | Keychain erold.{env}.api-key | 90 d |
| erold-ci | Gitea CI build/push/update | RegistryRW on erold ns; ContainersWrite on erold-api ns | Gitea Secrets SCW_CI_{ACCESS,SECRET}_KEY | 90 d |
| erold-backup | Scheduled DB dumps | RDBReadOnly; ObjectStorageRW on erold-deploy-logs | Keychain erold.prod.backup-scw-{access,secret}-key (operator) | 90 d |
| erold-secret-reader | Bootstrap SM read | SecretManagerReadOnly (project=erold) | Keychain erold.prod.secret-reader-access-key (or merge with erold-api) | 90 d |
Note: if erold-api has SecretManagerReadOnly, erold-secret-reader is redundant — confirm at design review.
4.4 Secret Inventory
| Name | Store | Read by | Rotated by | Cadence |
|---|---|---|---|---|
| erold.dev.api-key | Keychain | dev .envrc, MCP plugin | dev | 90 d |
| erold.prod.api-key | Keychain + Gitea Secret EROLD_API_KEY | prod MCP plugin, smoke tests | dev + CI | 90 d |
| erold.{env}.tenant | Keychain + Gitea Secret EROLD_TENANT | dev/CI | dev | on change |
| erold.dev.scw.{access,secret}-key | Keychain | dev .envrc → scw | dev | 90 d |
| erold.prod.scw.{access,secret}-key (erold-api) | Scaleway SM erold/prod/scw-api-* | container at runtime | erold-ci | 90 d |
| SCW_CI_{ACCESS,SECRET}_KEY | Gitea Secrets | CI runner | dev (tea repos secrets create) | 90 d |
| erold/{env}/database-url | Scaleway SM (prod) / Keychain (dev) | container | dev + DBA | on rotation |
| erold/prod/jwt-signing-key | Scaleway SM | container | automated job | 90 d |
| erold/prod/openai-api-key | Scaleway SM | container | dev | 90 d or on compromise |
Pattern for SM secrets: erold/{env}/{purpose} + latest-enabled revision.
4.5 Gitea CI Pipeline (outline)
.gitea/workflows/:
pr.yml (PR open/push): checkout → ruff check+format → pyright strict → ephemeral RDB (tag env=test owner=ci-$RUN_ID) → pytest tests/integration/ → destroy RDB.
staging.yml (push to main): checkout → docker login rg.fr-par.scw.cloud → build amd64 → push → scw container container update <staging-id> --registry-image=...:staging-$SHA → scw container container wait <staging-id> → curl -f https://staging-api.erold.dev/health (expect SHA echo).
deploy.yml (tag v*): same pattern with :$TAG; scw container container wait blocks until the healthcheck passes; the smoke test asserts that the reported SHA equals the commit the tag points to; on failure → Gitea notify; manual rollback (no auto).
scw container container update --wait requires GET /health returning 200 with {"sha": "..."} (baked via ARG GIT_SHA → ENV APP_SHA).
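The SHA check that staging.yml and deploy.yml perform with curl can be sketched with the standard library. It assumes /health returns JSON such as {"sha": "<git sha>"} as described above, and that CI passes in the expected SHA (for a tag, the commit the tag points to); check_health_payload and smoke_test are hypothetical names.

```python
import json
import urllib.request


def check_health_payload(body: bytes, expected_sha: str) -> None:
    """Raise if /health did not report the expected build SHA."""
    reported = json.loads(body).get("sha")
    if reported != expected_sha:
        raise RuntimeError(f"deployed sha {reported!r} != expected {expected_sha!r}")


def smoke_test(base_url: str, expected_sha: str, timeout: float = 10.0) -> None:
    # urlopen raises HTTPError on 4xx/5xx, mirroring `curl -f`.
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        check_health_payload(resp.read(), expected_sha)
```

CI would call smoke_test(base_url, expected_sha) after the wait step and fail the job on any exception; keeping the comparison in a pure function makes it testable without a live container.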
4.6 Local Dev Setup
.envrc (committed, no values):
```bash
export SCW_ACCESS_KEY="$(secret get erold.dev.scw.access-key)"
export SCW_SECRET_KEY="$(secret get erold.dev.scw.secret-key)"
export SCW_DEFAULT_PROJECT_ID="$(secret get scw.prod.project-id)"
export DATABASE_URL="$(secret get erold.dev.database-url)"
export JWT_SIGNING_KEY="$(secret get erold.dev.jwt-signing-key)"
export EROLD_API_KEY="$(secret get erold.dev.api-key)"
export EROLD_TENANT="$(secret get erold.dev.tenant)"
export OPENAI_API_KEY="$(secret get erold.dev.openai-api-key)"
```
One-time per machine: secret set each of the 7 erold.dev.* names (SCW_DEFAULT_PROJECT_ID reads scw.prod.project-id, which must exist as well), run direnv allow, and add eval "$(direnv hook zsh)" to ~/.zshrc.
4.7 Cost & Scale
| Line Item | Dev (€/month) | Prod (€/month) | Scale-Up Trigger | Next |
|---|---|---|---|---|
| Serverless Container | ~€0 | ~€14 | p95 latency >2s or CPU >80% | max=10, mem=2048 |
| Managed RDB ★ | ~€6 db-dev-s | ~€25 db-gp-xs | DB CPU >70% sustained | db-gp-s ~€50 |
| Container Registry | ~€0 | ~€1 | tag count >20 | enforce retention |
| Object Storage | ~€0.5 | ~€2 | attachments >100GB | lifecycle expiry on raw-events 30d |
| Secret Manager | ~€0 | ~€1 | >100 secrets | per-secret/month |
| Embedding compute ★ | variable | variable | OpenAI cost > €40/mo | Scaleway-hosted H100 |
| Private Network | ~€0 | ~€0 | n/a | n/a |
| Total (excl. embedding compute) | ~€7 | ~€43 | | |
★ RDB and embedding compute scale fastest with usage.
4.8 Validation Checklist (must run before Phase-7 deploy)
- scw rdb instance list → both instances ready.
- psql "$DATABASE_URL" -c "CREATE EXTENSION IF NOT EXISTS vector; SELECT extversion FROM pg_extension WHERE extname='vector';" → block all vector code if this fails.
- scw vpc private-network get <erold-pn-id> → subnets assigned.
- scw rdb instance get <prod-id> -o json | jq '.endpoints[] | select(.load_balancer == null) | .ip' → an RFC1918 address.
- psql "postgresql://...@<private-ip>:5432/erold" -c "SELECT 1;" run from a container in erold-pn.
- scw iam api-key list → keys exist for the 4 SCW-backed IAM applications (erold-mcp-plugin uses an app-level key, not an SCW key; 3 if erold-secret-reader is merged into erold-api).
- SCW_ACCESS_KEY=$(secret get ...) SCW_SECRET_KEY=$(secret get ...) scw object bucket list → IAM scope OK.
- scw secret list --tags project=erold → all SM secrets exist with an enabled revision.
- docker login rg.fr-par.scw.cloud && docker push rg.fr-par.scw.cloud/erold/api:test-probe.
- scw container container update <prod-id> --registry-image=...:test-probe && scw container container wait <prod-id>.
- curl -f https://api.erold.dev/health → 200 + correct SHA.
- Same curl 60 s later → confirms min_scale=1 keeps the container warm.
- From the container: a temporary diagnostic GET /internal/secret-check fetches erold/prod/jwt-signing-key from SM and returns its length (remove after validation).
4.9 Verify Against Current Scaleway Reality
- pgvector on Managed RDB PG17 fr-par: scw rdb engine list + extension verification.
- Private Network attachment for Serverless Containers GA in fr-par: scw container namespace list -o json | jq '.[0] | keys' and look for vpc_id / private_network_id.
- db-play2-pico viability for dev: scw rdb node-type list -o json | jq '.[] | {name, vcpus, memory, available_zones}'.
- Secret Manager pricing: confirm vs. the assumed ~€0.01/secret/month.
- Container Registry retention policies: scw registry namespace get.
- db-gp-xs available in fr-par-1 specifically: scw rdb node-type list --region fr-par.
- scw container container wait actually blocks until the healthcheck on the new revision passes: read the flag docs and test in staging before relying on it as the prod gate.
5. Security Plan & Redaction Filter Spec
5.1 Phase-0 Must-Fix Table
| # | Severity | Finding | Location | Fix | Validation |
|---|---|---|---|---|---|
| P0-1 | Critical (A04) | Bash output shipped without redaction | mcp-server/src/tools/log.ts:51; claude-plugin/scripts/log-file-change.sh:49 | Apply the §5.2 filter to every content field before POST; drop empty events | Inject an AWS key pattern → POST contains [REDACTED:aws_access_key] |
| P0-2 | Critical (A03) | npx -y @erold/mcp-server@latest = supply-chain RCE | claude-plugin/.mcp.json:5; README.md:15 | Pin exact version + sha256; remove -y (§5.6) | npm pack @erold/mcp-server@1.6.0 --dry-run matches RELEASE.sha256 |
| P0-3 | Critical (A07) | EROLD_API_KEY in .mcp.json + shell history | .mcp.json:7; README | Launcher script reads from Keychain (§5.3) | grep -r 'erold_' ~/.claude/ → 0 |
| P0-4 | High (A01) | Optional projectId scope filter — server may ignore | mcp-server/src/lib/api-client.ts:204; tools/search.ts:38 | Server enforces tenant_id from auth, ignores caller-supplied tenant | Cross-tenant pytest §5.5 passes |
| P0-5 | High (A02) | EROLD_API_URL user-overridable, no validation | mcp-server/src/lib/config.ts:17 | Allowlist [https://api.erold.dev, https://api.staging.erold.dev]; localhost only when NODE_ENV=development | EROLD_API_URL=https://evil → throws |
| P0-6 | Medium (A04) | Bash interpolation builds JSON → injection | log-file-change.sh:49 | jq -n --arg tool "$TOOL_NAME" --arg path "$FILE_PATH" '{...}' | Path foo"; "injected": "true → escaped |
| P0-7 | Medium (A09) | curl errors silently dropped | log-file-change.sh:50; erold-checkpoint.sh:44 | Capture exit code → one-line entry to ~/.erold/error.log | Invalid URL → log entry with ISO-8601 |
| P0-8 | Medium (A04) | File diffs of .env/.pem/SSH keys | hooks/hooks.json:14; log-file-change.sh | Path denylist (§5.2 L1) before any POST | FILE_PATH=~/.ssh/id_ed25519 → no HTTP |
| P0-9 | Low (A03) | Unsigned plugin releases | claude-plugin/ (no release pipeline) | gpg --detach-sign release tarballs; publish .sig | gpg --verify passes in CI |
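The P0-5 fix belongs in mcp-server/src/lib/config.ts (TypeScript); the same logic is sketched here in Python for illustration. The normalized URL is compared with the allowlist as an exact string, so look-alikes such as https://api.erold.dev.evil.example or https://api.erold.dev@evil.example fail. The set of accepted localhost host names is an assumption.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit

ALLOWED_API_URLS = frozenset({"https://api.erold.dev", "https://api.staging.erold.dev"})
DEV_LOCAL_HOSTS = frozenset({"localhost", "127.0.0.1"})  # assumption
DEFAULT_API_URL = "https://api.erold.dev"


def resolve_api_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return EROLD_API_URL if allowlisted, else raise ValueError."""
    env = os.environ if env is None else env
    raw = env.get("EROLD_API_URL", DEFAULT_API_URL).rstrip("/")
    if raw in ALLOWED_API_URLS:  # exact match, no prefix/suffix tricks
        return raw
    parts = urlsplit(raw)
    if (
        env.get("NODE_ENV") == "development"
        and parts.scheme in ("http", "https")
        and parts.hostname in DEV_LOCAL_HOSTS  # hostname ignores any user@ part
    ):
        return raw
    raise ValueError(f"EROLD_API_URL not in allowlist: {raw!r}")
```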
5.2 Redaction Filter — Concrete Spec
Pure function redact(text) → RedactResult, applied client-side in MCP and in every hook before any network call. Layers run in order; first match wins per span.
L1 — Path Denylist (drops the entire event when type=file_change matches):
```text
.env  .env.*  .env.local  .env.*.local
*.pem  *.key  *.p12  *.pfx  *.jks  *.keystore
*.crt (in secrets/ or certs/)
id_rsa  id_rsa.pub  id_ed25519  id_ed25519.pub  id_ecdsa  *.ppk
.ssh/  secrets/  secret/
.mcp.json  .netrc  .pgpass
kubeconfig  *.kubeconfig
terraform.tfvars  *.tfvars (filename containing "secret" or "key")
vault-token
```
Also applied as a substring match within observation/decision/error content.
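One way to encode the L1 list, sketched with fnmatch and pathlib. It matches case-insensitively on the file name and on directory components, applies the *.crt and *.tfvars qualifiers from the parentheticals above, and leaves out the substring check on free-text content; the grouping into DENY_NAME_GLOBS / DENY_DIRS / CERT_DIRS is a hypothetical encoding.

```python
import fnmatch
from pathlib import PurePosixPath

# Hypothetical encoding of the L1 list (all lower-case; names are lowered first).
DENY_NAME_GLOBS = (
    ".env", ".env.*", "*.pem", "*.key", "*.p12", "*.pfx", "*.jks", "*.keystore",
    "id_rsa", "id_rsa.pub", "id_ed25519", "id_ed25519.pub", "id_ecdsa", "*.ppk",
    ".mcp.json", ".netrc", ".pgpass", "kubeconfig", "*.kubeconfig",
    "terraform.tfvars", "vault-token",
)
DENY_DIRS = frozenset({".ssh", "secrets", "secret"})
CERT_DIRS = frozenset({"secrets", "certs"})


def is_denied_path(path: str) -> bool:
    """True if a file_change event for this path must be dropped entirely."""
    p = PurePosixPath(path)
    name = p.name.lower()
    dirs = {d.lower() for d in p.parts[:-1]}
    if any(fnmatch.fnmatchcase(name, g) for g in DENY_NAME_GLOBS):
        return True
    if dirs & DENY_DIRS:  # anything under .ssh/, secrets/ or secret/
        return True
    if name.endswith(".crt") and dirs & CERT_DIRS:
        return True
    # *.tfvars only when the filename mentions "secret" or "key".
    return name.endswith(".tfvars") and ("secret" in name or "key" in name)
```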
L2 — File-Type Denylist: PKCS12, PEM, x509, PGP keys, octet-stream .key/.bin/.dat. Base64-looking blob >100 chars and >80% base64 alphabet → [REDACTED:binary_blob].
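L3 — Entropy Detector: a token is replaced with [REDACTED:high_entropy] if all of: length ≥20, Shannon entropy ≥4.0 bits/char, ≥1 digit and ≥1 uppercase letter. Per-token entropy cannot exceed log2(length), so a 24-char base64 128-bit key scores at most ≈4.58 (typically about 4.2–4.4), not the 6.0 bits/char base64 ceiling that only tokens of 64+ characters can approach; it still clears the 4.0 threshold. CamelCase identifiers typically score below 4.0 (~3.5–3.9) and usually contain no digits, so they are spared.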
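L2's base64-blob rule and L3 can be sketched in a few lines. Splitting on whitespace is an assumption (the spec does not define tokenization), and redact_tokens is a hypothetical helper that applies L2 before L3 as the layer order requires. Note that for tokens near the 20-character minimum (ceiling log2 20 ≈ 4.32), the 4.0 threshold is only cleared when nearly all characters are distinct.

```python
import math
from collections import Counter

BASE64_ALPHABET = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
)


def shannon_entropy(token: str) -> float:
    """Bits per character, measured over the token's own character counts."""
    n = len(token)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(token).values())


def is_high_entropy(token: str) -> bool:
    """L3: length >= 20, entropy >= 4.0, at least one digit and one uppercase."""
    return (
        len(token) >= 20
        and shannon_entropy(token) >= 4.0
        and any(ch.isdigit() for ch in token)
        and any(ch.isupper() for ch in token)
    )


def is_base64_blob(token: str) -> bool:
    """L2: more than 100 chars and more than 80% base64-alphabet characters."""
    if len(token) <= 100:
        return False
    return sum(ch in BASE64_ALPHABET for ch in token) / len(token) > 0.8


def redact_tokens(text: str) -> str:
    # Whitespace tokenization is an assumption; original spacing is not preserved.
    out = []
    for tok in text.split():
        if is_base64_blob(tok):
            out.append("[REDACTED:binary_blob]")
        elif is_high_entropy(tok):
            out.append("[REDACTED:high_entropy]")
        else:
            out.append(tok)
    return " ".join(out)
```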
L4 — Pattern Blocklist (case-insensitive where marked):
```text
AKIA[0-9A-Z]{16} → [REDACTED:aws_access_key]
(?i)aws_secret[_\s=:]+[A-Za-z0-9/+]{40} → [REDACTED:aws_secret_key]
SCW[A-Z0-9]{20} → [REDACTED:scw_access_key]
(?i)scw_secret[_\s=:]+[a-f0-9-]{36} → [REDACTED:scw_secret_key]
sk_live_[A-Za-z0-9]{24,} → [REDACTED:stripe_secret_key]
rk_live_[A-Za-z0-9]{24,} → [REDACTED:stripe_restricted_key]
ghp_[A-Za-z0-9]{36} → [REDACTED:github_pat]
github_pat_[A-Za-z0-9_]{82} → [REDACTED:github_pat_fine]
sk-ant-[A-Za-z0-9_-]{93} → [REDACTED:anthropic_key]
sk-[A-Za-z0-9]{48} → [REDACTED:openai_key]
eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+ → [REDACTED:jwt]
(?i)(password|passwd|pwd)\s*[=:]\s*\S+ → [REDACTED:password_value]
(?i)(api_?key|apikey)\s*[=:]\s*\S+ → [REDACTED:api_key_value]
(?i)(secret|token)\s*[=:]\s*\S+ → [REDACTED:secret_value]
(?i)(access_?key|auth_?token)\s*[=:]\s*\S+ → [REDACTED:auth_value]
-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY----- → [REDACTED:private_key_block]
(?i)(postgres(ql)?|mysql|mongodb|redis):\/\/[^:]+:[^@]+@ → [REDACTED:dsn_with_credentials]
```
The DSN pattern accepts both postgres:// and postgresql:// (the scheme DATABASE_URL uses in §4.8).
L5 — Known-Format: ENV-style assignments to UUID4 → [REDACTED:uuid_credential]. PEM cert blocks → [REDACTED:certificate_block].
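A reduced redact() covering a handful of the L4 patterns, in spec order, as a sketch only: RedactResult's fields (text, labels) are assumptions. Applying the patterns one after another, each to the previous output, approximates "first match wins per span" because, for the patterns shown, an already-redacted span becomes a placeholder that no later pattern matches.

```python
import re
from dataclasses import dataclass, field
from typing import List, Pattern, Tuple

# Subset of the L4 blocklist, kept in spec order; labels follow the spec.
PATTERNS: List[Tuple[Pattern, str]] = [
    (re.compile(r"AKIA[0-9A-Z]{16}"), "aws_access_key"),
    (re.compile(r"ghp_[A-Za-z0-9]{36}"), "github_pat"),
    (re.compile(r"eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"), "jwt"),
    (re.compile(r"(?i)(?:password|passwd|pwd)\s*[=:]\s*\S+"), "password_value"),
    (re.compile(r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"), "private_key_block"),
    (re.compile(r"(?i)(?:postgres(?:ql)?|mysql|mongodb|redis)://[^:]+:[^@]+@"),
     "dsn_with_credentials"),
]


@dataclass
class RedactResult:  # hypothetical shape
    text: str
    labels: List[str] = field(default_factory=list)


def redact(text: str) -> RedactResult:
    """Replace every match with [REDACTED:<label>] and record which labels fired."""
    labels: List[str] = []
    for pattern, label in PATTERNS:
        text, count = pattern.subn(f"[REDACTED:{label}]", text)
        labels.extend([label] * count)
    return RedactResult(text, labels)
```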
L6 — User-Defined: .eroldignore (gitignore-style at project root or ~/.eroldignore). EROLD_NEVER_CAPTURE env var (colon-separated globs).
Default-deny for new event types: any type not in EVENT_TYPES → 422 before redaction. PR template requires “redaction coverage” checkbox.
5.3 EROLD_API_KEY Lifecycle
Create (in app.erold.dev → Settings → API Keys):
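```bash
# Read the key without echoing it, so it never lands in shell history (P0-3).
read -rs EROLD_KEY && printf '%s' "$EROLD_KEY" | secret set erold.api-key
unset EROLD_KEY
secret has erold.api-key
```
Inject: .mcp.json references a launcher, not the key: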
```json
{
  "mcpServers": {
    "erold-pm": {
      "command": "${HOME}/.local/bin/erold-mcp-launch",
      "args": []
    }
  }
}
```
~/.local/bin/erold-mcp-launch:
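```bash
#!/bin/bash
set -euo pipefail
# Plain assignments abort under set -e if `secret get` fails; `export` is a
# shell builtin, so the values reach the environment but never any argv.
EROLD_API_KEY="$(secret get erold.api-key)"
EROLD_TENANT="$(secret get erold.tenant-id)"
export EROLD_API_KEY EROLD_TENANT
exec npx --yes=false @erold/mcp-server@1.6.0 "$@"
```
The key is never on disk and never in ps aux. This relies on export: the exec env NAME=value … form would pass the value as a command-line argument to env until env execs npx, whereas export places it only in the process environment, which only the same user or root can inspect.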
Rotate: generate new key in UI → secret set erold.api-key → revoke old → restart Claude Code. Cadence: ≤90 d, immediate on suspected exposure.
Revoke: revoke in UI; secret rm erold.api-key.
Audit trail: every API key use → server-side api_key_used event (key ID, tenant, ts, source IP, endpoint). Surfaced via /erold-audit?type=api_key_used.
5.4 CredentialReference Contract
Allowed fields on POST /v1/credentials: name, store (keychain|scaleway-sm|gitea-secrets), purpose, last_used_at, project_id.
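Rejected fields (422 CREDENTIAL_VALUE_FORBIDDEN): value, secret, secret_value, encrypted_value, hash, token, password, key. Keys are compared case-insensitively with _ and - ignored (so secretValue is caught too), at every level of nested objects and arrays down to depth 5; anything nested deeper is rejected rather than skipped.
```python
from typing import Any

from fastapi import HTTPException

# Keys are normalized (lower-case, "_"/"-" removed), so secretValue == secret_value.
FORBIDDEN_KEYS = {"value", "secret", "secretvalue", "encryptedvalue", "hash", "token", "password", "key"}

def check_no_credential_values(body: Any, depth: int = 0) -> None:
    if not isinstance(body, (dict, list)):
        return  # scalars carry no field names
    if depth > 5:  # fail closed instead of silently skipping deeper levels
        raise HTTPException(422, "CREDENTIAL_VALUE_FORBIDDEN")
    for k, v in (body.items() if isinstance(body, dict) else enumerate(body)):
        if isinstance(k, str) and k.lower().replace("_", "").replace("-", "") in FORBIDDEN_KEYS:
            raise HTTPException(422, "CREDENTIAL_VALUE_FORBIDDEN")
        check_no_credential_values(v, depth + 1)  # recurse into nested dicts and lists
```
5.5 Cross-Tenant Test Requirement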
Mandatory pytest case (hard gate, runs on every backend PR):
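```python
import asyncio

# Fixtures (async HTTP client, base URL, per-tenant auth headers) come from conftest.
async def test_tenant_b_cannot_read_tenant_a_fragments(
    client, api, project_a_id, tenant_a_headers, tenant_b_headers
):
    marker = "CROSS_TENANT_ISOLATION_MARKER_xK9mP2qR"
    # log as A
    r = await client.post(f"{api}/v1/tenants/tenant-a/events",
                          json={"projectId": project_a_id, "content": marker, "type": "observation"},
                          headers=tenant_a_headers)
    assert r.status_code == 202
    # Positive control: wait (async compression) until A can find its own marker;
    # otherwise B's 0 hits could just mean the fragment does not exist yet.
    for _ in range(20):  # 20 x 0.5 s covers the p99 fragment search lag (<10 s)
        r = await client.get(f"{api}/v1/tenants/tenant-a/fragments/search",
                             params={"q": marker}, headers=tenant_a_headers)
        if r.json()["data"]["total"] > 0:
            break
        await asyncio.sleep(0.5)
    else:
        raise AssertionError("marker never became searchable for tenant A")
    # search as B
    r = await client.get(f"{api}/v1/tenants/tenant-b/fragments/search",
                         params={"q": marker}, headers=tenant_b_headers)
    assert r.json()["data"]["total"] == 0
```
Also verify that B cannot directly GET tenant-a endpoints with its own token (403).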
5.6 Plugin Supply Chain
Pinning in .mcp.json:
```json
{
  "mcpServers": {
    "erold-pm": {
      "command": "${HOME}/.local/bin/erold-mcp-launch",
      "args": [],
      "env": {
        "_EROLD_MCP_VERSION": "1.6.0",
        "_EROLD_MCP_SHA256": "a3f8c2d1e9b74056f3a2c8d7e1f4b9a0c5e2d8f1a7b3c6e9d2f5a8b1c4e7d0f3"
      }
    }
  }
}
```
The launcher verifies the sha256 before exec. CI: npm publish --provenance per release; RELEASE.sha256 is updated by CI, not by hand; the latest dist-tag is never referenced; manual installs are verified via npm view @erold/mcp-server@1.6.0 dist.integrity.
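A sketch of the launcher's pre-exec check, written in Python for illustration (the real launcher is a shell script): it hashes the package tarball, for example the one npm pack @erold/mcp-server@1.6.0 downloads, and compares it with _EROLD_MCP_SHA256. npm's dist.integrity for current packages is a Subresource-Integrity string over SHA-512 (sha512-<base64 digest>), not a SHA-256 hex digest, so the sketch computes both forms; tarball_digests and verify_tarball are hypothetical names.

```python
import base64
import hashlib
from pathlib import Path
from typing import Optional, Tuple


def tarball_digests(path: Path, chunk_size: int = 1 << 16) -> Tuple[str, str]:
    """Return (sha256 hex, npm-style "sha512-<base64>" integrity) for a file."""
    sha256 = hashlib.sha256()
    sha512 = hashlib.sha512()
    with path.open("rb") as fh:
        # Read in chunks so large tarballs do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha256.update(chunk)
            sha512.update(chunk)
    integrity = "sha512-" + base64.b64encode(sha512.digest()).decode("ascii")
    return sha256.hexdigest(), integrity


def verify_tarball(path: Path, expected_sha256: str,
                   expected_integrity: Optional[str] = None) -> None:
    """Raise before exec if the tarball does not match the pinned digests."""
    actual_sha256, actual_integrity = tarball_digests(path)
    if actual_sha256 != expected_sha256.strip().lower():
        raise RuntimeError(f"sha256 mismatch for {path}: got {actual_sha256}")
    if expected_integrity is not None and actual_integrity != expected_integrity.strip():
        raise RuntimeError(f"dist.integrity mismatch for {path}")
```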
5.7 Logging Hygiene Rules
- Auth headers never in error fragments: strip them before formatError.
- Credential values never in deploy log fragments: export [A-Z_]+= lines go through the full filter.
- Path denylist (§5.2 L1) applies to ALL capture paths, with no opt-out at the call site.
- Error messages to callers contain stable error codes only; raw exception messages go to ~/.erold/error.log only.
- Retry payloads are not re-redacted: the cached redacted body is retried.
- Session boundary events log only project ID + timestamp, never Bash output, prompt text, or env vars.
- .eroldignore / EROLD_NEVER_CAPTURE apply BEFORE the entropy detector (additive to the built-in denylist, not a replacement).
5.8 Audit Trail
Every CredentialReference touch → immutable Event type=credential_ref:
| Field | Value |
|---|---|
| type | credential_ref (fixed) |
| action | create / touch / rotate / revoke |
| credential_ref_id | record ID |
| actor | API key ID (never the value) |
| tenant_id | tenant performing the action |
| timestamp | server-assigned UTC ISO-8601 |
| source_ip | gateway-recorded |
Append-only — no UPDATE or DELETE. Surfaces:
- /erold-creds: last 50 events grouped by name; no values.
- /erold-audit: paginated NDJSON export (?from&to&type=credential_ref) for SOC 2 / compliance.
- Anomaly detection: touch without a prior create → integrity-violation alert to the tenant owner.
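The anomaly rule can be evaluated over a tenant's credential_ref events, as in this sketch with an illustrative event shape. Ordering by the ISO-8601 timestamp string is only safe because timestamps are server-assigned UTC in a single fixed format; only touch is flagged, matching the rule as written.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class CredentialRefEvent:  # illustrative subset of the audit fields
    credential_ref_id: str
    action: str     # create | touch | rotate | revoke
    timestamp: str  # server-assigned UTC ISO-8601, e.g. "2026-05-10T09:00:00Z"


def find_touch_without_create(events: Iterable[CredentialRefEvent]) -> List[CredentialRefEvent]:
    """Return touch events whose credential_ref_id has no earlier create."""
    created: Set[str] = set()
    violations: List[CredentialRefEvent] = []
    # String order equals time order only for one fixed UTC format;
    # sorted() is stable, so same-timestamp events keep their input order.
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.action == "create":
            created.add(ev.credential_ref_id)
        elif ev.action == "touch" and ev.credential_ref_id not in created:
            violations.append(ev)
    return violations
```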
Decision log for this plan (must record once approved)
- Backend platform: Postgres + Scaleway vs. continuing on Firestore. Default: migrate.
- intent → task rename: do at Phase 2 with a backward-compatible alias.
- Local-first SQLite outbox: yes (deferring it would make the decision irreversible by inertia).
- Embedding compute: OpenAI initially; switch to Scaleway H100 if monthly cost > €40.
Top open questions (must resolve before Phase 1)
- Backend migration scope: full Firestore→Postgres, or compensating Firestore patterns? (architect risk #2)
- pgvector availability on Scaleway Managed RDB PG17 fr-par — verify before designing schema around it.
- Serverless Container private-network attachment GA in fr-par — required for §4.2 topology.
- scw container container wait exact semantics: required for the §4.5 prod deploy gate.