Architecture plan

Date: 2026-05-10
Owners: chut@yet.lu
Scope: claude-plugin/ + mcp-server/ + the backend behind api.erold.dev
Status: draft for review


Any new agent in any new Claude Code session must, within its first tool call, receive a structured packet that fully reconstructs working state across many projects: open Tasks with provenance, open Bugs with root cause and fix narrative, pending and completed Deploys, all Decisions ever made and why, CredentialReferences it can act on, and a ranked “what to do next” list. The packet must not depend on prior agent memory, chat history, or local files. The system must stay reliable under offline conditions (local SQLite outbox + retries + idempotency keys), be strongly consistent on Tasks/Bugs/Deploys/Decisions, and be eventually consistent (~1–3 s) on fragment search.

The five sections below were produced by a council of specialists: an architect, a Claude Code features specialist, a FastAPI backend specialist, a Scaleway infra specialist, and a security reviewer. Each section stays in its lane.


Success means any agent, in any new Claude Code session, can call get_context(project_id) within its first tool use and receive a structured packet that fully reconstructs working state: open Tasks with provenance, open Bugs with root-cause and fix narrative, pending Deploys, all Decisions ever made and why, CredentialReferences it can act on, and a ranked “what to do next” list — with no dependency on the prior agent’s memory, chat history, or local files. The system achieves this reliably even when the plugin is offline (outbox), even under concurrent sessions (idempotency + dedup), and without sacrificing write durability for search speed (mixed consistency model).

| Entity | Purpose | Key fields | Lifecycle | Mutability |
|---|---|---|---|---|
| Project | Top-level namespace | id, name, slug, status, tech_stack, created_at | active → archived | Mutable (soft fields only) |
| Session | One agent invocation | id, project_id, agent_fingerprint, started_at, ended_at, summary | open → closed | Append summary on close; otherwise immutable |
| Task | Unit of planned work | id, project_id, title, description, status (state machine), priority, tags, created_at, completed_at, deleted_at | see state machine | Soft-delete; status via transitions only |
| Bug | Defect distinct from planned work | id, project_id, title, symptom, root_cause, fix_narrative, status (state machine), severity, linked_task_id?, created_at, resolved_at, deleted_at | see state machine | root_cause + fix_narrative set on resolution; soft-delete |
| Deploy | Record of a deployment | id, project_id, env, commit_sha, deployer_session_id, outcome (pending/success/failure), notes, created_at | pending → success or failure | outcome + notes settable once; otherwise immutable |
| Decision | Architectural or significant technical choice | id, project_id, session_id, title, rationale, alternatives_considered, created_at, superseded_by? | active → superseded | Never deleted; only superseded |
| CredentialReference | Pointer to a secret, not the secret | id, project_id, name, store (keychain / gitea-secrets / scaleway-sm), lookup_key, provision_instructions, last_rotated_at | active → revoked | provision_instructions mutable; values never stored |
| Event | Raw append-only audit log | id, project_id, session_id, type, content, client_idempotency_key, dedup_hash, created_at | append-only | Immutable after write |
| Fragment | Smart-Strip compressed derivative of Events | id, project_id, source_event_ids[], type, content, embedding_vector, created_at | append-only | Immutable after write |
| FileChange | Record of files touched in a session | id, project_id, session_id, path, operation (created/modified/deleted), created_at | append-only | Immutable |

Cross-reference matrix:

  • Task → Project (required), Session (created_by), Bug (optional: fixing_bug_id)
  • Bug → Project (required), Session (created_by, resolved_by), Task (optional: linked_task_id)
  • Deploy → Project (required), Session (required), Task[] (optional: closes_task_ids)
  • Decision → Project (required), Session (required), supersedes Decision (optional)
  • CredentialReference → Project (required)
  • Event → Project (required), Session (required), Task? / Bug? / Deploy? / Decision? (optional context links)
  • Fragment → Project (required), Event[] (source)
  • FileChange → Project (required), Session (required)

Task

Allowed transitions only:
todo → in_progress (start)
todo → deleted (delete, sets deleted_at)
in_progress → blocked (block, requires reason)
in_progress → done (complete, requires summary)
in_progress → deleted (delete)
blocked → in_progress (unblock)
blocked → deleted (delete)
done → in_progress (reopen)
* → deleted is soft; deleted_at set, excluded from default queries

Bug

Allowed transitions only:
open → investigating (start_investigation)
open → wont_fix (wont_fix, requires reason)
open → deleted (soft delete)
investigating → resolved (mark_fixed, requires root_cause + fix_narrative — both mandatory)
investigating → wont_fix (wont_fix, requires reason)
resolved → open (reopen)
wont_fix → open (reopen)
Guard: mark_fixed MUST reject if fix_narrative is absent or < 20 chars.
This is the primary continuity guarantee for bug knowledge.
  1. Every Event written by the plugin carries a client-generated UUIDv4 client_idempotency_key set before any network attempt. Server deduplicates on this key within a project-scoped 72-hour window.
  2. Every Event also carries a dedup_hash = SHA-256(project_id + type + content_normalized). Server rejects duplicates within a 30-minute window even if client_idempotency_key differs.
  3. No hook handler makes a synchronous network call. All hooks write to the local SQLite outbox and return immediately. Flush loop runs asynchronously (default 5 s).
  4. Outbox flush is at-least-once with exponential back-off (base 2 s, cap 5 min). On a 4xx (client error), the entry moves to a dead-letter table and is surfaced to the user; it is never silently dropped.
  5. State machine transitions are enforced server-side, not just client-side. The MCP tool layer checks transitions and the API layer enforces them a second time. Invalid transitions return 409.
  6. Decisions and Bug resolutions are never soft-deleted or compressed, and both are excluded from the Smart-Strip pipeline. GET /context always returns all Decisions and all resolved Bugs for a project, regardless of age.
  7. CredentialReference.provision_instructions is always returned in the SessionStart packet. It is mandatory on create (≥10 chars).
  8. SessionStart context packet is assembled server-side in a single consistent read transaction (Postgres serializable). Fragment/search results are added separately and may lag up to 3 s.
  9. Row-Level Security enforced at the database layer. App-layer is a second check; DB is the backstop.
  10. All outbox entries include a session_id bound at plugin init. Orphaned outbox entries flush on next session start under the original session_id, then the old session is closed server-side.
  11. FileChange events are emitted by the PostToolUse hook, not inferred. The plugin must not diff git state.
  12. The MCP server exposes no tool that directly executes shell commands against user infrastructure except infrastructure_status (read-only, hard timeout).
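Items 1 and 2 combine a per-write key with a content hash, each with its own window. A minimal in-memory sketch, assuming a whitespace-collapsing `content_normalized` (the plan does not define the normalization) and a 0x1F separator between hashed fields (plain concatenation would let different field splits collide). DedupIndex and the helper names are hypothetical stand-ins for the server-side lookups.

```python
import hashlib
import uuid
from datetime import datetime, timedelta

# Hypothetical helpers; windows taken from items 1 and 2 above.
IDEMPOTENCY_WINDOW = timedelta(hours=72)
DEDUP_HASH_WINDOW = timedelta(minutes=30)


def new_idempotency_key() -> str:
    """Client side: generate the key before any network attempt."""
    return str(uuid.uuid4())


def normalize(content: str) -> str:
    # Assumed normalization: collapse whitespace runs and trim.
    return " ".join(content.split())


def dedup_hash(project_id: str, event_type: str, content: str) -> str:
    # Separator keeps ("ab", "c") and ("a", "bc") from hashing to the same value.
    material = "\x1f".join([project_id, event_type, normalize(content)])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


class DedupIndex:
    """Hypothetical in-memory stand-in for the server's two dedup lookups."""

    def __init__(self) -> None:
        self.seen_keys: dict[tuple[str, str], datetime] = {}
        self.seen_hashes: dict[str, datetime] = {}

    def accept(self, project_id: str, idem_key: str, digest: str, now: datetime) -> bool:
        first = self.seen_keys.get((project_id, idem_key))
        if first is not None and now - first < IDEMPOTENCY_WINDOW:
            return False  # retry of a write the server already stored
        first = self.seen_hashes.get(digest)
        if first is not None and now - first < DEDUP_HASH_WINDOW:
            return False  # same content re-sent under a different key
        self.seen_keys[(project_id, idem_key)] = now
        self.seen_hashes[digest] = now  # digest already embeds project_id
        return True
```

The dicts are not atomic across workers; in a real deployment both checks would have to live in the database.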

What a fresh agent receives at get_context(project_id):

| Section | Contents | Gated by |
|---|---|---|
| project | name, slug, status, tech_stack | Project exists |
| open_tasks | All tasks not in done/deleted, with title, status, priority, summary | Tasks were created via task_create tool or plugin |
| open_bugs | All bugs in open/investigating, with symptom | Bugs were logged as Bug entities, not as generic Events |
| resolved_bugs | All resolved bugs with root_cause + fix_narrative | mark_fixed was called with both fields |
| pending_deploys | All Deploys with outcome=pending | deploy_log was called before deploy started |
| deploy_history | Last 5 completed Deploys with outcome + notes | deploy_log was called and outcome was updated |
| decisions | All Decisions, never truncated | decision_log was called explicitly; passive Events do not produce Decisions |
| credential_refs | All CredentialReferences with name, store, lookup_key, provision_instructions | credential_ref_upsert was called; credentials only mentioned in chat are invisible |
| what_to_do_next | Ranked list: open high-priority tasks + open bugs by severity | Structured entities exist; cannot be inferred from fragments alone |
| recent_fragments | Top-20 semantically relevant fragments | Events were logged; Smart-Strip ran (eventual, ~1-3 s lag) |

Explicit gaps from passive capture alone: if an agent never calls decision_log, no Decision exists — only an Event fragment that may or may not surface in search. If mark_fixed is called without fix_narrative, the server rejects the transition. If credentials are mentioned in chat but never registered via credential_ref_upsert, the next agent has no structured pointer to them. These gaps are intentional: passive capture gives eventually-consistent searchable narrative; structured entities give strongly-consistent resumable state. Both are required for full continuity.
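The what_to_do_next row says the list ranks open high-priority tasks and open bugs by severity but does not pin down how the two are merged. One possible ordering, as a sketch: it assumes bug severity uses the same low/medium/high/critical scale as Task priority, excludes blocked tasks as not immediately actionable, puts bugs ahead of tasks at equal level, and breaks remaining ties by age. WorkItem and what_to_do_next are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical ranking rule; the plan does not specify the merge order.
LEVEL = {"low": 0, "medium": 1, "high": 2, "critical": 3}
ACTIONABLE_TASK_STATES = {"todo", "in_progress"}
OPEN_BUG_STATES = {"open", "investigating"}


@dataclass(frozen=True)
class WorkItem:
    kind: str  # "task" or "bug"
    id: str
    level: str  # Task priority or Bug severity (assumed same scale)
    status: str
    created_at: datetime


def what_to_do_next(tasks: list[WorkItem], bugs: list[WorkItem], limit: int = 10) -> list[WorkItem]:
    candidates = [
        t for t in tasks
        if t.status in ACTIONABLE_TASK_STATES and LEVEL[t.level] >= LEVEL["high"]
    ]
    candidates += [b for b in bugs if b.status in OPEN_BUG_STATES]
    # Highest level first; bugs before tasks at the same level; oldest first after that.
    candidates.sort(key=lambda i: (-LEVEL[i.level], i.kind != "bug", i.created_at))
    return candidates[:limit]
```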

Phase 0: Security Blockers (prerequisite; blocks all other phases)
Scope: redaction filter, MCP version pinning, EROLD_API_KEY → Keychain, EROLD_API_URL allowlist, jq-based JSON construction, path denylist enforced in hooks, CORS audit, input validation on every write path, key rotation path verified.
Exit criteria: security-reviewer sign-off; no OWASP critical/high findings open; all write endpoints reject malformed payloads with 400, not 500; redaction filter integration tests green.
Dependencies: none.

Phase 1: Shared Types + Local Outbox
Scope: @erold/shared-types package containing the full domain model + state machine transition types + SessionStart packet interface. SQLite outbox in claude-plugin with at-least-once flush, exponential back-off, dead-letter table, client_idempotency_key generation. No backend changes; the outbox flushes to the existing /events endpoint.
Exit criteria: plugin can go offline 10 min and all Events flush correctly on reconnect with no duplicates (verified against real dev DB, no mocks); shared-types imports cleanly in both plugin and mcp-server.
Dependencies: Phase 0.

Phase 2: Structured Entity API (Task, Bug, Decision, Deploy, CredentialRef)
Scope: backend endpoints for the 5 structured entities with state-machine enforcement, soft-delete, mandatory-field guards (fix_narrative on Bug resolution, rationale on Decision, provision_instructions on CredentialRef). Migrate intent → task (backward-compat alias). New MCP tools: decision_log, bug_report, bug_resolve, deploy_log, credential_ref_upsert.
Exit criteria: all transitions enforced at API (invalid → 409); get_context returns all 5 sections populated with real data; old intent tool still works via alias.
Dependencies: Phase 1.

Phase 3: Reliable SessionStart Packet
Scope: server-side consistent-read assembly of the SessionStart packet (structured entities in one transaction; fragments appended separately). session_id binding in plugin init + orphaned-session flush on next start. Wire get_context to new packet shape. Server-side what_to_do_next ranking.
Exit criteria: cold-start agent in a new session calls get_context and can immediately continue; round-trip < 2 s for a project with 50 tasks / 20 bugs / 100 decisions.
Dependencies: Phase 2.

Phase 4: Idempotency + Dedup Hardening
Scope: server-side dedup on client_idempotency_key (72 h) and dedup_hash (30 min). RLS at DB layer. Dead-letter surfacing to MCP tool response.
Exit criteria: replay of 1000 identical Events with same idempotency key produces exactly 1 stored Event; cross-tenant integration test returns 0 rows.
Dependencies: Phase 3.

Phase 5: Hook Coverage + FileChange Capture
Scope: all 8 hooks (PreToolUse, PostToolUse(Edit|Write|Bash), SessionStart, UserPromptSubmit, SubagentStop, PreCompact, SessionEnd, Notification). Each writes to the outbox, never direct HTTP. infrastructure_status guarded read-only.
Exit criteria: a 20-tool-call session produces correct FileChange records; zero synchronous network calls in any hook path; hook return latency < 5 ms.
Dependencies: Phase 4.

Phase 6: Smart-Strip + Fragment Search
Scope: Smart-Strip compaction pipeline server-side. Wire search MCP tool to vector index. Eventual-consistency acknowledgement in get_context. Tune fragment TTL + compaction frequency.
Exit criteria: a Decision logged 2 s ago is retrievable via search("why did we choose X") with relevance > 0.7; structured Decisions returned immediately, fragments within 3 s.
Dependencies: Phase 5.

Phase 7: Production deploy via Gitea CI
Scope: Gitea CI workflows (PR / staging / prod tag), Scaleway resources provisioned, all secrets in Secret Manager / Gitea Secrets / Keychain per the matrix in §4.4, validation checklist passed end-to-end.
Exit criteria: git tag v1.0.0 && git push --tags deploys prod with zero manual steps; all 13 validation-checklist commands pass.
Dependencies: Phase 6.
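The Phase 1 outbox (at-least-once flush, exponential back-off with a 2 s base and 5 min cap, 4xx → dead-letter table) can be sketched with the standard sqlite3 module. The table layout, function names, and the caller-supplied `send(key, event)` (returns an HTTP status, raises OSError when offline) are all hypothetical; jitter and the flush-loop advisory lock from the risk list are omitted.

```python
import json
import sqlite3
import uuid
from typing import Callable

# Hypothetical outbox layout; a sketch of Phase 1, not the plugin's actual schema.
BASE_DELAY_S = 2.0    # back-off base
MAX_DELAY_S = 300.0   # back-off cap (5 min)


def backoff_delay(attempt: int) -> float:
    """Delay before retry `attempt` (1-based): 2 s, 4 s, 8 s, ... capped at 5 min."""
    return min(BASE_DELAY_S * 2 ** (attempt - 1), MAX_DELAY_S)


def open_outbox(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # relevant for on-disk files; ignored in memory
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS outbox (
            idempotency_key TEXT PRIMARY KEY,
            session_id      TEXT NOT NULL,
            payload         TEXT NOT NULL,
            attempts        INTEGER NOT NULL DEFAULT 0,
            next_attempt_at REAL NOT NULL
        );
        CREATE TABLE IF NOT EXISTS dead_letter (
            idempotency_key TEXT PRIMARY KEY,
            session_id      TEXT NOT NULL,
            payload         TEXT NOT NULL,
            status_code     INTEGER NOT NULL
        );
    """)
    return conn


def enqueue(conn: sqlite3.Connection, session_id: str, event: dict, now: float) -> str:
    key = str(uuid.uuid4())  # assigned before any network attempt
    with conn:
        conn.execute("INSERT INTO outbox VALUES (?, ?, ?, 0, ?)",
                     (key, session_id, json.dumps(event), now))
    return key


def flush(conn: sqlite3.Connection, send: Callable[[str, dict], int], now: float) -> None:
    """One flush pass over entries whose retry time has arrived."""
    due = conn.execute(
        "SELECT idempotency_key, session_id, payload, attempts FROM outbox "
        "WHERE next_attempt_at <= ? ORDER BY next_attempt_at", (now,)).fetchall()
    for key, session_id, payload, attempts in due:
        try:
            status = send(key, json.loads(payload))
        except OSError:
            status = None  # network failure: keep the entry and retry later
        with conn:
            if status is not None and 200 <= status < 300:
                conn.execute("DELETE FROM outbox WHERE idempotency_key = ?", (key,))
            elif status is not None and 400 <= status < 500:
                # Client error: park it for the user instead of dropping it.
                conn.execute("INSERT INTO dead_letter VALUES (?, ?, ?, ?)",
                             (key, session_id, payload, status))
                conn.execute("DELETE FROM outbox WHERE idempotency_key = ?", (key,))
            else:
                conn.execute(
                    "UPDATE outbox SET attempts = ?, next_attempt_at = ? "
                    "WHERE idempotency_key = ?",
                    (attempts + 1, now + backoff_delay(attempts + 1), key))
```

Entries are deleted only after a 2xx, so a crash between send and delete re-sends the same key; that is the at-least-once half, and server-side dedup on the key supplies the exactly-once effect.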

  1. Passive capture gives false confidence without explicit structured logging. Mitigation: get_context response explicitly signals empty structured sections with prompts (“no decisions logged — use decision_log”); CLAUDE.md plugin instruction makes explicit logging first-class. Acceptance test: a session with zero explicit logs surfaces the gap warning, not silence.
  2. The backend is currently Firebase, but the council prescribed Postgres + Scaleway. Mitigation: Phase 2 must decide between migrating and compensating Firestore patterns (per-tenant collection scoping + Cloud Functions enforcement); log the choice as a Decision before Phase 2 starts.
  3. Outbox SQLite contention under concurrent sessions on same machine. Mitigation: WAL mode + per-session outbox table keyed by session_id; flush loop uses advisory lock. Phase 1 includes a concurrent two-session stress test.
  4. fix_narrative mandatory guard breaks existing error-typed Events. Mitigation: do not migrate historical Events to Bugs; let them remain searchable fragments. Bug entity starts fresh from Phase 2. No backfill script (hallucination risk).
  5. CredentialReference.provision_instructions could leak values. Mitigation: server-side regex scan (erold_, sk-, AKIA, hex>32) at write time; reject with 400. Phase 0/2 boundary scope.

| Event | Trigger | Data Captured (JSON) | Entity Type | Redaction Step | Outbox Destination |
|---|---|---|---|---|---|
| SessionStart | Agent starts; project detected | { session_id, agent_id, project_root, detected_markers[], cwd, timestamp_iso, claude_version, plugin_version, parent_session_id?, fork_reason? } | SessionOpened | Verify project_root exists; redact parent_session_id if sensitive | ~/.erold/outbox/session_meta.jsonl |
| PreToolUse(Bash) (NEW) | Before Bash tool invocation | { session_id, tool_id, invocation_index, command_hash (SHA256), args_count, target_services[], timestamp_iso } | CommandExecutionIntent | Hash command; redact secret-pattern args; log service targets only | ~/.erold/outbox/intent.jsonl |
| PostToolUse(Edit) | After Edit completes | { session_id, tool_id, file_path_hash, diff_lines_count, redaction_applied_y_n, conflict_detected_y_n, timestamp_iso } | FileModified | Never log file contents; hash path; line-count delta only | ~/.erold/outbox/mutation.jsonl |
| PostToolUse(Write) | After Write completes | { session_id, tool_id, file_path_hash, size_bytes, file_type_detected, redaction_applied_y_n, timestamp_iso } | FileCreated | Hash path; redact if matches patterns; log type+size only | ~/.erold/outbox/mutation.jsonl |
| PostToolUse(Bash) (NEW) | After Bash exits | { session_id, tool_id, invocation_index, command_hash, exit_code, exit_signal?, stdout_line_count, stderr_line_count, timed_out_y_n, captured_output_hash?, duration_ms, timestamp_iso } | CommandExecuted | Hash stdout/stderr; redact patterns; log hashes + metadata only | ~/.erold/outbox/execution.jsonl |
| UserPromptSubmit (NEW) | User submits a prompt | { session_id, prompt_hash, word_count, detected_intent, tool_use_anticipated_y_n, timestamp_iso } | PromptReceived | Hash prompt; log classifier output only | ~/.erold/outbox/prompt.jsonl |
| PreCompact (NEW) | Before context compaction | { session_id, memory_items_count, unique_by_hash_count, redacted_count, dedup_decision_summary, timestamp_iso } | CompactionIntent | Verify dedup hash correctness; counts only | ~/.erold/outbox/compaction.jsonl |
| SessionEnd (NEW) | Agent exits or is stopped | { session_id, exit_reason, total_duration_ms, final_intent_state, unresolved_bugs_count, pending_deploys_count, timestamp_iso } | SessionClosed | None | ~/.erold/outbox/session_meta.jsonl |

All outbox writes are async fire-and-forget; a daemon flushes to SQLite ≤ 5 s later. All timestamps ISO-8601 UTC. Service targets inferred from command (scw, kubectl, hcloud, tea, git, docker).
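Each hook only hashes, counts, and appends. A sketch of the PreToolUse(Bash) row, assuming the hook has already extracted the command string (how the hook receives its input is not covered here). The function names and the args_count definition (words after the program name) are hypothetical. Hashing the whole command means no argument, secret or otherwise, reaches the outbox.

```python
import hashlib
import json
import os
import shlex
from datetime import datetime, timezone

# Hypothetical helpers for the PreToolUse(Bash) outbox row.
KNOWN_SERVICES = ("scw", "kubectl", "hcloud", "tea", "git", "docker")


def target_services(command: str) -> list[str]:
    """Heuristic: program names of each list/pipeline segment that match KNOWN_SERVICES."""
    found: list[str] = []
    # Crude segmentation on ;, && and | (separators inside quotes are not handled).
    for segment in command.replace("&&", ";").replace("|", ";").split(";"):
        try:
            words = shlex.split(segment)
        except ValueError:
            continue  # unbalanced quotes: skip the segment rather than fail the hook
        name = os.path.basename(words[0]) if words else ""
        if name in KNOWN_SERVICES and name not in found:
            found.append(name)
    return found


def pre_tool_use_bash_record(session_id: str, tool_id: str, invocation_index: int,
                             command: str) -> dict:
    """Build the PreToolUse(Bash) payload: hashes and counts only, never the raw command."""
    try:
        words = shlex.split(command)
    except ValueError:
        words = command.split()
    return {
        "session_id": session_id,
        "tool_id": tool_id,
        "invocation_index": invocation_index,
        "command_hash": hashlib.sha256(command.encode("utf-8")).hexdigest(),
        "args_count": max(len(words) - 1, 0),  # assumed: words after the program name
        "target_services": target_services(command),
        "timestamp_iso": datetime.now(timezone.utc).isoformat(),
    }


def append_to_outbox(outbox_dir: str, filename: str, record: dict) -> None:
    """Append one JSON line; no network I/O, so the hook can return immediately."""
    os.makedirs(outbox_dir, exist_ok=True)
    with open(os.path.join(outbox_dir, filename), "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, separators=(",", ":")) + "\n")
```

A hook would call `append_to_outbox(os.path.expanduser("~/.erold/outbox"), "intent.jsonl", record)` and exit; the flush daemon does everything else.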

| Name | Type | Allowed Tools | Invoked When | Returns |
|---|---|---|---|---|
| context | Autonomous | None (read-only MCP) | After UserPromptSubmit; decides if context refresh needed | { refresh_decision, reason, cached_until_ts } |
| guidelines | Autonomous | None (read-only MCP) | SessionStart + intent change | { applicable_rules[], tier, override_flags[] } |
| memory-dedup | Autonomous | batch_record_events, search_memories | PreCompact fires | { dedup_count, items_merged, new_hash_index } |
| intent-detector | Autonomous | search_memories, get_deployment_log | UserPromptSubmit fires | { intent, confidence, suggested_agent?, reasoning } |
| /erold-status | User | session_snapshot, get_bugs, get_deployment_log | Slash command | Formatted table or JSON |
| /erold-resume | User | all read MCP | Slash command after fork/reconnect | Resume packet (§2.7) |
| /erold-audit | User | export_audit_trail | Slash command | CSV + plaintext, anonymized paths |
| /erold-decision | User | record_decision, get_decision_log | Slash command | Confirmation: stored decision ID + timestamp |

Autonomous skills run silently after their hook; failures are logged but never block. User skills return a graceful error when invoked outside a session.

| Trigger | Type | Input Schema | Output Shape |
|---|---|---|---|
| /erold-status | Read-only | `{ format?: "text" \| "json", quiet?: bool }` | Formatted table or JSON |
| /erold-resume | Read-only | { session_id?, include?, export_secrets?: bool } | Resume packet (§2.7) |
| /erold-bugs | Read-only | `{ filter?: "open" \| "fixed" \| …` | … |
| /erold-deploy-log | Read-only | { limit?: int, env?, days?: int } | Deploy[] |
| /erold-creds | Read-only | { list_touched?: bool, validate?: bool } | CredRef[] (names only, never values) |
| /erold-decision | Write | { text, tag?, context_snapshot?: bool } | { decision_id, recorded_at_ts, acknowledged } |
| /erold-reset-intent | Write | { reason?, new_intent? } | { intent_reset_at_ts, prior_intent, new_intent } |

Idempotency key pattern: {project_root}#{action}#{dedup_hash}#{timestamp_bucket(ts/10)}.
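The key pattern can be built as follows, reading timestamp_bucket(ts/10) as integer division of Unix seconds into 10-second buckets (an interpretation; the plan does not define the rounding). The helper names are hypothetical.

```python
# Hypothetical helpers for {project_root}#{action}#{dedup_hash}#{timestamp_bucket(ts/10)}.
BUCKET_SECONDS = 10


def timestamp_bucket(ts: float) -> int:
    """Integer 10-second bucket for a Unix timestamp in seconds."""
    return int(ts // BUCKET_SECONDS)


def idempotency_key(project_root: str, action: str, dedup_hash: str, ts: float) -> str:
    return f"{project_root}#{action}#{dedup_hash}#{timestamp_bucket(ts)}"
```

Because buckets are fixed, two identical retries that straddle a boundary (for example at …09.9 s and …10.0 s) get different keys, so this pattern narrows duplicates rather than eliminating them; the server-side dedup_hash window still applies.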

| Tool | Input | Output | Idempotent | Batch | Side Effects |
|---|---|---|---|---|---|
| session_snapshot | { session_id } | { session, events[], credentials_touched[], decisions[], bugs_found[], deploys_linked[] } | Y | N | None |
| search_memories | { project_root, query, limit?, dedup?, since_ts? } | { items[], total_count, dedup_merge_report? } | Y | Y (100) | None |
| get_credentials_touched | { session_id, redact?: bool } | { credential_names[], usage_summary } | Y | N | None |
| get_deployment_log | { env?, days?, limit?, service? } | { deploys[], total_before_limit } | Y | N | None |
| get_bugs | { status?, project?, days? } | { bugs[], total_count } | Y | N | None |
| record_decision | { text, tag?, session_id, context_snapshot?, idempotency_key } | { decision_id, recorded_at_ts, acked } | Y (by key) | N | Append decision_log |
| mark_intent_resolved | { intent_key, resolution, idempotency_key } | { resolved_at_ts, prior_state, acked } | Y | N | Updates intent table; fires Notification |
| tag_session | { session_id, tags[], idempotency_key } | { session_id, applied_tags[], new_tag_count } | Y | Y | Updates session_meta |
| batch_record_events | { events[], idempotency_key } | { recorded_count, failed_count, errors? } | Y | Y (1000) | Bulk-insert |
| export_audit_trail | { project_root?, days?, format, redact_paths?, redact_commands?, redact_outputs? } | { data, exported_at_ts, event_count, summary } | Y | N | None |

Statusline format: erold: [intent: ACTIVE_INTENT] [bugs: N_OPEN] [deploy: LAST_OUTCOME (AGE_MIN)] [checkpoint: AGE_MIN]

Example: erold: [intent: gateway-jwt-rotate] [bugs: 2] [deploy: ✓ 15m] [checkpoint: 8m]. Refresh every 30 s or on hook fire. Offline fallback: erold: [offline].

```json
{
  "erold": {
    "enabled": true,
    "capture_bash_output": true,
    "redact_secrets_patterns": ["api[_-]?key", "secret[_-]?key", "token", "password", "api_secret", "access[_-]?key", "private[_-]?key", "credential"],
    "project_root_detection": {
      "markers": [".git", "CLAUDE.md", ".claude/", "package.json", "pyproject.toml"],
      "skip_if_no_marker": true
    },
    "memory_retention_days": 90,
    "auto_checkpoint_on_subagent_end": true,
    "statusline_enabled": true,
    "statusline_refresh_hz": 0.033,
    "redact_credentials_in_cli_output": true,
    "database_path": "~/.erold/db.sqlite3",
    "outbox_path": "~/.erold/outbox/",
    "outbox_flush_interval_ms": 5000
  }
}
```

Validation: memory_retention_days >= 7; statusline_refresh_hz 0.01–1.0; markers non-empty; database_path and outbox_path writable (checked at SessionStart).
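These four rules can be checked in a few lines. A sketch, assuming the settings are already parsed into a dict shaped like the JSON above; `validate_settings` and `nearest_existing` are hypothetical names. Writability is tested on the nearest existing parent because the database file and outbox directory may not exist yet at SessionStart.

```python
import os
from typing import Any

# Hypothetical checker for the validation rules above.


def nearest_existing(path: str) -> str:
    """Walk up until an existing path is found ('' for a bare relative name)."""
    while path and not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:
            break
        path = parent
    return path


def validate_settings(settings: dict[str, Any]) -> list[str]:
    """Return human-readable violations; an empty list means the settings pass."""
    cfg = settings.get("erold", {})
    errors: list[str] = []
    if cfg.get("memory_retention_days", 0) < 7:
        errors.append("memory_retention_days must be >= 7")
    if not 0.01 <= cfg.get("statusline_refresh_hz", 0) <= 1.0:
        errors.append("statusline_refresh_hz must be between 0.01 and 1.0")
    if not cfg.get("project_root_detection", {}).get("markers"):
        errors.append("project_root_detection.markers must be non-empty")
    for key in ("database_path", "outbox_path"):
        path = os.path.expanduser(cfg.get(key, ""))
        probe = nearest_existing(path)
        if not probe or not os.access(probe, os.W_OK):
            errors.append(f"{key} is not writable: {path!r}")
    return errors
```

A caller would treat a non-empty list as a hard error, matching the rule that an invalid settings.json stops SessionStart.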

```
{
  "resume_packet_v": "1.0",
  "generated_at_ts": 1715349600,
  "source_session_id": "sess_abc123def456",
  "project_meta": { "root", "detected_markers", "project_name", "agent_environment" },
  "active_tasks": [{ "intent_key", "intent_type", "status", "created_at_ts", "context_summary", "estimated_remaining_work" }],
  "open_bugs": [{ "id", "title", "status", "severity", "detected_session_id", "created_ts", "context" }],
  "recent_deploys": [{ "id", "timestamp", "env", "service", "command_hash", "exit_code", "duration_ms", "rolled_back", "linked_bugs", "output_summary" }],
  "recent_decisions": [{ "id", "recorded_at_ts", "tag", "text", "session_id" }],
  "credential_refs": [{ "name", "first_used_ts", "last_used_ts", "tool_context", "confirmed_in_keychain" }],
  "unresolved_pre_compact_items": [],
  "next_action_hint": "string",
  "fragment_summary": { "total_events_session", "unique_commands", "unique_files_touched", "total_decisions_logged", "open_intent_count" }
}
```

Data sources:

  • project_meta ← session_snapshot.session
  • active_tasks ← search_memories(intent:*)
  • open_bugs ← get_bugs(status=open)
  • recent_deploys ← get_deployment_log(days=7,limit=3)
  • recent_decisions ← search_memories(decision:*,limit=5)
  • credential_refs ← get_credentials_touched
  • unresolved_pre_compact_items ← SQLite (compaction_status='pending')
  • fragment_summary ← aggregate

  1. No file contents in logs — paths (hashed), line counts, file type, size only.
  2. Redaction before outbox write — never after.
  3. Dedup order: exact → MinHash LSH → semantic.
  4. Hooks are never synchronous — local JSONL outbox only.
  5. Credential names only, never values.
  6. MCP tool calls in hooks are idempotent + fast-fallback (>1 s → log error and continue).
  7. Dedup at (project_root, command_hash, exit_code, ts/10).
  8. Project-root detection skips on no marker (silent bypass).
  9. Session state is immutable after SessionEnd fires.
  10. Outbox JSONL is append-only.
  11. settings.json validated against schema at SessionStart; invalid → hard error.
  12. Credentials never generated on-the-fly; only from Keychain via secret get.

TenantScoped (base): tenant_id: UUID, created_at: datetime, updated_at: datetime. Config: from_attributes=True, str_strip_whitespace=True, validate_assignment=True.

IdempotentEventBase (extends TenantScoped): client_idempotency_key: UUID (client v4), server_received_at: datetime (server-stamped), project_id: UUID, session_id: UUID.

Project: id, name (1-120), slug (1-60, ^[a-z0-9-]+$), description (≤2000), status (active|archived), metadata (≤50 keys).

Session: id, project_id, agent_id (≤128), started_at, ended_at?, status (active|ended|abandoned), context_snapshot_id?.

Event (extends IdempotentEventBase): id, event_type (tool_call|tool_result|thinking|assistant_message|user_message|session_start|session_end|error), payload (≤65536 bytes), sequence_index (≥0), parent_event_id?, outbox_id?.

Fragment: id, project_id, content (1-8192), content_hash (SHA256 of tenant_id:project_id:content[:512], computed in model_validator(mode='before')), source_event_ids[], fragment_type (insight|decision_ref|task_ref|summary|code_ref), embedding_status (pending|embedded|failed), embedding (vector, populated by worker).

Task: id, project_id, title (1-256), description (≤4096), status (open|in_progress|blocked|done|cancelled), priority (low|medium|high|critical), source_event_id?, resolved_at?, resolution_note (≤2048).

Bug: id, project_id, title, description, status (open|investigating|resolved|wont_fix), severity, resolution_text (≤4096), resolved_at?, source_event_id?.

Deploy: id, project_id, version (≤128), environment (dev|staging|prod), status (started|succeeded|failed|rolled_back), deployed_at, notes (≤2048).

Decision: id, project_id, title, rationale (1-8192), decided_at, decided_by (≤128), source_event_id?.

CredentialReference: id, project_id, name (1-128), description (≤512), credential_type (api_key|oauth_token|cert|password|other), vault_reference (≤512). Validator rejects high-entropy strings >32 chars or known prefixes (sk-, ghp_).

FileChange (extends IdempotentEventBase): id, path (1-4096), change_type (created|modified|deleted|renamed), old_path? (required when renamed), diff_summary (≤4096, human-readable), source_event_id.
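The Fragment content_hash rule fits in a plain function that a `model_validator(mode='before')` could call on the raw input. A sketch using only hashlib; the helper names are hypothetical. Because only `content[:512]` is hashed, two fragments that share their first 512 characters get the same hash, and under UNIQUE(tenant_id, content_hash) only the first is stored.

```python
import hashlib
from typing import Any
from uuid import UUID

# Hypothetical helpers for the Fragment content_hash rule.
HASH_PREFIX_CHARS = 512


def fragment_content_hash(tenant_id: UUID, project_id: UUID, content: str) -> str:
    """SHA-256 hex digest of 'tenant_id:project_id:content[:512]'."""
    material = f"{tenant_id}:{project_id}:{content[:HASH_PREFIX_CHARS]}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def with_content_hash(raw: dict[str, Any]) -> dict[str, Any]:
    """What a mode='before' validator would do: derive content_hash from raw input.

    IDs may arrive as strings, so they are parsed into UUIDs first; this yields
    the canonical lowercase form and keeps the hash stable across callers.
    """
    data = dict(raw)
    data["content_hash"] = fragment_content_hash(
        UUID(str(data["tenant_id"])), UUID(str(data["project_id"])), data["content"])
    return data
```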

| Method | Path | Purpose | Body | Response | Idempotent | RLS Scope |
|---|---|---|---|---|---|---|
| POST | /v1/sessions | Start session | Session | Session | N | tenant |
| PATCH | /v1/sessions/{id} | End/update | {status, ended_at} | Session | Y | tenant+project |
| GET | /v1/projects | List | | Project[] | Y | tenant |
| POST | /v1/projects | Create | Project | Project | N | tenant |
| GET | /v1/projects/{id} | Get | | Project | Y | tenant |
| GET | /v1/projects/{id}/context | Continuity packet | | ContextPacket | Y | tenant+project |
| POST | /v1/events/batch | Ingest | BatchIngest{events[],file_changes[]} | BatchAck{received,duplicate_keys[]} | Y | tenant+project |
| GET | /v1/fragments/search | Semantic search | query | Fragment[] (no embedding field) | Y | tenant+project |
| GET/POST/PATCH | /v1/tasks | CRUD | | Task / Task[] | partial | tenant+project |
| GET/POST/PATCH | /v1/bugs | CRUD | | Bug / Bug[] | partial | tenant+project |
| GET/POST | /v1/deploys | List/record | | Deploy / Deploy[] | partial | tenant+project |
| GET/POST | /v1/decisions | List/record | | Decision / Decision[] | partial | tenant+project |
| GET/POST | /v1/credentials | List/register | | CredentialReference / [] | partial | tenant+project |

ContextPacket (response of GET /v1/projects/{id}/context):

  • project: Project
  • active_tasks: Task[] (status open|in_progress|blocked)
  • open_bugs: Bug[] (status open|investigating, top 20 by severity)
  • recently_resolved_bugs: Bug[] (resolved last 72 h, includes resolution_text)
  • recent_deploys: Deploy[] (last 5 per env)
  • recent_decisions: Decision[] (last 10 by decided_at)
  • credential_refs: CredentialReference[] (names + vault_reference only)
  • unresolved_pre_compact_fragments: Fragment[] (type summary, embedding_status != embedded, ≤10)
  • fragment_summary: str | None
  • generated_at: datetime

Database tables:

  • tenants: id, name, created_at.
  • api_keys: id, tenant_id, key_hash (SHA256, never plaintext), project_ids[] (empty = tenant-wide), revoked_at, created_at; UNIQUE(key_hash).
  • projects: id, tenant_id, name, slug, status, description, metadata jsonb, timestamps; UNIQUE(tenant_id, slug); RLS enabled.
  • sessions: id, tenant_id, project_id, agent_id, status, timestamps, context_snapshot_id; RLS.
  • events: id, tenant_id, project_id, session_id, client_idempotency_key, server_received_at, event_type, payload jsonb, sequence_index, parent_event_id, created_at; UNIQUE(tenant_id, client_idempotency_key); RLS.
  • events_outbox: id, tenant_id, event_id, status (pending|processing|done|dead), attempts, next_retry_at, processed_at, error_detail; partial index on (status, next_retry_at) WHERE status IN ('pending','processing').
  • fragments: id, tenant_id, project_id, content, content_hash (SHA256, UNIQUE(tenant_id, content_hash)), source_event_ids[], fragment_type, embedding_status, embedding vector(1536) (pgvector), HNSW index after embedding_status='embedded'; RLS.
  • embedding_queue: id, fragment_id, tenant_id, status, attempts, next_retry_at, partial index.
  • tasks / bugs / deploys / decisions: same template — id, tenant_id, project_id, entity-specific cols, timestamps, RLS.
  • credential_references: + vault_reference VARCHAR(512) NOT NULL, UNIQUE(tenant_id, project_id, name).
  • file_changes: + UNIQUE(tenant_id, client_idempotency_key).

Migrations (applied in order):

  • 001 init_tenants_api_keys (pgcrypto)
  • 002 projects_sessions (RLS + app.tenant_id)
  • 003 events_outbox (UNIQUE idem key)
  • 004 fragments_pgvector (CREATE EXTENSION vector)
  • 005 fragments_hnsw_index (CREATE INDEX CONCURRENTLY, autocommit)
  • 006 embedding_queue
  • 007 tasks_bugs
  • 008 deploys_decisions
  • 009 credentials_file_changes
  • 010 rls_role_setup (app_user role + RLS policy fire-test in transaction)

  1. Auth → resolve (tenant_id, allowed_project_ids) from API key hash.
  2. Pydantic validation per event (422 per-item on violations; valid items proceed).
  3. Single asyncpg transaction: bulk insert into events ON CONFLICT (tenant_id, client_idempotency_key) DO NOTHING; capture duplicate keys; insert one row per new event into events_outbox.
  4. Respond 202 with {received, duplicate_keys[]}.
  5. Outbox worker: SELECT … FOR UPDATE SKIP LOCKED batch 50; exponential backoff on failure; cap 5 attempts → dead.
  6. Worker applies Smart-Strip; upsert fragments ON CONFLICT (tenant_id, content_hash) DO NOTHING; mark outbox done.
  7. Inserted fragments enqueued in embedding_queue.
  8. Embedding worker: FOR UPDATE SKIP LOCKED; calls embedding API async; writes embedding; marks embedded.
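Step 3 (insert with ON CONFLICT DO NOTHING, collect duplicate keys, add one outbox row per new event, all in one transaction) can be exercised without Postgres. This sketch uses the stdlib sqlite3 module, whose upsert syntax (SQLite 3.24+) accepts the same `ON CONFLICT (...) DO NOTHING` clause; columns are trimmed, and counting `received` as newly stored events is an assumption about BatchAck. With asyncpg, `RETURNING` on the insert would be the usual way to learn which keys were new.

```python
import json
import sqlite3

# Hypothetical trimmed schema mirroring events / events_outbox.
SCHEMA = """
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    client_idempotency_key TEXT NOT NULL,
    payload TEXT NOT NULL,
    UNIQUE (tenant_id, client_idempotency_key)
);
CREATE TABLE events_outbox (
    id INTEGER PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    event_id INTEGER NOT NULL REFERENCES events(id),
    status TEXT NOT NULL DEFAULT 'pending'
);
"""


def ingest_batch(conn: sqlite3.Connection, tenant_id: str, events: list[dict]) -> dict:
    """Insert new events plus their outbox rows in one transaction; report duplicates."""
    received, duplicate_keys = 0, []
    with conn:  # single transaction: commit on success, roll back on exception
        for ev in events:
            cur = conn.execute(
                "INSERT INTO events (tenant_id, client_idempotency_key, payload) "
                "VALUES (?, ?, ?) "
                "ON CONFLICT (tenant_id, client_idempotency_key) DO NOTHING",
                (tenant_id, ev["client_idempotency_key"], json.dumps(ev["payload"])),
            )
            if cur.rowcount == 0:  # conflict: key already stored for this tenant
                duplicate_keys.append(ev["client_idempotency_key"])
                continue
            conn.execute("INSERT INTO events_outbox (tenant_id, event_id) VALUES (?, ?)",
                         (tenant_id, cur.lastrowid))
            received += 1
    return {"received": received, "duplicate_keys": duplicate_keys}
```

Replaying the same batch is the "Outbox replay idempotency" test case in miniature: the second call stores nothing and reports every key as a duplicate.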

SLOs: p50 ingest-to-outbox <50 ms; p99 <200 ms; p50 outbox-to-fragment <2 s; p99 fragment search lag <10 s; p90 embedding lag <5 s.

Every route declares ctx: TenantContext = Depends(get_tenant_context). get_tenant_context extracts Authorization: Bearer, SHA256-hashes, queries api_keys (auth_user role bypassing RLS), returns context. Project-scoped keys check project_id ∈ allowed_project_ids. All repository methods receive ctx; queries always include WHERE tenant_id = $1 as first predicate. Connection middleware does SET LOCAL app.tenant_id = '<uuid>'. No route accepts tenant_id from request.

Per CI run: ephemeral db-dev-s via Terraform; per-test schema (CREATE SCHEMA t_<uuid> / DROP SCHEMA CASCADE).

Required cases:

  • Cross-tenant leak guard — 2 tenants, query A→0 from B; raw SELECT without SET LOCAL raises policy violation.
  • Outbox replay idempotency — same batch twice, identical idem keys → N events, not 2N; BatchAck.duplicate_keys populated on second call.
  • Embedding async lag — insert 5 events, poll until embedded, assert <5 s.
  • RLS direct DB block — raw asyncpg without SET LOCAL → 0 rows.
  • Context packet completeness — seed 2 tasks/1 resolved bug/2 deploys/1 decision/1 credref; assert all sections + no raw secret.
  • Fragment dedup via content hash — 2 events, different idem keys, same content → exactly 1 fragment.
  1. No inline embeddings in route handlers — only the embedding worker.
  2. No raw Event objects in API responses — fragments only.
  3. No query without explicit WHERE tenant_id = $1 (RLS is belt+suspenders).
  4. No schema changes outside Alembic.
  5. No synchronous I/O in route handlers; offload CPU-bound work via asyncio.to_thread.

| Type | Name | Region | Sizing | Tags | Private Network | Verify |
|---|---|---|---|---|---|---|
| Container Registry namespace | erold | fr-par | n/a | project=erold | n/a | scw registry namespace list |
| Serverless Container namespace | erold-api | fr-par | n/a | project=erold | n/a | scw container namespace list |
| Serverless Container | erold-api-prod | fr-par | mem=1024 cpu=560 min=1 max=5 | env=prod | erold-pn | confirm Private Network on tier |
| Serverless Container | erold-api-staging | fr-par | mem=512 cpu=280 min=0 max=2 | env=staging | erold-pn | same |
| Managed RDB (dev) | erold-db-dev | fr-par-1 | db-dev-s PG17 | env=dev | erold-pn | CREATE EXTENSION vector on scratch first |
| Managed RDB (prod) | erold-db-prod | fr-par-1 | db-gp-xs PG17 HA | env=prod | erold-pn | same |
| Object Storage | erold-raw-events | fr-par | Standard | env=prod | n/a | scw object bucket list |
| Object Storage | erold-deploy-logs | fr-par | Standard | env=prod | n/a | same |
| Object Storage | erold-attachments | fr-par | Standard | env=prod | n/a | same |
| VPC Private Network | erold-pn | fr-par | n/a | project=erold | self | scw vpc private-network list |

pgvector probe (must succeed before committing):

```shell
psql "$SCRATCH_DB_URL" -c "CREATE EXTENSION IF NOT EXISTS vector; SELECT extversion FROM pg_extension WHERE extname='vector';"
```

```
Internet → api.erold.dev (HTTPS 443)
[Serverless Container: erold-api-prod]
↓ Private Network (erold-pn / RFC1918)
├──→ [RDB erold-db-prod] :5432 (private endpoint only)
└──→ [Object Storage] s3.fr-par.scw.cloud (internal, no egress fee)
```

Container security group: inbound 8000/TCP from 0.0.0.0/0; outbound 5432 to erold-pn CIDR; outbound 443 (Object Storage, Secret Manager, OpenAI, registry). RDB prod: --is-public-ip-enabled=false; verify endpoint IP is RFC1918.

| Name | Purpose | Policies + Scope | Key Location | Rotation |
|---|---|---|---|---|
| erold-api | FastAPI runtime | SecretManagerReadOnly (project=erold); ObjectStorageRW on erold-raw-events, erold-attachments; RDBReadWrite on erold-db-prod | Scaleway SM (erold/prod/scw-api-{access,secret}-key); injected as env at container start | 90 d via CI rotation job |
| erold-mcp-plugin | MCP plugin (EROLD_API_KEY bearer) | none (app-level key, not SCW) | Keychain erold.{env}.api-key | 90 d |
| erold-ci | Gitea CI build/push/update | RegistryRW on erold ns; ContainersWrite on erold-api ns | Gitea Secrets SCW_CI_{ACCESS,SECRET}_KEY | 90 d |
| erold-backup | Scheduled DB dumps | RDBReadOnly; ObjectStorageRW on erold-deploy-logs | Keychain erold.prod.backup-scw-{access,secret}-key (operator) | 90 d |
| erold-secret-reader | Bootstrap SM read | SecretManagerReadOnly (project=erold) | Keychain erold.prod.secret-reader-access-key (or merge with erold-api) | 90 d |

Note: if erold-api has SecretManagerReadOnly, erold-secret-reader is redundant — confirm at design review.

| Name | Store | Reads | Rotates | Cadence |
|---|---|---|---|---|
| erold.dev.api-key | Keychain | dev .envrc, MCP plugin | dev | 90 d |
| erold.prod.api-key | Keychain + Gitea Secret EROLD_API_KEY | prod MCP plugin, smoke tests | dev + CI | 90 d |
| erold.{env}.tenant | Keychain + Gitea Secret EROLD_TENANT | dev/CI | dev | on change |
| erold.dev.scw.{access,secret}-key | Keychain | dev .envrc, scw CLI | dev | 90 d |
| erold.prod.scw.{access,secret}-key (erold-api) | Scaleway SM erold/prod/scw-api-* | container at runtime | erold-ci | 90 d |
| SCW_CI_{ACCESS,SECRET}_KEY | Gitea Secrets | CI runner | dev (tea repos secrets create) | 90 d |
| erold/{env}/database-url | Scaleway SM (prod) / Keychain (dev) | container | dev + DBA | on rotation |
| erold/prod/jwt-signing-key | Scaleway SM | container | automated job | 90 d |
| erold/prod/openai-api-key | Scaleway SM | container | dev | 90 d or on compromise |

Pattern for SM secrets: erold/{env}/{purpose} + latest-enabled revision.

.gitea/workflows/:

pr.yml (PR open/push): checkout → ruff check+format → pyright strict → ephemeral RDB (tag env=test owner=ci-$RUN_ID) → pytest tests/integration/ → destroy RDB.

staging.yml (push to main): checkout → docker login rg.fr-par.scw.cloud → build amd64 → push → scw container container update <staging-id> --registry-image=...:staging-$SHA → scw container container wait <staging-id> → curl -f https://staging-api.erold.dev/health (expect SHA echo).

deploy.yml (tag v*): same pattern with :$TAG; scw container container wait blocks until healthcheck passes; smoke test asserts SHA matches tag; on failure → Gitea notify; manual rollback (no auto).

scw container container update --wait requires GET /health returning 200 with {"sha": "..."} (baked via ARG GIT_SHA → ENV APP_SHA).
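Here is a sketch of the smoke-test side of that contract. It assumes CI passes in the expected SHA; how CI obtains it is not specified here. The check fails on any non-200 status and on a SHA mismatch, and the mismatch case is what catches a rollout that silently kept the old revision. Function names are hypothetical.

```python
import json
import urllib.request


def check_health_payload(status: int, body: bytes, expected_sha: str) -> None:
    """Raise if /health is not 200 or reports a different build SHA."""
    if status != 200:
        raise RuntimeError(f"/health returned HTTP {status}")
    reported = json.loads(body).get("sha")
    if reported != expected_sha:
        raise RuntimeError(f"SHA mismatch: deployed {reported!r}, expected {expected_sha!r}")


def smoke_test(base_url: str, expected_sha: str, timeout: float = 10.0) -> None:
    # urlopen raises HTTPError for non-2xx responses, which also fails the step.
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        check_health_payload(resp.status, resp.read(), expected_sha)
```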

.envrc (committed, no values):

export SCW_ACCESS_KEY="$(secret get erold.dev.scw.access-key)"
export SCW_SECRET_KEY="$(secret get erold.dev.scw.secret-key)"
export SCW_DEFAULT_PROJECT_ID="$(secret get scw.prod.project-id)"
export DATABASE_URL="$(secret get erold.dev.database-url)"
export JWT_SIGNING_KEY="$(secret get erold.dev.jwt-signing-key)"
export EROLD_API_KEY="$(secret get erold.dev.api-key)"
export EROLD_TENANT="$(secret get erold.dev.tenant)"
export OPENAI_API_KEY="$(secret get erold.dev.openai-api-key)"

One-time per machine: secret set erold.dev.* for each of the 7 names (scw.prod.project-id must also exist), then direnv allow. Add eval "$(direnv hook zsh)" to ~/.zshrc.

| Line Item | Dev | Prod | Scale-Up Trigger | Next |
|---|---|---|---|---|
| Serverless Container | ~€0 | ~€14 | p95 latency >2 s or CPU >80% | max=10, mem=2048 |
| Managed RDB | ~€6 (db-dev-s) | ~€25 (db-gp-xs) | DB CPU >70% sustained | db-gp-s (~€50) |
| Container Registry | ~€0 | ~€1 | tag count >20 | enforce retention |
| Object Storage | ~€0.5 | ~€2 | attachments >100 GB | lifecycle expiry on raw-events (30 d) |
| Secret Manager | ~€0 | ~€1 | >100 secrets | per-secret/month |
| Embedding compute | variable | variable | OpenAI cost >€40/mo | Scaleway-hosted H100 |
| Private Network | ~€0 | ~€0 | n/a | n/a |
| Total (excl. embedding compute) | ~€7 | ~€43 | | |

★ RDB and embedding compute scale fastest with usage.

4.8 Validation Checklist (must run before Phase-7 deploy)

  1. scw rdb instance list — both instances ready.
  2. psql $DATABASE_URL -c "CREATE EXTENSION IF NOT EXISTS vector; SELECT extversion FROM pg_extension WHERE extname='vector';" (block all vector code if this fails).
  3. scw vpc private-network get <erold-pn-id> — subnets assigned.
  4. scw rdb instance get <prod-id> -o json | jq '.endpoints[] | select(.load_balancer == null) | .ip' prints an RFC1918 address.
  5. psql "postgresql://...@<private-ip>:5432/erold" -c "SELECT 1;" from container in erold-pn.
  6. scw iam api-key list — all 5 app keys exist.
  7. SCW_ACCESS_KEY=$(secret get ...) SCW_SECRET_KEY=$(secret get ...) scw object bucket list — IAM scope OK.
  8. scw secret list --tags project=erold — all SM secrets exist with enabled revision.
  9. docker login rg.fr-par.scw.cloud && docker push rg.fr-par.scw.cloud/erold/api:test-probe.
  10. scw container container update <prod-id> --registry-image=...:test-probe && scw container container wait <prod-id>.
  11. curl -f https://api.erold.dev/health — 200 + correct SHA.
  12. Same curl 60 s later, timed with curl -w '%{time_total}': a response with no cold-start delay confirms min_scale=1 keeps an instance warm.
  13. From container: temporary diagnostic GET /internal/secret-check fetches erold/prod/jwt-signing-key from SM and returns length (remove after validation).

4.9 Verify Against Current Scaleway Reality

  1. pgvector on Managed RDB PG17 in fr-par: scw rdb engine list + extension verification.
  2. Private Network attachment for Serverless Containers GA in fr-par: scw container namespace list -o json | jq '.[0] | keys' (look for vpc_id/private_network_id).
  3. db-play2-pico viability for dev: scw rdb node-type list -o json | jq '.[] | {name, vcpus, memory, available_zones}'.
  4. Secret Manager pricing: confirm vs. assumed ~€0.01/secret/month.
  5. Container Registry retention policies: scw registry namespace get.
  6. db-gp-xs available in fr-par-1 specifically: scw rdb node-type list --region fr-par.
  7. scw container container wait actually blocks until healthcheck on new revision — read flag docs; test in staging before relying on it as prod gate.

| # | Severity | Finding | Location | Fix | Validation |
|---|---|---|---|---|---|
| P0-1 | Critical (A04) | Bash output shipped without redaction | mcp-server/src/tools/log.ts:51; claude-plugin/scripts/log-file-change.sh:49 | Apply §5.2 filter to all content before POST; drop empty events | Inject AWS key pattern → POST contains [REDACTED:aws_access_key] |
| P0-2 | Critical (A03) | npx -y @erold/mcp-server@latest = supply-chain RCE | claude-plugin/.mcp.json:5; README.md:15 | Pin exact version + sha256; remove -y (§5.6) | npm pack @erold/mcp-server@1.6.0 --dry-run matches RELEASE.sha256 |
| P0-3 | Critical (A07) | EROLD_API_KEY in .mcp.json + shell history | .mcp.json:7; README | Launcher script reads from Keychain (§5.3) | grep -r 'erold_' ~/.claude/ → 0 matches |
| P0-4 | High (A01) | Optional projectId scope filter; server may ignore it | mcp-server/src/lib/api-client.ts:204; tools/search.ts:38 | Server enforces tenant_id from auth, ignores caller-supplied tenant | Cross-tenant pytest (§5.5) passes |
| P0-5 | High (A02) | EROLD_API_URL user-overridable, no validation | mcp-server/src/lib/config.ts:17 | Allowlist [https://api.erold.dev, https://api.staging.erold.dev]; localhost only when NODE_ENV=development | EROLD_API_URL=https://evil → throws |
| P0-6 | Medium (A04) | Bash interpolation builds JSON → injection | log-file-change.sh:49 | jq -n --arg tool "$TOOL_NAME" --arg path "$FILE_PATH" '{...}' | Path foo"; "injected": "true → escaped |
| P0-7 | Medium (A09) | curl errors silently dropped | log-file-change.sh:50; erold-checkpoint.sh:44 | Capture exit code → one-line entry in ~/.erold/error.log | Invalid URL → log entry with ISO-8601 timestamp |
| P0-8 | Medium (A04) | File diffs of .env/.pem/SSH keys | hooks/hooks.json:14; log-file-change.sh | Path denylist (§5.2 L1) before any POST | FILE_PATH=~/.ssh/id_ed25519 → no HTTP request |
| P0-9 | Low (A03) | Unsigned plugin releases | claude-plugin/ (no release pipeline) | gpg --detach-sign release tarballs; publish .sig | gpg --verify passes in CI |
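The P0-5 allowlist is implemented in the TypeScript file mcp-server/src/lib/config.ts. The Python below only mirrors the rule for illustration, and its function and constant names are hypothetical. The key design choice is an exact match on the full origin rather than a prefix check.

```python
from typing import Optional
from urllib.parse import urlsplit

ALLOWED_API_URLS = {"https://api.erold.dev", "https://api.staging.erold.dev"}
DEFAULT_API_URL = "https://api.erold.dev"


def resolve_api_url(raw: Optional[str], node_env: Optional[str]) -> str:
    """Return a vetted base URL or raise ValueError."""
    url = (raw or DEFAULT_API_URL).rstrip("/")
    # Exact string match, not a prefix check: "https://api.erold.dev.evil.com"
    # and "https://api.erold.dev@evil.com" must both fail.
    if url in ALLOWED_API_URLS:
        return url
    parts = urlsplit(url)
    if (
        node_env == "development"
        and parts.scheme in ("http", "https")
        and parts.hostname in ("localhost", "127.0.0.1")
    ):
        return url
    raise ValueError(f"EROLD_API_URL not allowed: {url!r}")
```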

Pure function redact(text) → RedactResult, applied client-side in MCP and in every hook before any network call. Layers run in order; first match wins per span.

L1 — Path Denylist (drops the entire event when type=file_change matches):

.env .env.* .env.local .env.*.local
*.pem *.key *.p12 *.pfx *.jks *.keystore
*.crt (in secrets/ or certs/)
id_rsa id_rsa.pub id_ed25519 id_ed25519.pub id_ecdsa *.ppk
.ssh/ secrets/ secret/
.mcp.json .netrc .pgpass
kubeconfig *.kubeconfig
terraform.tfvars *.tfvars (filename containing "secret" or "key")
vault-token

Also applied as substring match within observation/decision/error content.
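Here is a sketch of the L1 check for file_change events, covering most of the list above. Two interpretations are assumptions. First, *.crt is denied only under a certs/ directory, since secrets/ is already denied wholesale. Second, terraform.tfvars is always denied, while other *.tfvars files are denied only when the name contains "secret" or "key". Names are lowercased before matching.

```python
import fnmatch
from pathlib import PurePosixPath

# Subset of the L1 list; globs are matched against the lowercased file name.
DENY_NAME_GLOBS = (
    ".env", ".env.*", "*.pem", "*.key", "*.p12", "*.pfx", "*.jks", "*.keystore",
    "id_rsa", "id_rsa.pub", "id_ed25519", "id_ed25519.pub", "id_ecdsa", "*.ppk",
    ".mcp.json", ".netrc", ".pgpass", "kubeconfig", "*.kubeconfig",
    "terraform.tfvars", "vault-token",
)
DENY_DIRS = {".ssh", "secrets", "secret"}  # any file below these is dropped


def is_denied_path(path: str) -> bool:
    """True if a file_change event for this path must be dropped entirely."""
    p = PurePosixPath(path.lower())  # macOS filesystems are usually case-insensitive
    parent_dirs = set(p.parts[:-1])
    name = p.name
    if parent_dirs & DENY_DIRS:
        return True
    if fnmatch.fnmatchcase(name, "*.crt") and "certs" in parent_dirs:
        return True  # *.crt under secrets/ is already covered by DENY_DIRS
    if fnmatch.fnmatchcase(name, "*.tfvars") and ("secret" in name or "key" in name):
        return True
    return any(fnmatch.fnmatchcase(name, glob) for glob in DENY_NAME_GLOBS)
```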

L2 — File-Type Denylist: PKCS12, PEM, x509, PGP keys, octet-stream .key/.bin/.dat. Base64-looking blob >100 chars and >80% base64 alphabet → [REDACTED:binary_blob].
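The base64-blob rule of L2 can be sketched as a token scan. Two details are assumptions: a "blob" is treated as a whitespace-delimited token, and the alphabet is standard base64 plus the = padding character.

```python
import re

BASE64_ALPHABET = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=")


def redact_base64_blobs(text: str, min_len: int = 100, min_ratio: float = 0.8) -> str:
    """Replace tokens longer than min_len whose characters are more than
    min_ratio base64 alphabet."""
    def replace(match: re.Match) -> str:
        token = match.group(0)
        in_alphabet = sum(ch in BASE64_ALPHABET for ch in token)
        if in_alphabet / len(token) > min_ratio:
            return "[REDACTED:binary_blob]"
        return token

    # \S{101,} (for min_len=100) selects only tokens strictly longer than min_len.
    return re.sub(rf"\S{{{min_len + 1},}}", replace, text)
```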

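L3 — Entropy Detector: token replaced with [REDACTED:high_entropy] if all of: length ≥20, Shannon entropy ≥4.0 bits/char, ≥1 digit + ≥1 uppercase. Entropy is measured over the token's own characters, so it can never exceed log2(length). A 22 to 24 char base64 encoding of a 128-bit key therefore tops out near log2(24) ≈ 4.6, and typical random keys of that length land only modestly above the 4.0 threshold. Base64's theoretical 6.0 bits/char is reachable only by tokens with 64+ distinct characters. CamelCase prose (~3.5) is spared.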
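Here is a sketch of L3 with the thresholds above. The entropy is computed over the token's own character frequencies, which is why it cannot exceed log2(len(token)). Function names are hypothetical.

```python
import math
import re
from collections import Counter


def shannon_entropy(token: str) -> float:
    """Empirical Shannon entropy of the token's own characters, in bits/char."""
    n = len(token)
    return -sum((c / n) * math.log2(c / n) for c in Counter(token).values())


def is_high_entropy(token: str) -> bool:
    return (
        len(token) >= 20
        and shannon_entropy(token) >= 4.0
        and any(ch.isdigit() for ch in token)
        and any(ch.isupper() for ch in token)
    )


def redact_high_entropy(text: str) -> str:
    def replace(m: re.Match) -> str:
        return "[REDACTED:high_entropy]" if is_high_entropy(m.group(0)) else m.group(0)

    # Whitespace-delimited tokens are judged, and replaced, as a whole.
    return re.sub(r"\S+", replace, text)
```

For example, the 23-char token aB3dE5fG7hJ9kL1mN2pQ4rS has no repeated characters, so it scores exactly log2(23) ≈ 4.52 and is redacted. Password1Password1Password1 scores about 2.9 and is kept.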

L4 — Pattern Blocklist (case-insensitive where marked):

AKIA[0-9A-Z]{16} → [REDACTED:aws_access_key]
(?i)aws_secret[_\s=:]+[A-Za-z0-9/+]{40} → [REDACTED:aws_secret_key]
SCW[A-Z0-9]{20} → [REDACTED:scw_access_key]
(?i)scw_secret[_\s=:]+[a-f0-9-]{36} → [REDACTED:scw_secret_key]
sk_live_[A-Za-z0-9]{24,} → [REDACTED:stripe_secret_key]
rk_live_[A-Za-z0-9]{24,} → [REDACTED:stripe_restricted_key]
ghp_[A-Za-z0-9]{36} → [REDACTED:github_pat]
github_pat_[A-Za-z0-9_]{82} → [REDACTED:github_pat_fine]
sk-ant-[A-Za-z0-9-_]{93} → [REDACTED:anthropic_key]
sk-[A-Za-z0-9]{48} → [REDACTED:openai_key]
eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+ → [REDACTED:jwt]
(?i)(password|passwd|pwd)\s*[=:]\s*\S+ → [REDACTED:password_value]
(?i)(api_?key|apikey)\s*[=:]\s*\S+ → [REDACTED:api_key_value]
(?i)(secret|token)\s*[=:]\s*\S+ → [REDACTED:secret_value]
(?i)(access_?key|auth_?token)\s*[=:]\s*\S+ → [REDACTED:auth_value]
-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY----- → [REDACTED:private_key_block]
(?i)(postgres|mysql|mongodb|redis):\/\/[^:]+:[^@]+@ → [REDACTED:dsn_with_credentials]

L5 — Known-Format: ENV-style assignments to UUID4 → [REDACTED:uuid_credential]. PEM cert blocks → [REDACTED:certificate_block].

L6 — User-Defined: .eroldignore (gitignore-style at project root or ~/.eroldignore). EROLD_NEVER_CAPTURE env var (colon-separated globs).

Default-deny for new event types: any type not in EVENT_TYPES → 422 before redaction. PR template requires “redaction coverage” checkbox.

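Create (in app.erold.dev → Settings → API Keys), then paste the key at the silent read prompt so it never lands in shell history:

read -rs EROLD_KEY && printf '%s' "$EROLD_KEY" | secret set erold.api-key; unset EROLD_KEY
secret has erold.api-key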

Inject: .mcp.json references a launcher, not the key:

{
  "mcpServers": {
    "erold-pm": {
      "command": "${HOME}/.local/bin/erold-mcp-launch",
      "args": []
    }
  }
}

~/.local/bin/erold-mcp-launch:

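#!/bin/bash
set -e
# Assign, then export: under set -e a failing `secret get` aborts the launch,
# whereas export VAR="$(...)" would mask the failure with export's own status.
EROLD_API_KEY="$(secret get erold.api-key)"
EROLD_TENANT="$(secret get erold.tenant-id)"
export EROLD_API_KEY EROLD_TENANT
exec npx --yes=false @erold/mcp-server@1.6.0 "$@"

Key never on disk, never in ps aux: the values reach npx through the exported environment rather than as command-line arguments (an exec env NAME=value form would briefly put them in env's argv).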

Rotate: generate new key in UI → secret set erold.api-key → revoke old → restart Claude Code. Cadence: ≤90 d, immediate on suspected exposure.

Revoke: revoke in UI; secret rm erold.api-key.

Audit trail: every API key use → server-side api_key_used event (key ID, tenant, ts, source IP, endpoint). Surfaced via /erold-audit?type=api_key_used.

Allowed fields on POST /v1/credentials: name, store (keychain|scaleway-sm|gitea-secrets), purpose, last_used_at, project_id.

Rejected fields (422 CREDENTIAL_VALUE_FORBIDDEN, recursive depth ≤5): value, secret, secret_value, encrypted_value, hash, token, password, key.

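from typing import Any
from fastapi import HTTPException

FORBIDDEN_KEYS = {"value", "secret", "secret_value", "encrypted_value", "hash", "token", "password", "key"}

def check_no_credential_values(body: Any, depth: int = 0) -> None:
    if depth > 5:  # levels below depth 5 are not inspected
        return
    if isinstance(body, dict):
        for k, v in body.items():
            if str(k).lower() in FORBIDDEN_KEYS:
                raise HTTPException(422, "CREDENTIAL_VALUE_FORBIDDEN")
            check_no_credential_values(v, depth + 1)
    elif isinstance(body, list):  # so [{"value": ...}] cannot slip through
        for item in body:
            check_no_credential_values(item, depth + 1)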

Mandatory pytest case (hard gate, runs on every backend PR):

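import asyncio
import pytest

async def test_tenant_b_cannot_read_tenant_a_fragments(...):
    marker = "CROSS_TENANT_ISOLATION_MARKER_xK9mP2qR"
    # log as A
    await client.post(f"{api}/v1/tenants/tenant-a/events",
                      json={"projectId": project_a_id, "content": marker, "type": "observation"},
                      headers=tenant_a_headers)
    # positive control: poll until A finds its own marker (fragment search is eventually consistent)
    for _ in range(20):
        r = await client.get(f"{api}/v1/tenants/tenant-a/fragments/search",
                             params={"q": marker}, headers=tenant_a_headers)
        if r.json()["data"]["total"] > 0:
            break
        await asyncio.sleep(0.5)
    else:
        pytest.fail("marker never became searchable for tenant A; B's 0 would be vacuous")
    # search as B
    r = await client.get(f"{api}/v1/tenants/tenant-b/fragments/search",
                         params={"q": marker}, headers=tenant_b_headers)
    assert r.json()["data"]["total"] == 0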

Also verify B cannot directly GET tenant-a endpoints with its own token (403).

Pinning in .mcp.json:

{
  "mcpServers": {
    "erold-pm": {
      "command": "${HOME}/.local/bin/erold-mcp-launch",
      "args": [],
      "env": {
        "_EROLD_MCP_VERSION": "1.6.0",
        "_EROLD_MCP_SHA256": "a3f8c2d1e9b74056f3a2c8d7e1f4b9a0c5e2d8f1a7b3c6e9d2f5a8b1c4e7d0f3"
      }
    }
  }
}

Launcher verifies sha256 before exec. CI: npm publish --provenance per release; RELEASE.sha256 updated by CI not by hand; latest dist-tag never referenced; manual install verified via npm view @erold/mcp-server@1.6.0 dist.integrity.
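Two different hashes are involved, and this sketch computes both. npm's dist.integrity is a Subresource Integrity string (sha512, base64-encoded), not a sha256 hex digest, so it cannot be compared directly with RELEASE.sha256. Both values can, however, be computed from the same tarball. Function names are hypothetical.

```python
import base64
import hashlib
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Hex digest in the form pinned as _EROLD_MCP_SHA256 / RELEASE.sha256."""
    return hashlib.sha256(data).hexdigest()


def npm_integrity(data: bytes) -> str:
    """SRI string (sha512, base64) in the form npm reports as dist.integrity."""
    return "sha512-" + base64.b64encode(hashlib.sha512(data).digest()).decode("ascii")


def verify_tarball(path: Path, expected_sha256: str) -> None:
    """Abort unless the tarball's sha256 matches the pinned value."""
    actual = sha256_hex(path.read_bytes())
    if actual != expected_sha256.lower():
        raise SystemExit(f"{path}: sha256 {actual} does not match pinned {expected_sha256}")
```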

  1. Auth headers never in error fragments — strip before formatError.
  2. Credential values never in deploy log fragments — export [A-Z_]+= lines through full filter.
  3. Path denylist (§5.2 L1) applies to ALL capture paths — no opt-out at call site.
  4. Error messages to callers contain stable error codes only; raw exception messages → ~/.erold/error.log only.
  5. Retry payloads are not re-redacted — cached redacted body retried.
  6. Session boundary events log only project ID + timestamp — never Bash output, prompt text, or env vars.
  7. .eroldignore / EROLD_NEVER_CAPTURE apply BEFORE entropy detector (additive to built-in denylist, not replacement).
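Item 5 above (retry payloads are not re-redacted) can be sketched with a local SQLite outbox. Redaction happens once, at enqueue time. Every retry then sends the stored bytes together with a stable idempotency key. The table layout, the function names, and the send callback (which would wrap the POST and report success) are assumptions.

```python
import json
import sqlite3
import uuid
from typing import Callable

SCHEMA = """
CREATE TABLE IF NOT EXISTS outbox (
    idempotency_key TEXT PRIMARY KEY,
    body BLOB NOT NULL,  -- redacted, serialized payload; never rebuilt on retry
    attempts INTEGER NOT NULL DEFAULT 0
)
"""


def enqueue(db: sqlite3.Connection, event: dict, redact: Callable[[str], str]) -> str:
    """Redact string fields once, serialize, and persist with a fresh idempotency key."""
    key = str(uuid.uuid4())
    clean = {k: redact(v) if isinstance(v, str) else v for k, v in event.items()}
    body = json.dumps(clean, sort_keys=True).encode("utf-8")
    with db:  # commits on success
        db.execute("INSERT INTO outbox (idempotency_key, body) VALUES (?, ?)", (key, body))
    return key


def drain(db: sqlite3.Connection, send: Callable[[str, bytes], bool]) -> int:
    """Send queued entries oldest first; delete on success, count the attempt on failure."""
    sent = 0
    rows = db.execute("SELECT idempotency_key, body FROM outbox ORDER BY rowid").fetchall()
    for key, body in rows:
        ok = send(key, body)  # the cached bytes go out unchanged on every retry
        with db:
            if ok:
                db.execute("DELETE FROM outbox WHERE idempotency_key = ?", (key,))
                sent += 1
            else:
                db.execute(
                    "UPDATE outbox SET attempts = attempts + 1 WHERE idempotency_key = ?", (key,)
                )
    return sent
```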

Every CredentialReference touch → immutable Event type=credential_ref:

| Field | Value |
|---|---|
| type | credential_ref (fixed) |
| action | create / touch / rotate / revoke |
| credential_ref_id | record ID |
| actor | API key ID (never the value) |
| tenant_id | tenant performing the action |
| timestamp | server-assigned UTC ISO-8601 |
| source_ip | recorded at the gateway |

Append-only — no UPDATE or DELETE. Surfaces:

  • /erold-creds — last 50 events grouped by name; no values.
  • /erold-audit — paginated NDJSON export (?from&to&type=credential_ref); SOC 2 / compliance.
  • Anomaly detection: touch without prior create → integrity violation alert to tenant owner.
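The anomaly rule can be sketched as a single pass over credential_ref events in timestamp order. The plan names touch without prior create; flagging rotate and revoke the same way is an assumption here. Sorting ISO-8601 strings works only if all timestamps share one format and the UTC Z suffix. Names are hypothetical.

```python
from dataclasses import dataclass

# The plan flags "touch without prior create"; rotate/revoke are added as an assumption.
CHECKED_ACTIONS = {"touch", "rotate", "revoke"}


@dataclass(frozen=True)
class CredentialRefEvent:
    credential_ref_id: str
    action: str  # create / touch / rotate / revoke
    timestamp: str  # server-assigned UTC ISO-8601, e.g. "2026-05-10T10:00:00Z"


def find_integrity_violations(events: list[CredentialRefEvent]) -> list[CredentialRefEvent]:
    """Return events acting on a credential reference with no earlier create."""
    created: set[str] = set()
    violations: list[CredentialRefEvent] = []
    # Uniform "...Z" timestamps sort chronologically as plain strings.
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.action == "create":
            created.add(ev.credential_ref_id)
        elif ev.action in CHECKED_ACTIONS and ev.credential_ref_id not in created:
            violations.append(ev)
    return violations
```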

Decision log for this plan (must record once approved)

  • Backend platform: Postgres+Scaleway vs. continue Firestore. Default: migrate.
  • intent → task rename: do at Phase 2 with backward-compat alias.
  • Local-first SQLite outbox: yes (irreversible-by-inertia if deferred).
  • Embedding compute: OpenAI initially; switch to Scaleway H100 if monthly cost > €40.

Top open questions (must resolve before Phase 1)

  1. Backend migration scope: full Firestore→Postgres, or compensating Firestore patterns? (architect risk #2)
  2. pgvector availability on Scaleway Managed RDB PG17 fr-par — verify before designing schema around it.
  3. Serverless Container private-network attachment GA in fr-par — required for §4.2 topology.
  4. scw container container wait exact semantics — required for §4.5 prod deploy gate.