Monce Selfservice v0.1.0

Memory-augmented extraction for Outlook / email workflows.

Architecture

Diagram

┌──────────────────────────────────────────────────────────────┐
│  Outlook add-in (manifest.xml + addin.js)                    │
│  ├── user_id (8-char, roaming setting)                       │
│  ├── attachments + subject + body                            │
│  └── POST /v1/extract (multipart)                            │
└───────────────┬──────────────────────────────────────────────┘
                │
                ▼
┌──────────────────────────────────────────────────────────────┐
│  selfservice.aws.monce.ai (FastAPI, t3.small, eu-west-3)     │
│                                                              │
│  routes.py      /v1/* + /health                              │
│  memory.py      per-user JSON: memories, extractions,        │
│                 conversations, digests                       │
│  extract.py     forwards to monolith/summit/extract,         │
│                 logs event, triggers insight distillation    │
│  insights.py    Haiku 4.5 → JSON bullets → memory            │
│  chat.py        Sonnet 4.6 grounded on user memory           │
│  bedrock.py     thin Bedrock Converse client                 │
│                                                              │
│  data/users/{user_id}/                                       │
│    memories.json        append-only, keyword-scored          │
│    extractions.json     event log (500 ring buffer)          │
│    conversations.json   chat history (200 ring buffer)       │
│    extractions/{task_id}/result.json  full monolith payload  │
└───┬──────────────────────────────────────┬───────────────────┘
    │                                      │
    ▼                                      ▼
┌─────────────────────────┐      ┌─────────────────────────────┐
│ monolith.aws.monce.ai   │      │ Bedrock (eu-west-3)         │
│ /summit/extract         │      │ Haiku 4.5 (insights)        │
│ 7-stage VLM pipeline    │      │ Sonnet 4.6 (chat)           │
└─────────────────────────┘      └─────────────────────────────┘
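
The diagram describes extractions.json and conversations.json as ring buffers capped at 500 and 200 entries. One way such a capped JSON log could work is sketched below. This is an illustration under assumptions, not the service's code: append_capped, the entry shape, and the example user_id are hypothetical names, and the demo uses a cap of 3 so the eviction is visible.

```python
import json
import tempfile
from pathlib import Path


def append_capped(path: Path, entry: dict, cap: int) -> list:
    """Append entry to the JSON list stored at path, keeping only the newest `cap` items."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    items = json.loads(path.read_text()) if path.exists() else []
    items.append(entry)
    items = items[-cap:]  # drop the oldest entries once the cap is exceeded
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(items))
    return items


# Demo in a temporary directory; the user_id "ab12cd34" is made up.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "users" / "ab12cd34" / "extractions.json"
    for i in range(5):
        kept = append_capped(log, {"task_id": f"t{i}"}, cap=3)

print([e["task_id"] for e in kept])  # ['t2', 't3', 't4']
```

With two gunicorn workers writing to the same files, a real implementation would probably need file locking or atomic renames; the source does not say how concurrent writes are handled.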

Request Lifecycle — POST /v1/extract

  1. Multipart body parsed: user_id, files, email_subject, email_body, auto_memory.
  2. memory.search_memories() scores the user's memory against email_subject; top 8 hits become prior_memories.
  3. Files + params forwarded as multipart to monolith.aws.monce.ai/summit/extract. Default timeout 300s.
  4. Monolith's response payload saved to data/users/{user_id}/extractions/{task_id}/result.json.
  5. Lightweight event logged via memory.log_extraction() (task_id, filename, vertical, trust, routing, duration).
  6. A structured memory entry is written and tagged extraction.
  7. If auto_memory=true, insights.distill_insights() fires a Haiku call. Bullets are appended as memories tagged insight.
  8. Response: {task_id, result, insights, prior_memories, duration_ms}.
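
The steps above can be walked through in memory as a rough sketch. Several things here are assumptions: the scoring rule (a count of shared words) stands in for whatever memory.search_memories() actually does, forward and distill are hypothetical stubs for the monolith call and the Haiku call, task_id is assumed to come back in the monolith response, and persistence (steps 4 and 5) is left out.

```python
import time


def score(text: str, query: str) -> int:
    """Number of distinct query words that also appear in the memory text."""
    words = set(text.lower().split())
    return len(words & set(query.lower().split()))


def search_memories(memories: list, query: str, top_k: int = 8) -> list:
    """Up to top_k memories with a non-zero score, highest score first."""
    scored = [(score(m["text"], query), m) for m in memories]
    hits = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
    return [m for _, m in hits[:top_k]]


def handle_extract(memories, files, email_subject, forward, distill, auto_memory=True):
    """In-memory walk through steps 2, 3, 6, 7 and 8 of the lifecycle."""
    start = time.monotonic()
    prior = search_memories(memories, email_subject)        # step 2
    result = forward(files, email_subject)                  # step 3 (stubbed)
    task_id = result["task_id"]                             # assumed response field
    memories.append({"text": f"extraction {task_id}", "tags": ["extraction"]})  # step 6
    insights = distill(result) if auto_memory else []       # step 7 (stubbed)
    memories.extend({"text": b, "tags": ["insight"]} for b in insights)
    return {                                                # step 8
        "task_id": task_id,
        "result": result,
        "insights": insights,
        "prior_memories": prior,
        "duration_ms": int((time.monotonic() - start) * 1000),
    }


# Made-up data and stubs, for illustration only.
memories = [
    {"text": "Invoice from Acme for glass panels", "tags": []},
    {"text": "Weekly newsletter", "tags": []},
]
response = handle_extract(
    memories,
    files=[],
    email_subject="Acme invoice",
    forward=lambda files, subject: {"task_id": "t1", "vertical": "glass"},
    distill=lambda result: ["Acme orders glass panels"],
)
print([m["text"] for m in response["prior_memories"]])
```

Only the first memory shares words with the subject line, so it is the single entry in prior_memories; the extraction and insight memories are appended after the search, so they affect later requests rather than this one.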

Endpoints

Method  Path                      What
GET     /health                   liveness
POST    /v1/extract               multipart upload + email context
POST    /v1/remember              store arbitrary text memory
POST    /v1/forget                delete memories matching substring
GET     /v1/recall                keyword-scored memory search
GET     /v1/memories              paged list (+ tag filter)
GET     /v1/history               past extractions, most recent first
POST    /v1/feedback              accept / reject / correct / note
POST    /v1/chat                  Sonnet grounded on user memory
GET     /v1/user/{user_id}/stats  counts
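
As a rough illustration of calling two of these endpoints, the sketch below builds (but does not send) requests with the standard library. The source lists only methods and paths, so the query parameter names (user_id, q), the JSON body shape for /v1/remember, and the https scheme are all assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://selfservice.aws.monce.ai"  # scheme assumed from the SSL setup


def recall_request(user_id: str, query: str) -> Request:
    """Build a GET /v1/recall request; the parameter names are hypothetical."""
    params = urlencode({"user_id": user_id, "q": query})
    return Request(f"{BASE_URL}/v1/recall?{params}", method="GET")


def remember_request(user_id: str, text: str) -> Request:
    """Build a POST /v1/remember request with a JSON body (assumed shape)."""
    body = json.dumps({"user_id": user_id, "text": text}).encode("utf-8")
    return Request(
        f"{BASE_URL}/v1/remember",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = recall_request("ab12cd34", "acme invoice")
print(req.get_method(), req.full_url)
```

Passing either object to urllib.request.urlopen would send it; that step is omitted here because the real request schema is not documented in this section.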

Infrastructure

Resource       Value
EC2            t3.small, Ubuntu 22.04 LTS, eu-west-3
VPC            vpc-0c99f4b9ac1d073ad (POC)
DNS            selfservice.aws.monce.ai → Route53 zone Z08902341SIIJX80NFN4N
SSL            Let's Encrypt via certbot (auto-renew)
Service        systemd selfservice.service, gunicorn+uvicorn, 2 workers, port 8002
Reverse proxy  nginx, 300s timeout, 50M upload
Storage        /opt/selfservice/data/users/ (per-user JSON)
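
The Service row could correspond to a gunicorn configuration file along the following lines. This is a sketch, not the deployed config: the file name, the loopback bind address, the app variable name in api/main.py, and the 300s worker timeout (chosen to match the nginx timeout) are assumptions. Only the worker count and port come from the table.

```python
# gunicorn.conf.py (hypothetical file name and location)
wsgi_app = "api.main:app"       # assumes api/main.py exposes the FastAPI instance as `app`
bind = "127.0.0.1:8002"         # port from the table; binding to loopback is an assumption
workers = 2                     # from the table
worker_class = "uvicorn.workers.UvicornWorker"  # runs the ASGI app under gunicorn
timeout = 300                   # assumed, so workers outlive the 300s nginx proxy timeout
```

If the worker timeout were shorter than the nginx timeout, a long monolith extraction could be killed by gunicorn before nginx gives up, which is why the two values are kept equal in this sketch.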

File Layout

selfservice.aws.monce.ai/
├── api/
│   ├── main.py           FastAPI app + CORS + mounts
│   ├── routes.py         /v1/* endpoints
│   ├── docs_pages.py     /paper /architecture /economics
│   ├── config.py         env-driven Config
│   ├── memory.py         per-user stores
│   ├── extract.py        monolith proxy + event log
│   ├── insights.py       Haiku distillation
│   ├── chat.py           Sonnet chat
│   └── bedrock.py        Bedrock Converse client
├── static/index.html     landing
├── addin/                Outlook add-in (manifest + JS + CSS)
├── terraform/            EC2 + SG + Route53 + deploy.sh
└── requirements.txt
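
The layout lists config.py as an env-driven Config. A minimal sketch of that pattern follows. The environment variable names (SELFSERVICE_DATA_DIR, MONOLITH_URL, EXTRACT_TIMEOUT_S) and field names are hypothetical; the defaults reuse the storage path, monolith endpoint, and 300s timeout mentioned elsewhere in this document, with the https scheme assumed.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Config:
    data_dir: Path
    monolith_url: str
    extract_timeout_s: float

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Config":
        """Read settings from env, falling back to documented defaults."""
        return cls(
            data_dir=Path(env.get("SELFSERVICE_DATA_DIR", "/opt/selfservice/data/users")),
            monolith_url=env.get(
                "MONOLITH_URL", "https://monolith.aws.monce.ai/summit/extract"
            ),
            extract_timeout_s=float(env.get("EXTRACT_TIMEOUT_S", "300")),
        )


defaults = Config.from_env({})
override = Config.from_env({"EXTRACT_TIMEOUT_S": "60"})
print(defaults.extract_timeout_s, override.extract_timeout_s)  # 300.0 60.0
```

Accepting the environment as a mapping argument keeps the class testable without touching the real process environment.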