Monce Selfservice v0.1.0
Memory-augmented extraction for Outlook / email workflows.
Architecture
Diagram
┌──────────────────────────────────────────────────────────────┐
│  Outlook add-in (manifest.xml + addin.js)                    │
│    ├── user_id (8-char, roaming setting)                     │
│    ├── attachments + subject + body                          │
│    └── POST /v1/extract (multipart)                          │
└───────────────┬──────────────────────────────────────────────┘
                │
                ▼
┌──────────────────────────────────────────────────────────────┐
│  selfservice.aws.monce.ai (FastAPI, t3.small, eu-west-3)     │
│                                                              │
│  routes.py     /v1/* + /health                               │
│  memory.py     per-user JSON: memories, extractions,         │
│                conversations, digests                        │
│  extract.py    forwards to monolith/summit/extract,          │
│                logs event, triggers insight distillation     │
│  insights.py   Haiku 4.5 → JSON bullets → memory             │
│  chat.py       Sonnet 4.6 grounded on user memory            │
│  bedrock.py    thin Bedrock Converse client                  │
│                                                              │
│  data/users/{user_id}/                                       │
│    memories.json        append-only, keyword-scored          │
│    extractions.json     event log (500 ring buffer)          │
│    conversations.json   chat history (200 ring buffer)       │
│    extractions/{task_id}/result.json   full monolith payload │
└───┬──────────────────────────────────────┬───────────────────┘
    │                                      │
    ▼                                      ▼
┌─────────────────────────┐      ┌─────────────────────────────┐
│  monolith.aws.monce.ai  │      │  Bedrock (eu-west-3)        │
│  /summit/extract        │      │  Haiku 4.5   (insights)     │
│  7-stage VLM pipeline   │      │  Sonnet 4.6  (chat)         │
└─────────────────────────┘      └─────────────────────────────┘
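The "ring buffer" notes on extractions.json (500) and conversations.json (200) can be read as a capped append: new events go on the end and the oldest are dropped once the cap is reached. The sketch below illustrates that idea only. It assumes each file holds a single JSON list, and `append_capped` is a hypothetical helper, not the actual memory.py code.

```python
import json
import tempfile
from pathlib import Path


def append_capped(path: Path, event: dict, max_items: int) -> list:
    """Append `event` to the JSON list stored at `path`, keeping only the
    newest `max_items` entries (hypothetical helper, assumed file format)."""
    events = json.loads(path.read_text()) if path.exists() else []
    events.append(event)
    events = events[-max_items:]  # drop the oldest entries beyond the cap
    path.write_text(json.dumps(events))
    return events


# Usage: 505 appends against a cap of 500 leave the newest 500 events.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "extractions.json"
    for i in range(505):
        kept = append_capped(log_path, {"task_id": f"t{i}"}, max_items=500)
    print(len(kept), kept[0]["task_id"], kept[-1]["task_id"])
```

Rewriting the whole file on every append is simple and fits small per-user logs. With 2 gunicorn workers, a real implementation would likely also need file locking or an atomic rename, which this sketch leaves out.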
Request Lifecycle — POST /v1/extract
- Multipart body parsed: user_id, files, email_subject, email_body, auto_memory.
- memory.search_memories() scores the user's memory against email_subject; top 8 hits become prior_memories.
- Files + params forwarded as multipart to monolith.aws.monce.ai/summit/extract. Default timeout 300s.
- Monolith's response payload saved to data/users/{user_id}/extractions/{task_id}/result.json.
- Lightweight event logged via memory.log_extraction() (task_id, filename, vertical, trust, routing, duration).
- A structured memory entry is written tagged extraction.
- If auto_memory=true, insights.distill_insights() fires a Haiku call. Bullets are appended as memories tagged insight.
- Response: {task_id, result, insights, prior_memories, duration_ms}.
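The memory lookup step (score memories against email_subject, keep the top 8) could look roughly like the sketch below. The scoring rule is not documented here, so the sketch assumes a simple count of shared lowercase tokens. The function name `search_memories_sketch`, the `{"text": ...}` record shape, and the newest-first tie-break are all assumptions for illustration.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_memories_sketch(memories: list, query: str, top_k: int = 8) -> list:
    """Return up to `top_k` memories sharing at least one token with `query`.

    Assumed scoring: number of distinct query tokens present in the memory
    text. Ties are broken in favour of later (newer) entries in the list.
    """
    query_tokens = tokenize(query)
    scored = []
    for index, memory in enumerate(memories):
        score = len(query_tokens & tokenize(memory["text"]))
        if score > 0:
            scored.append((score, index, memory))
    # Highest score first; for equal scores, highest index (newest) first.
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [memory for _, _, memory in scored[:top_k]]


memories = [
    {"text": "Invoice from Acme for glass panels"},
    {"text": "Lunch menu"},
    {"text": "Acme invoice reminder overdue"},
]
prior_memories = search_memories_sketch(memories, "Acme invoice overdue")
print([m["text"] for m in prior_memories])
```

Entries with no token overlap are dropped rather than padded in, so prior_memories can hold fewer than 8 items.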
Endpoints
| Method | Path | What |
| GET | /health | liveness |
| POST | /v1/extract | multipart upload + email context |
| POST | /v1/remember | store arbitrary text memory |
| POST | /v1/forget | delete memories matching substring |
| GET | /v1/recall | keyword-scored memory search |
| GET | /v1/memories | paged list (+ tag filter) |
| GET | /v1/history | past extractions, most recent first |
| POST | /v1/feedback | accept / reject / correct / note |
| POST | /v1/chat | Sonnet grounded on user memory |
| GET | /v1/user/{user_id}/stats | counts |
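As a rough client-side illustration of the table above, the sketch below builds (but does not send) requests for /v1/recall and /v1/remember using only the standard library. The parameter names `user_id`, `q`, and `text`, the JSON body for /v1/remember, and the https scheme are assumptions; the table does not specify request shapes, so check routes.py before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://selfservice.aws.monce.ai"  # scheme assumed from the SSL setup


def build_recall_request(user_id: str, query: str, base_url: str = BASE_URL) -> Request:
    """Build a GET /v1/recall request (query parameter names are assumed)."""
    params = urlencode({"user_id": user_id, "q": query})
    return Request(f"{base_url}/v1/recall?{params}", method="GET")


def build_remember_request(user_id: str, text: str, base_url: str = BASE_URL) -> Request:
    """Build a POST /v1/remember request (JSON body shape is assumed)."""
    body = json.dumps({"user_id": user_id, "text": text}).encode("utf-8")
    return Request(
        f"{base_url}/v1/remember",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


recall = build_recall_request("ab12cd34", "glass invoice")
print(recall.get_method(), recall.full_url)
```

A request built this way can be sent with `urllib.request.urlopen(request, timeout=...)`.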
Infrastructure
| Resource | Value |
| EC2 | t3.small, Ubuntu 22.04 LTS, eu-west-3 |
| VPC | vpc-0c99f4b9ac1d073ad (POC) |
| DNS | selfservice.aws.monce.ai → Route53 zone Z08902341SIIJX80NFN4N |
| SSL | Let's Encrypt via certbot (auto-renew) |
| Service | systemd selfservice.service, gunicorn+uvicorn, 2 workers, port 8002 |
| Reverse proxy | nginx, 300s timeout, 50M upload |
| Storage | /opt/selfservice/data/users/ — per-user JSON |
File Layout
selfservice.aws.monce.ai/
├── api/
│   ├── main.py           FastAPI app + CORS + mounts
│   ├── routes.py         /v1/* endpoints
│   ├── docs_pages.py     /paper /architecture /economics
│   ├── config.py         env-driven Config
│   ├── memory.py         per-user stores
│   ├── extract.py        monolith proxy + event log
│   ├── insights.py       Haiku distillation
│   ├── chat.py           Sonnet chat
│   └── bedrock.py        Bedrock Converse client
├── static/index.html     landing
├── addin/                Outlook add-in (manifest + JS + CSS)
├── terraform/            EC2 + SG + Route53 + deploy.sh
└── requirements.txt
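For config.py, "env-driven Config" could mean something like the sketch below: a frozen dataclass whose fields fall back to the values documented on this page (300s monolith timeout, port 8002, /opt/selfservice/data/users). The environment variable names, the field names, and the https scheme on the monolith URL are hypothetical; the actual config.py may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    monolith_url: str
    monolith_timeout_s: int
    data_dir: str
    port: int

    @classmethod
    def from_env(cls, env=None) -> "Config":
        """Read settings from `env` (defaults to os.environ).

        Variable names are hypothetical; defaults mirror this page.
        """
        env = os.environ if env is None else env
        return cls(
            monolith_url=env.get(
                "MONOLITH_URL", "https://monolith.aws.monce.ai/summit/extract"
            ),
            monolith_timeout_s=int(env.get("MONOLITH_TIMEOUT_S", "300")),
            data_dir=env.get("DATA_DIR", "/opt/selfservice/data/users"),
            port=int(env.get("PORT", "8002")),
        )


config = Config.from_env({"MONOLITH_TIMEOUT_S": "60"})
print(config.monolith_timeout_s, config.port)
```

Passing a plain dict to `from_env` keeps the loader easy to exercise in tests without touching the process environment.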