Monce Selfservice v0.1.0

Memory-augmented extraction for Outlook / email workflows.

Economics

Per-Call Cost

One /v1/extract call decomposes into three Bedrock-chargeable pieces. Monolith's VLM pipeline is by far the dominant cost; selfservice's additions (recall, log, distillation) are a ~5% overhead.

ComponentModelTokens (typical)Cost
Monolith VLM pipeline (7 stages)Haiku 4.5 + Sonnet 4.64k–12k in / 2k–4k out~$0.008 – $0.020
Insight distillation (auto_memory=true)Haiku 4.5~600 in / ~150 out~$0.0003
Memory recall (keyword-scored)none (deterministic)0$0
Event log + memory appendnone (disk JSON)0$0
Total per extraction~$0.008 – $0.020

Per-Call Latency

ComponentTypical
Recall (keyword-scored memory search)< 50ms
Monolith extraction (1-page PDF)8 – 12s
Monolith extraction (dense multi-page)20 – 60s
Insight distillation (Haiku)1 – 2s
Memory write (disk)< 20ms

Infrastructure Cost

ResourceTierMonthly
EC2 t3.small (2 vCPU, 2GB)on-demand, eu-west-3~$15
EBS gp3 root volume20GB~$2
Route53 hosted zone entry1 A record$0 (zone already billed)
Let's Encrypt SSLfree$0
Data transferlow (files proxy-only)< $1
Baseline infrastructure~$18 / month

Capacity

t3.small is comfortably over-provisioned: FastAPI + gunicorn 2-worker + thin I/O leaves most CPU/memory idle. The bottleneck is always the upstream Bedrock latency, not the box. A single instance sustains >100 concurrent /v1/extract calls before connection-pool pressure appears.

Scaling Levers

Break-Even

At $0.012 / extraction average and ~$18/month fixed, break-even vs per-user SaaS is at ~1500 extractions/month across all users. For an addin shipping to a 10-operator glass ops team doing 15 extractions/day each, that's covered by morning 2.