Monce Selfservice v0.1.0

Memory-augmented extraction for Outlook / email workflows.

Economics

Per-Call Cost

One /v1/extract call decomposes into four components, only two of which incur Bedrock charges. Monolith's VLM pipeline is by far the dominant cost; of selfservice's additions (recall, logging, distillation), only distillation is billed, and it adds under 4% to the per-call cost.

Component	Model	Tokens (typical)	Cost
Monolith VLM pipeline (7 stages)	Haiku 4.5 + Sonnet 4.6	4k–12k in / 2k–4k out	~$0.008 – $0.020
Insight distillation (`auto_memory=true`)	Haiku 4.5	~600 in / ~150 out	~$0.0003
Memory recall (keyword-scored)	none (deterministic)	0	$0
Event log + memory append	none (disk JSON)	0	$0
Total per extraction			~$0.008 – $0.020
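
The totals above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the dictionary keys and function names are hypothetical, and the dollar figures are copied from the table rather than derived from Bedrock token prices.

```python
# Per-component cost ranges in USD (low, high), copied from the table above.
# Key names are illustrative and not taken from the selfservice codebase.
COMPONENT_COST_USD = {
    "monolith_vlm_pipeline": (0.008, 0.020),
    "insight_distillation": (0.0003, 0.0003),
    "memory_recall": (0.0, 0.0),
    "event_log_and_memory_append": (0.0, 0.0),
}


def per_call_cost(auto_memory: bool = True) -> tuple[float, float]:
    """Return the (low, high) cost of one extraction call in USD."""
    low = high = 0.0
    for name, (lo, hi) in COMPONENT_COST_USD.items():
        # Distillation is only billed when auto_memory is enabled.
        if name == "insight_distillation" and not auto_memory:
            continue
        low += lo
        high += hi
    return low, high


def distillation_overhead() -> tuple[float, float]:
    """Return distillation cost as a (min, max) fraction of the pipeline cost."""
    distill = COMPONENT_COST_USD["insight_distillation"][0]
    pipe_lo, pipe_hi = COMPONENT_COST_USD["monolith_vlm_pipeline"]
    # The share is smallest when the pipeline is at its most expensive.
    return distill / pipe_hi, distill / pipe_lo


if __name__ == "__main__":
    print(per_call_cost())
    print(distillation_overhead())
```

Keeping the ranges in one table-like structure makes it easy to re-run the estimate if model pricing or typical token counts change.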

Per-Call Latency

Component	Typical
Recall (keyword-scored memory search)	< 50ms
Monolith extraction (1-page PDF)	8 – 12s
Monolith extraction (dense multi-page)	20 – 60s
Insight distillation (Haiku)	1 – 2s
Memory write (disk)	< 20ms
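
A rough end-to-end budget can be sketched by summing the rows above. This assumes the stages run sequentially and that distillation sits on the request path; neither is stated in the table, so treat the result as an upper-bound estimate. The stage keys and the function name are hypothetical.

```python
# Latency ranges in seconds (low, high), copied from the table above.
# "< 50ms" and "< 20ms" are modelled as ranges starting at zero.
LATENCY_S = {
    "recall": (0.0, 0.050),
    "monolith_1_page": (8.0, 12.0),
    "monolith_dense": (20.0, 60.0),
    "distillation": (1.0, 2.0),
    "memory_write": (0.0, 0.020),
}


def end_to_end(extraction: str, auto_memory: bool = True) -> tuple[float, float]:
    """Sum stage latencies, assuming every stage runs one after another."""
    stages = ["recall", extraction, "memory_write"]
    if auto_memory:
        # Assumption: distillation blocks the response rather than running
        # in the background.
        stages.append("distillation")
    low = sum(LATENCY_S[stage][0] for stage in stages)
    high = sum(LATENCY_S[stage][1] for stage in stages)
    return low, high


if __name__ == "__main__":
    print(end_to_end("monolith_1_page"))
    print(end_to_end("monolith_dense"))
```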

Infrastructure Cost

Resource	Tier	Monthly
EC2 t3.small (2 vCPU, 2GB)	on-demand, eu-west-3	~$15
EBS gp3 root volume	20GB	~$2
Route53 hosted zone entry	1 A record	$0 (zone already billed)
Let's Encrypt SSL	free	$0
Data transfer	low (files proxy-only)	< $1
Baseline infrastructure		~$18 / month

Capacity

The t3.small is comfortably over-provisioned: FastAPI under gunicorn (2 workers) does only thin I/O, which leaves most CPU and memory idle. The bottleneck is upstream Bedrock latency, not the box. A single instance sustains >100 concurrent /v1/extract calls before connection-pool pressure appears.
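
One common way to stay below a connection-pool limit is to cap in-flight upstream requests with a semaphore. The following is a minimal sketch of that idea, not the service's actual code: `call_upstream` is a hypothetical stand-in for the real extraction request, and the cap of 100 is an assumption taken from the figure quoted above.

```python
import asyncio

# Assumption: cap chosen to match the ">100 concurrent calls" figure above.
MAX_IN_FLIGHT = 100


async def call_upstream(doc_id: str) -> str:
    """Hypothetical stand-in for the upstream extraction request."""
    await asyncio.sleep(0.01)  # simulates network latency
    return f"extracted:{doc_id}"


async def extract_all(doc_ids, max_in_flight: int = MAX_IN_FLIGHT):
    """Run all extractions, never exceeding max_in_flight at once.

    Returns the results in input order plus the observed peak concurrency.
    """
    semaphore = asyncio.Semaphore(max_in_flight)
    in_flight = 0
    peak = 0

    async def bounded(doc_id: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # waits here when the cap is reached
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await call_upstream(doc_id)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(bounded(d) for d in doc_ids))
    return results, peak


if __name__ == "__main__":
    ids = [str(i) for i in range(250)]
    results, peak = asyncio.run(extract_all(ids))
    print(len(results), "results, peak concurrency", peak)
```

Because the work is I/O-bound, the cap protects the upstream connection pool without leaving the CPU any busier.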

Scaling Levers

Data: per-user JSON on disk is adequate up to ~1000 memories per user. Beyond that, move to SQLite or a digest layer.
Insight distillation: currently runs Haiku on every auto_memory=true call. If volume grows, batch or debounce it (distill every N-th extraction rather than every one).
Monolith timeout: the 300s default is generous. Most calls finish in < 30s; failures are mostly dense PDFs (> 10 pages).
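
The debounce lever mentioned above (distill every N-th extraction) could look roughly like the sketch below. The class name, the per-user counting, and the in-memory dictionary are all assumptions made for illustration; a real deployment would presumably persist the counter alongside the per-user data.

```python
class DistillationDebouncer:
    """Allow distillation only on every N-th extraction for each user."""

    def __init__(self, every_n: int = 5):
        if every_n < 1:
            raise ValueError("every_n must be at least 1")
        self.every_n = every_n
        self._counts: dict[str, int] = {}  # extractions seen per user

    def should_distill(self, user_id: str) -> bool:
        """Record one extraction and report whether to distill now."""
        count = self._counts.get(user_id, 0) + 1
        self._counts[user_id] = count
        return count % self.every_n == 0


if __name__ == "__main__":
    debouncer = DistillationDebouncer(every_n=3)
    for i in range(1, 7):
        print(i, debouncer.should_distill("operator-1"))
```

With `every_n=1` the behaviour matches distilling on every call, so the debouncer can be introduced without changing results until the interval is raised.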

Break-Even

At an average of $0.012 per extraction and ~$18/month fixed, break-even vs per-user SaaS is at ~1,500 extractions/month across all users ($18 / $0.012). For an add-in shipping to a 10-operator glass ops team doing 15 extractions/day each (150/day in total), that volume is reached in 10 working days.
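
The break-even arithmetic can be checked with a short script. This sketch takes break-even to mean the volume at which cumulative per-extraction spend equals the fixed monthly infrastructure cost, which is an interpretation of the figures quoted here; the function names are hypothetical.

```python
import math


def break_even_extractions(fixed_monthly_usd: float,
                           cost_per_extraction_usd: float) -> int:
    """Extractions per month at which variable spend equals fixed cost."""
    # round() absorbs floating-point noise in the division.
    return round(fixed_monthly_usd / cost_per_extraction_usd)


def working_days_to_reach(extractions: int,
                          operators: int,
                          per_operator_per_day: int) -> int:
    """Whole working days a team needs to reach the given volume."""
    per_day = operators * per_operator_per_day
    return math.ceil(extractions / per_day)


if __name__ == "__main__":
    volume = break_even_extractions(18.0, 0.012)
    days = working_days_to_reach(volume, operators=10, per_operator_per_day=15)
    print(f"break-even volume: {volume} extractions/month")
    print(f"working days to reach it: {days}")
```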