Charles Dana, Monce SAS — April 2026
Selfservice is a thin memory layer that sits around an existing VLM extraction
pipeline. Every extraction mutates a per-user memory store; every subsequent
extraction reads it back as context. The architecture is intentionally a mix:
the extraction engine is monolith's (/summit/extract), the
memory model is concierge's (per-user memories.json), and
the distillation step runs on Haiku 4.5. What's new is the reflex loop
that chains them.
A sandwich with three slices instead of two. Input is a file and an email context (subject + body). Output is structured data plus distilled insight.
1. Prior memories relevant to the email are recalled and returned as
prior_memories so the addin can display them.
2. The file is forwarded to monolith /summit/extract. Monolith runs its 7-stage VLM
pipeline unchanged; selfservice adds zero latency.
3. When auto_memory=true, a Haiku
call receives the compacted extraction result, the email context, and
the relevant prior memories. It returns 1–3 bullets ≤ 110 chars each,
focused on surprising or decision-relevant patterns. Bullets are written
back as memory entries tagged insight.

Concierge tracks factory-level patterns (top clients per tenant, glass type frequency, synonym recommendations). Selfservice tracks how an individual operator behaves: the shorthand they type in emails, the factories they route to, the corrections they make on the same supplier every week. This information is useless at factory-level aggregation and highly valuable when threaded through a single operator's sequence of extractions.
Storage is isolated under data/users/{user_id}/. There is no
cross-user chat, no cross-user recall, no cross-user digest. The
user_id is an 8-character opaque token persisted client-side in
the Outlook add-in's roaming settings.
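The isolation rule can be sketched as a single path helper that every read and write goes through. This is a minimal sketch, not selfservice's code: new_user_id and user_dir are hypothetical names, and the token alphabet (lowercase letters and digits) is an assumption, since the text only says the token is 8 characters and opaque.

```python
import re
import secrets
import string
from pathlib import Path

DATA_ROOT = Path("data/users")  # layout from the text: data/users/{user_id}/
_ALPHABET = string.ascii_lowercase + string.digits  # assumption: alphabet is not specified
_TOKEN_RE = re.compile(r"[a-z0-9]{8}")


def new_user_id() -> str:
    """Generate an 8-character opaque token (hypothetical helper)."""
    return "".join(secrets.choice(_ALPHABET) for _ in range(8))


def user_dir(user_id: str, root: Path = DATA_ROOT) -> Path:
    """Resolve the per-user directory, rejecting anything that is not a well-formed token."""
    if not _TOKEN_RE.fullmatch(user_id):
        # Blocks path traversal such as "../other" before it reaches the filesystem.
        raise ValueError("malformed user_id")
    return root / user_id
```

Validating the token before joining paths means a malformed or hostile user_id can never resolve to another user's directory.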
When Outlook(user_id=..., auto_memory=True) is instantiated,
every call to extract_email() fires:
recall(subject) → extract(file, context=body) → distill(result, prior=recall_output) → remember(bullets)
The Haiku pass is cheap (~$0.0003/call) and short (~1-2 s), which makes
it acceptable to leave the reflex always on. When auto_memory=false,
recall and logging still happen; only the Haiku distillation is skipped.
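The chain and the auto_memory gate can be sketched as follows. This is an illustrative sketch under stated assumptions: the Reflex class, its callables, the log list, and the return shape are all hypothetical stand-ins, not the actual Outlook client. Only the step order, the "insight" tag, and the 1–3 bullets of at most 110 characters come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_BULLETS = 3   # from the text: 1-3 bullets
MAX_CHARS = 110   # from the text: at most 110 chars each


def clamp_bullets(bullets: list[str]) -> list[str]:
    """Drop empty bullets, cut each to MAX_CHARS, keep at most MAX_BULLETS."""
    cleaned = [b.strip()[:MAX_CHARS] for b in bullets if b.strip()]
    return cleaned[:MAX_BULLETS]


@dataclass
class Reflex:
    # The three callables are hypothetical stand-ins for the real services.
    recall: Callable[[str], list[str]]
    extract: Callable[[bytes, str], dict]
    distill: Callable[[dict, list[str]], list[str]]
    auto_memory: bool = True
    memories: list[dict] = field(default_factory=list)
    log: list[str] = field(default_factory=list)

    def extract_email(self, file: bytes, subject: str, body: str) -> dict:
        prior = self.recall(subject)        # always runs
        result = self.extract(file, body)   # always runs, body passed as context
        self.log.append(subject)            # logging always runs (log format assumed)
        bullets: list[str] = []
        if self.auto_memory:                # only the distillation step is gated
            bullets = clamp_bullets(self.distill(result, prior))
            self.memories.extend({"tag": "insight", "text": b} for b in bullets)
        return {"result": result, "prior_memories": prior, "insights": bullets}
```

Clamping the model output in code, rather than trusting the prompt alone, keeps stored memories within the stated size limits even when the model overshoots.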
/v1/chat runs Sonnet 4.6, grounded strictly on the requesting
user's memories, extraction history, and recent conversations. The system
prompt pins the model to its user scope. Topics outside the user's data yield
"no record" rather than a hallucinated answer.
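One way to pin a model to a single user's scope is to assemble the system prompt from that user's records only. The sketch below assumes a hypothetical build_system_prompt helper and prompt wording; the actual prompt used by /v1/chat is not given in the text.

```python
def build_system_prompt(
    user_id: str,
    memories: list[str],
    extractions: list[str],
    conversations: list[str],
) -> str:
    """Assemble a user-scoped system prompt (hypothetical wording)."""

    def section(title: str, items: list[str]) -> str:
        # Render each record as a bullet; mark empty sections explicitly.
        body = "\n".join(f"- {item}" for item in items) or "- (none)"
        return f"{title}:\n{body}"

    return "\n\n".join([
        f"You answer questions for user {user_id} only.",
        'Use only the records below. If the answer is not in them, reply "no record".',
        section("Memories", memories),
        section("Extraction history", extractions),
        section("Recent conversations", conversations),
    ])
```

Because the caller passes in only the requesting user's records, cross-user data never enters the prompt in the first place; the "no record" instruction then covers questions the records cannot answer.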
Monolith's own addin posted to /extract directly and had no
memory. Concierge tracked the tenant but not the operator. Selfservice fills
the gap between the two. The addin repoints its base URL and inherits memory
for free.
_fallback_insights produces a simpler deterministic bullet.
/v1/forget_by_id is planned.