REMATIQ · RESEARCH AGENT · PRODUCT CASE

Trust is the product.

TRUST IS THE PRODUCT

FAITHFUL [quote is real] + RELEVANT [on topic] + SUPPORTIVE [backs the claim]

Thesis: make verifiable citation the product. This deck follows its own rule: every claim and figure carries a visible source. The baseline (faithful + relevant) answers the case; the extension (supportiveness + intent agent) is the new solution.

RECAP · THE BRIEF

WHAT WE WERE ASKED

Take the research agent from prototype to trusted every day

REMATIQ is a MedTech compliance platform in three layers: the compliance graph, workflows, and a general-purpose research agent. This case is only about the third layer, the one that handles long-tail Q&A and one-off documents.

Five areas: regulatory Q&A, internal-data Q&A, document generation, lifecycle, citing sources
Two hard constraints: every answer needs a verifiable source; the agent works on the connected graph
Four deliverables for Monday: ① reprioritized user stories ② align with Stefan ③ align with Anton ④ prototype

DECK CONTENTS

01 · The brief (this page)

02 · REMATIQ architecture

03 · Argument overview

04 · Baseline: verifiable citation is the floor

05 · Production answer · reduce-hallucination

06 · Frontier: supportiveness

07 · Solution: intent agent + worked example

08 · Deliverable 1 · reprioritized stories

09 · Deliverable 2 · Stefan (needs / ML+data)

10 · Deliverable 3 · Anton (cost / iteration+risk)

11 · Prototype + UI rationale (clickable demo)

12 · Close

Restate what they sent so everyone answers the same question. The prototype (4th deliverable) is deferred this round, after the concept is settled.

CONTEXT · ARCHITECTURE

WHERE THE RESEARCH AGENT SITS

Where the research agent sits in REMATIQ: it reads and writes on top of the graph

COMPLIANCE EXECUTION

Verifiable Workflows

Documentation Agents · the research agent (THIS CASE)

▲ read / write ▲

TRACEABILITY ENGINE

Regulations · Products · Requirements · Design · Tests · Risk Management · Mitigations · Manufacturing · SOPs · Work Instructions · Records → Submissions · Documentation · Records

Life Science Data Ontology · verifiable citation = typed links here

Audit Graph · attestation lives here

▲ pull / write-back ▲

DATA SOURCES

Laws & Regulations · Quality & QMS · Engineering & ALM · Regulatory & RIM · Files & APIs

From REMATIQ's site. The research agent sits in the top layer (Documentation Agents), reading and writing on the Ontology and Audit Graph. REMATIQ.COM

REMATIQ's real architecture. The research agent = top-layer Documentation Agents, reading/writing on the mid-layer Traceability Engine (Life Science Data Ontology + Audit Graph). Verifiable citation lands on the ontology's typed links and the audit graph, described in REMATIQ's own words; no need to borrow from Palantir.

OVERVIEW

THE ARGUMENT, IN ONE BREATH

A citation is a contract the backend must honor.

First confirm the industry baseline, then the solution proven in production, then name the frontier, then the solution and deliverables. Baseline and extension stay clearly separate.

ANSWERS THE CASE · BASELINE

Verifiable citation is the industry floor; faithfulness and relevance are largely solved
reduce-hallucination ships these two layers in production

EXTENSION

The real frontier is supportiveness, and it depends on the user's intent
Solved with an intent agent and “surface, not decide”

better-doc: lead with one thesis, then the route. Keep baseline (answering the case) and extension (new ideas) apart so focus doesn't blur.

CASE · BASELINE

THE FLOOR, ALREADY SOLVED

An answer without a source is nearly useless.

This is the floor in regulated industries. Medical and legal AI long ago made “answers with clickable sources” standard: each sentence anchors to the source, hover to preview, click to verify.

OpenEvidence

Clinical Q&A; cites peer-reviewed sources sentence by sentence; declines when unsupported OPENEVIDENCE

Harvey

Legal; cites the specific clause/paragraph, click back to verify HARVEY

NotebookLM

RAG + inline citations; ~13% response-level hallucination (vs ~40% without grounding) NOTEBOOKLM

So “showing citations” is just the entry ticket; the real moat is backend verification. It solves the first two pillars: faithfulness (the quote is real) and relevance (on topic).

Concede this is the baseline; don't sell the entry ticket as innovation. This page also introduces the first two of the three pillars.

CASE · BASELINE

PRODUCTION CANON

reduce-hallucination: take the first two pillars into production

It borrows proven techniques from interrogation science for getting a knowing witness to tell the truth (PEACE · SUE): treat each LLM node as a witness, build five gates, and validate in production (2,818 TASKS). At this point, faithfulness and relevance are both solved.

1. Schema keeps an abstention exit: output null when unsure, never guess

2. The {value · source · verbatim quote} triple

3. At the exit, code-check that the quote exists verbatim (zero LLM cost)

4. Label provenance: stated · inferred · absent

5. Verification must be real: asking for citations without checking teaches the model to fake more convincing ones

“Ask-and-actually-check” effect size g ≈ 0.80; in the same prompt, fields with an abstention exit had zero corrections. VERIFIABILITY STUDY PRODUCTION A/B OPEN SOURCE · GITHUB
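The first four gates can be sketched in a few lines of Python. This is a minimal illustration, not the shipped reduce-hallucination code: the `Claim` class, the `check_claim` function, and the `sources` dictionary are hypothetical names, and the sample source text is a placeholder rather than real standard wording.

```python
from dataclasses import dataclass
from typing import Optional

# Gate 4: the three provenance labels named on the slide.
PROVENANCE = {"stated", "inferred", "absent"}


@dataclass
class Claim:
    """Gate 2: the {value, source, verbatim quote} triple (hypothetical shape)."""
    value: Optional[str]       # None is the abstention exit (gate 1)
    source_id: Optional[str]   # which document the quote is taken from
    quote: Optional[str]       # verbatim quote that backs the value
    provenance: str = "absent"


def abstain() -> Claim:
    """Return the null answer instead of a guess."""
    return Claim(value=None, source_id=None, quote=None, provenance="absent")


def check_claim(claim: Claim, sources: dict[str, str]) -> Claim:
    """Gate 3: plain string check at the exit, no LLM call involved."""
    if claim.provenance not in PROVENANCE:
        raise ValueError(f"unknown provenance label: {claim.provenance!r}")
    if claim.value is None:
        return abstain()
    text = sources.get(claim.source_id or "")
    if text is None or not claim.quote or claim.quote not in text:
        # The quote cannot be found verbatim, so the claim is dropped.
        return abstain()
    return claim


# Placeholder text for illustration only, not the wording of any standard.
sources = {"DOC-1": "The residual risk shall be evaluated after controls."}
kept = check_claim(
    Claim("evaluated", "DOC-1", "residual risk shall be evaluated", "stated"),
    sources,
)
dropped = check_claim(
    Claim("evaluated", "DOC-1", "residual risk must be evaluated", "stated"),
    sources,
)
```

Here `kept` passes through unchanged, while `dropped` comes back as an abstention because its quote paraphrases the source instead of matching it. Gate 5 is the decision to run `check_claim` on every output rather than only asking the model for a quote.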

This is the unfair advantage vs other candidates: not theorizing about citation, but having shipped this mechanism in production (speak in first person). Make clear: borrow interrogation science → five gates → production validation → faithfulness+relevance solved.

EXTENSION

THE FRONTIER · THIRD PILLAR

Supportiveness: does the cited passage actually support the claim?

Real and relevant does not mean supportive. A citation that resolves but does not support the claim (misgrounding) is worse than none, because it manufactures false trust. Support vs contradiction depends on which direction the user argues.

Supportiveness = Stance × Intent

94%

link valid / relevant · looks fine

39–77%

actually supports the claim · fails in substance

SOURCES CITED BUT NOT VERIFIED · 2026 STANFORD REGLAB · MISGROUNDING SCITE.AI ALCE · partial-support limit

This is the extension, so it is badged EXTENSION. Sources are on the table: the “verifiable” idea applied to my own answer. 94% vs 39–77% is the key figure.

EXTENSION

TWO AGENTS · PROACTIVENESS

Beyond answering, a second agent that infers intent

EXECUTION [answer] INTENT [reads rings]

The execution agent grounds the answer; the intent agent starts from the current question and pulls in context ring by ring to infer what the user is really arguing.

① Question · what's being asked

② Session · answers / draft / revisions

③ History · the user's past choices

④ Team & product · how others write / existing content

⑤ Scene · the wider context

Combine the layers to infer intent, then decide which citation to use and resolve supportiveness in both directions. When intent is unclear, show both and let the human choose. Design after KeyCite: direction is a review flag, not a verdict. WESTLAW KEYCITE
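One way to read “surface, not decide” is as a small routing rule over stance and intent. The sketch below is an assumption about how that rule could look, not a description of an existing component: `Stance`, `Intent`, `Citation`, `Surfaced`, and `surface` are hypothetical names, and the intent value is assumed to come from the intent agent upstream.

```python
from dataclasses import dataclass
from enum import Enum


class Stance(Enum):
    SUPPORTS = "supports"
    CONTRADICTS = "contradicts"


class Intent(Enum):
    PROVE_COMPLIANCE = "prove_compliance"
    FIND_GAP = "find_gap"
    UNCLEAR = "unclear"


@dataclass(frozen=True)
class Citation:
    anchor: str      # e.g. a paragraph reference
    stance: Stance   # direction relative to the drafted claim


@dataclass(frozen=True)
class Surfaced:
    lead: tuple              # citations that match the inferred intent
    review_flags: tuple      # the other direction, shown as a flag, never hidden
    user_must_choose: bool   # True when intent is unclear


# Which stance leads for each clear intent (assumed mapping).
LEAD_STANCE = {
    Intent.PROVE_COMPLIANCE: Stance.SUPPORTS,
    Intent.FIND_GAP: Stance.CONTRADICTS,
}


def surface(citations, intent):
    """Order citations by intent without discarding either direction."""
    wanted = LEAD_STANCE.get(intent)
    if wanted is None:
        # Intent unclear: show both directions and let the human pick.
        return Surfaced(lead=(), review_flags=tuple(citations), user_must_choose=True)
    lead = tuple(c for c in citations if c.stance is wanted)
    flags = tuple(c for c in citations if c.stance is not wanted)
    return Surfaced(lead=lead, review_flags=flags, user_must_choose=False)
```

The design choice worth noting is that the opposing citation is never filtered out. Even with a clear intent it stays visible as a review flag, which mirrors the KeyCite-style framing on the slide.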

Two agents: execution (answer) and intent (reads rings). The intent agent's input is context expanding outward: question→session→history→team/product→scene. Combine to infer intent, decide which citation, and how to resolve supportiveness.

EXTENSION · A WORKED EXAMPLE

WORKED EXAMPLE · no UI, just the process

Draft · BP Monitor risk management file

“SOP-042 meets ISO 14971’s residual-risk requirements.”

① Faithful + relevant · reduce-hallucination

Retrieve ISO 14971 §8 and SOP-042 §6; on topic.
Five gates:
✓ the triple (value · source · verbatim quote)
✓ verbatim check: ‘residual risk shall be evaluated’ does exist in §8
✓ provenance = stated
✓ real check passed
→ faithfulness + relevance solved

② Bring in context · intent agent

Read session + draft: this sits in the ‘gap assessment’ section, in a self-assessing tone.
Pull in history and scene →
Intent unclear: prove compliance, or find the gap?

→ unclear, don't decide for the user

③ Supportiveness · two citations, user chooses

▲ Supports: ISO 14971 §6 ‘risk control’ → backs ‘meets’
▼ Contradicts: §8 requires post-market residual-risk monitoring; SOP-042 §6 has no such step → exposes a gap

→ whichever is picked anchors the deliverable and feeds back as an intent signal

Same draft sentence: real and relevant; but which direction it supports depends on intent. Finding the opposing evidence is gap analysis. (Clause numbers are illustrative; the real UDM paragraph governs.) ISO 14971 · example SOP-042 · example

End-to-end example: first reduce-hallucination's five gates solve faithful+relevant (each checked), then the intent agent brings in more context, then two opposing citations for the user to pick. No UI, just the process. Clause numbers are illustrative, a reminder not to fabricate.

DELIVERABLE · ONE

REPRIORITIZED USER STORIES

Citation is the spine, yet the PRD filed it under NICE; promote it to P0

P0 · spine: clickable citations · jump to section/paragraph · verbatim-quote check · abstention verdict (not in library / out of scope)

P1: progressive-disclosure answers (short claim + chip + expand, fixes “too long”) · deliverable as its own doc, PDF export

Deferred: image support · DOCX · multi-doc chat · full save-to-library (but citations carry a version stamp from now on)

Basis: both citation stories are NICE and unbuilt in the PRD, while positioning and strategy treat “verifiable” as core. The “too long” feedback is structural; progressive disclosure fixes it. PRD PILOT FEEDBACK

One of the deliverables answering their question. Reprioritization: promote the misfiled-as-NICE spine to P0, and use progressive disclosure to solve the #1 complaint.

DELIVERABLE · TWO / A

ALIGN WITH STEFAN · CUSTOMER NEED

Align with Stefan (1): validate the customer-need assumptions first

Alignment = come with a judgment to confirm or refute, not open-ended questions. Frame each question as “which product decision does it settle for me”.

Need · length: “Answers too long”: is the real need less information, or one-click verify then expand?
Validates: progressive disclosure vs blunt truncation · evidence: revisit Marcel / Paul's Langfuse traces

Need · cost: For an RA, how much costlier is a confident wrong answer than an honest abstention?
Validates: how aggressive abstention should be · a wrong conclusion in a submission costs far more than “not in the library”

Need · two-way: Showing both supporting and contradicting evidence: does it feel powerful, or like the tool is unsure?
Validates: whether the supportiveness feature is worth building, and how to present it

Need · priority: Which area do pilot customers actually push on (regulatory Q&A / generation / gap)?
Validates: whether my area-priority order is right

I bring not a question list but a judgment plus a set of assumptions for Stefan to confirm or refute.

Customer-need side. Translate vague feedback like “too long” into concrete product decisions Stefan can confirm/refute. This page makes clear what I align on, why, and with what evidence.

DELIVERABLE · TWO / B

ALIGN WITH STEFAN · ML & DATA

Align with Stefan (2): can the data and ML support the citation contract?

ML feasibility

Retrieval granularity: can we reliably retrieve at UDM paragraph / span level?
Verbatim verification: deterministic string-match against UDM; at what point does OCR normalization need a bounded fuzzy matcher? (the cost fork)
Stance classifier: do we have / can we build claim-evidence entailment? Is it accurate on conditional regulatory language?
Intent agent: ML (embedding / clustering over the session) or just a prompt?
How is the abstention threshold calibrated? What triggers it?
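The cost fork in the verbatim-verification question can be made concrete with a sketch: exact matching after light normalization, with a bounded fuzzy fallback only when the exact match fails. Everything here is an assumption for discussion with Stefan: the function names, the normalization steps, and the 0.95 threshold are illustrative, and the fuzzy step uses Python's standard `difflib` rather than any matcher REMATIQ is known to use.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Fold common OCR and typesetting noise before comparing."""
    text = unicodedata.normalize("NFKC", text)   # ligatures, width variants
    text = text.replace("\u00ad", "")            # soft hyphens
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_quote(quote: str, paragraph: str, min_ratio: float = 0.95) -> str:
    """Return 'exact', 'fuzzy', or 'fail' for a quote against one paragraph."""
    q, p = normalize(quote), normalize(paragraph)
    if not q:
        return "fail"
    if q in p:
        return "exact"   # deterministic path, the cheap case
    # Bounded fuzzy fallback: compare the quote with every same-length
    # window of the paragraph and keep the best similarity ratio.
    best = 0.0
    for start in range(max(1, len(p) - len(q) + 1)):
        window = p[start:start + len(q)]
        ratio = SequenceMatcher(None, q, window, autojunk=False).ratio()
        best = max(best, ratio)
    return "fuzzy" if best >= min_ratio else "fail"
```

The window scan is quadratic in the worst case, which is acceptable per paragraph but is exactly the part that would move the item from cheap to medium. Returning `"fuzzy"` as a distinct result, rather than folding it into a pass, keeps the weaker guarantee visible in the audit trail.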

Data structures

Does each UDM paragraph have a stable, resolvable, version-stamped anchor?
Are typed links queryable at answer time? (so ‘inferred’ shows the real relation chain, not a vector guess)
Is revision / draft history recorded and accessible? (the intent agent's lifeblood)
Is org / project scope enforced at the data layer?
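The anchor question can be illustrated with a small data model. This is a sketch under stated assumptions, not the UDM schema: `Anchor`, `Store`, and `citation_status` are hypothetical, and the store is assumed to keep every paragraph version in order, oldest first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    """A stable, version-stamped pointer to one paragraph (hypothetical shape)."""
    doc_id: str
    paragraph_id: str
    version: str


# (doc_id, paragraph_id) -> ordered list of (version, text), oldest first.
Store = dict[tuple[str, str], list[tuple[str, str]]]


def citation_status(anchor: Anchor, quote: str, store: Store) -> str:
    """Classify an existing citation after the source may have changed."""
    versions = store.get((anchor.doc_id, anchor.paragraph_id))
    if not versions:
        return "unresolvable"
    by_version = dict(versions)
    cited_text = by_version.get(anchor.version)
    if cited_text is None or quote not in cited_text:
        # The anchor no longer resolves to text containing the quote.
        return "unresolvable"
    latest_version, latest_text = versions[-1]
    if anchor.version == latest_version:
        return "current"
    if quote in latest_text:
        return "superseded_quote_intact"
    return "superseded_quote_changed"


# Placeholder content for illustration only.
store: Store = {
    ("SOP-042", "p6"): [
        ("v1", "Residual risk is reviewed annually."),
        ("v2", "Residual risk is reviewed every six months."),
    ]
}
old = Anchor("SOP-042", "p6", "v1")
new = Anchor("SOP-042", "p6", "v2")
```

With this shape, an old citation is never silently rewritten: it either still resolves against its stamped version or is reported as superseded, which speaks to the version-correctness concern about what happens to old citations when a source changes.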

In one line: if these hold, the spine ships in v1; if not, fix the data first, don't build flourishes.

ML + data side. The crux: can data and ML support our “verifiable + supportive + intent-aware” citation contract. OCR normalization is the one cost fork to confirm.

DELIVERABLE · THREE / A

ALIGN WITH ANTON · COST

Align with Anton (1): get the real cost per item; open with the cost asymmetry

Cheap: the triple · provenance badge · abstention verdict (schema / prompt only)
“The model already retrieves the spans; this is just an output-format constraint, no new infra”

Cheap–med: verbatim string-match against UDM by anchor; deterministic, zero LLM
“Medium” only if OCR text normalization needs a bounded fuzzy matcher, the one estimate to nail down with Anton

Med · +1 LLM: stance / cross-examiner node
only on sign-off, high-stakes answers, not every lookup

Pricey · uncertain: intent agent
needs Anton to scope: async over session logs? on the existing background-execution layer? latency / cost / data dependencies?

The argument to Anton: most of the spine is cheap; the pricey half (the verifier) is exactly the line between a demo and a trusted tool. Skip it and the product gets more dangerous, not just less impressive.

Engineering-cost side, item by item. Persuade via the cost asymmetry: win the spine cheaply, frame the pricey parts honestly. OCR normalization is the one estimate to nail.

DELIVERABLE · THREE / B

ALIGN WITH ANTON · ITERATION & RISK

Align with Anton (2): sequence v1 / v2 / v3 by cost, and lay out the risks

Iteration path (by cost)

v1 · cheap: faithful + relevant + both-directions seed. Verbatim match + abstention + both directions (logged, not yet fed back)

v2 · med: distill a fine-tuned NLI model (much lower cost / latency) + intent agent + closed learning loop (guards the contradiction class against confirmation bias)

v3 · pricey: self-check + graduated autonomy + GxP validation / audit trail (mostly governance, not model work)

Engineering risks I raise

determinism on messy OCR docs (→ fuzzy-match threshold)
the intent agent's data dependency (are revisions logged)
proactivity × access-control intersection
version correctness: when a source changes, what happens to old citations

One-line cost story: v1 is a prompt and a schema; v2 is one cheap distilled model plus a pricey intent agent; v3 is mostly governance.

Iteration + risk. Let Anton sequence by cost, and proactively surface the engineering risks I've considered, so it isn't hand-waving. The one-line cost story is easy to remember.

PROTOTYPE

A CURSOR FOR COMPLIANCE DOCS · CLICKABLE DEMO

A Cursor for compliance docs: agents on the left, document on the right, verification in the middle

L AGENTS · SESSIONS

Multiple compliance tasks in parallel (like Cursor sessions) · bottom-left always-on intent agent: shows what it read, what was added to input, the current intent

M RUN LOG + CHAT

Grounded trace: read → verbatim verify ✓ → judge stance → both directions · evidence folded into an auditable log · click a citation to peek the source

R DOCUMENT EDITOR

The generated compliance doc, editable, with version history · revisions feed the intent agent · attest = attestation

No backend, one scripted case (BP Monitor / SOP-042 / ISO 14971). Verification is a real JS verbatim string-match. ▶ LIVE DEMO · /en/demo highlight · #go

A Cursor / Claude Desktop for compliance docs. Left agent sessions + always-on intent; center run log (verification folded into the audit trail); right versioned document editor. One case shows it all. Use /en/demo live; #go jumps to the highlight frame.

FEATURE WALKTHROUGH

6 capabilities · 6 requirements (captured from the live demo)

Source library · regulatory-blue/internal-gold · multi-session · always-on intent
Req: answers traceable · internal/regulatory boundary · intent visible

Grounded run · verbatim verify ✓ · both directions, you choose
Req: faithfulness + relevance + supportiveness

Click citation → source span highlighted & verified / open full doc in split
Req: click back to verify (regulation down to paragraph)

Paper-style editor · version history · semantic buttons
Req: editable deliverable + lifecycle

Version diff (green add / red delete)
Req: revisions traceable · fed back to the intent agent

Send to Workflow · spawns a structured session
Req: three-layer linkage (research agent = platform on-ramp)

Feature walkthrough: 6 shots from the live demo, each ‘feature → requirement’. A static, readable capability overview for the panel, no need to drive the complex demo live. See /en/demo or the tour for detail.

UI RATIONALE

WHY THIS LAYOUT · ON THE AGENT-IDE PARADIGM

Why this design: borrow the Cursor / Claude paradigm, add a compliance-only layer

Borrowed · agent-IDE paradigm

Three columns: left agent sessions / center conversation+run / right artifact editor CURSOR · CLAUDE DESKTOP
Parallel sessions + model picker + @-mention docs CURSOR · CLAUDE
Cite to a specific paragraph, click to verify HARVEY
Atomic claims, each with provenance HEBBIA
Per-block accept + version / diff GEMINI IN DOCS · CURSOR

The compliance-only layer we add

Every claim is verbatim-verified in the backend; abstain if ungrounded (Cursor verifies code, not facts)
Always-on intent agent: keeps inferring what you're arguing, proactively offers both directions (Cursor / Claude don't)
Everything auditable: run log + version history + attestation → Audit Graph

Cursor / Claude Desktop proved the agent-IDE interaction works; bring it to compliance and add “verifiable + intent-aware + auditable”. That is a Cursor for compliance docs.

Answers “why this design”: on the Cursor / Claude agent-IDE paradigm (left agent / center chat / right artifact), add three compliance-only things: backend verification, an always-on intent agent, end-to-end auditability. No longer NotebookLM-led.

THE ONE TAKEAWAY

A verifiable source is the foundation of trust.

The baseline already ships in production; the frontier, supportiveness, is solved with the intent agent. Make the near field solid, advance the rest by v1 → v2 → v3.

▶ Open the clickable DEMO · rematiq-case.pages.dev/demo

Close. This deck practices its own thesis: every claim carries a visible source. End with a clickable demo link.