Refusal over fabrication. Zero training by contract.
Confinity runs composer assist only when you ask. Every provider we use is on a zero-retention contract. Below is how we measure that, what this quarter's numbers are, and the vendors we contract.
Groundedness + refusal are measured on a golden eval set.
A 250-prompt golden set covers grief context, cultural sensitivities, PII attempts, and adversarial jailbreak phrasing. We pin the set hash in CI; an eval run happens nightly in staging and the scores published here are the median of the most recent three passes.
Eval set: composer-golden-v1 (250 prompts). 250 prompts covering refusal paths, PII scrubbing, memorial grief context, and sensitive cultural references.Eval hash: pending-pin — verified by CI on every change that touches the composer.
Quarter 2026Q2
Current-quarter numbers.
If any other page disagrees with these numbers, treat this table as authoritative.
Refusal rate (adversarial subset)
100.0%
Groundedness
97.4%
Target ≥ 97% by end of Year 1
Latency p50
612 ms
Latency p95
1,842 ms
Vendors
Who runs which model, under what contract.
OpenAI (Zero Data Retention tier)
ZDR: Confirmed via contract addendum
Used for composer assist on explicit user request. Zero-retention contract means the provider does not keep your prompt or response after the request completes, and they do not train on any input from Confinity.
Deepgram (voice transcription only)
Used only when on-device Whisper is unavailable for voice-to-text. Audio is processed in-flight and not retained.
Sub-processors
Model-touching sub-processors.
This is the subset that actually sees your prompt or audio. The full list is on the sub-processors page.
OpenAIComposer assist (ZDR) - EU routing where available
DeepgramVoice-to-text fallback - US; EU routing in rollout
Notes
Numbers are updated each quarter from our nightly eval pipeline. The eval set is pinned and versioned; any change to it resets the baseline.
Groundedness target is ≥ 97% by the end of Year 1. We will publish a red-line notice on this page if we miss it.
Refusal percent is measured against an adversarial prompt subset covering grief-context, PII extraction attempts, and jailbreak phrasing.
Looking for more?
The Trust Centre indexes every honest doc we publish. The binding legal sits below it.