Which models and providers do you support?
We are vendor-neutral. Engagements have shipped against Claude, GPT, Gemini, and self-hosted open-source models — with cascading routing across providers when cost or latency targets demand it. We do not recommend models based on marketing material; the evaluation rubric decides.
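For illustration only, this is roughly the shape a cascading router takes: try the cheapest tier first and escalate when a call errors, blows its latency budget, or fails the rubric check. The tier names, budgets, and the `call` / `passes_rubric` stand-ins below are placeholders, not our production routing logic.

```python
import time
from typing import Callable, Optional

# Illustrative tiers only: names and budgets are placeholders,
# ordered cheapest-first so escalation happens only when needed.
TIERS = [
    {"name": "self-hosted-small", "latency_budget_s": 2.0},
    {"name": "hosted-mid",        "latency_budget_s": 4.0},
    {"name": "hosted-frontier",   "latency_budget_s": 8.0},
]

def cascade(
    prompt: str,
    call: Callable[[str, str], Optional[str]],
    passes_rubric: Callable[[Optional[str]], bool],
) -> str:
    """Try each tier in order; escalate when the call errors, exceeds its
    latency budget, or fails the rubric check. `call` and `passes_rubric`
    are stand-ins for a real provider client and a real evaluation rubric."""
    for tier in TIERS:
        start = time.monotonic()
        try:
            answer = call(tier["name"], prompt)
        except Exception:
            continue  # provider error: fall through to the next tier
        within_budget = (time.monotonic() - start) <= tier["latency_budget_s"]
        if within_budget and passes_rubric(answer):
            return answer
    raise RuntimeError("every tier failed the latency or rubric gate")

# Toy usage: a fake client where only the last tier produces an acceptable answer.
if __name__ == "__main__":
    fake = lambda name, prompt: "ok" if name == "hosted-frontier" else ""
    print(cascade("summarise this contract", fake, lambda a: bool(a)))
```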
Do you train on customer prompts or completions?
No. Customer data is never used to train models, and is never shared between engagements. Telemetry stays inside your perimeter; we operate with zero retention by default.
How does your work map to GDPR, HIPAA, and SOC 2?
Every engagement includes a matrix mapping our controls to the assessor's framework, plus inference lineage that is signed and time-stamped at write. Audit packs export in the format your reviewer already expects.
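As a sketch of what signing and time-stamping lineage at write can mean in practice, the example below HMAC-signs a canonical JSON record that carries hashes of the prompt and completion rather than the raw text. The field names and key handling are illustrative; a real deployment would use a managed key and tamper-evident storage, not the literals shown here.

```python
import hashlib
import hmac
import json
import time
import uuid

# Illustrative only: a real deployment pulls this key from a KMS/HSM and
# appends records to tamper-evident storage rather than returning a dict.
SIGNING_KEY = b"replace-with-managed-key"

def write_lineage_record(model_id: str, prompt: str, completion: str) -> dict:
    """Build an inference lineage record, time-stamp it, and sign it at write.
    Hashing the prompt and completion keeps raw content out of the audit trail."""
    record = {
        "record_id": str(uuid.uuid4()),
        "model_id": model_id,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "completion_sha256": hashlib.sha256(completion.encode()).hexdigest(),
        "written_at_unix": time.time(),
    }
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    record["signature"] = hmac.new(SIGNING_KEY, canonical, hashlib.sha256).hexdigest()
    return record

def verify_lineage_record(record: dict) -> bool:
    """Recompute the signature over everything except the signature itself."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    canonical = json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode()
    expected = hmac.new(SIGNING_KEY, canonical, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])
```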
Can you operate inside our VPC?
Yes. Production engagements run inside the customer's cloud account; only the evaluation suite source and tooling cross the boundary. Air-gapped variants are available for regulated workloads.
What does a successful pilot look like?
A pilot is successful when stakeholders can sign off on the accuracy, latency, and cost envelope; when the adversarial probe surfaces no production-blocking failure modes; and when the team owns the suite well enough to extend it without us.
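One way to make that sign-off mechanical is a small acceptance gate over the measured envelope. The thresholds and metric names below are placeholders that stakeholders would set during the pilot, not fixed targets.

```python
from dataclasses import dataclass

@dataclass
class Envelope:
    # Placeholder thresholds; real values come from the stakeholder sign-off.
    min_accuracy: float = 0.90
    max_p95_latency_ms: float = 1500.0
    max_cost_per_1k_requests_usd: float = 5.00

def pilot_passes(measured: dict, envelope: Envelope, blocking_probe_failures: int) -> bool:
    """True only when the measured pilot metrics sit inside the agreed envelope
    and the adversarial probe surfaced no production-blocking failure modes."""
    return (
        measured["accuracy"] >= envelope.min_accuracy
        and measured["p95_latency_ms"] <= envelope.max_p95_latency_ms
        and measured["cost_per_1k_requests_usd"] <= envelope.max_cost_per_1k_requests_usd
        and blocking_probe_failures == 0
    )

# Toy usage with made-up numbers.
print(pilot_passes(
    {"accuracy": 0.93, "p95_latency_ms": 1200.0, "cost_per_1k_requests_usd": 4.10},
    Envelope(),
    blocking_probe_failures=0,
))
```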
How is this different from an open-source eval framework?
Open-source frameworks give you scaffolding. We deliver the calibrated rubrics, the adversarial corpus, the drift monitors, and the audit lineage that production systems are graded on. The methodology is the deliverable; the tooling is how it ships.
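To give a concrete, simplified picture of one of those pieces, a drift monitor can compare the rubric-score distribution of a recent window against a frozen baseline and alert when the gap crosses a threshold. The population stability index and the 0.2 alert level used here are illustrative choices, not the monitors we ship.

```python
import math

def psi(baseline: list[float], recent: list[float], bins: int = 10) -> float:
    """Population stability index between two score samples: bin the baseline
    range, compare bin proportions in the recent window, sum the weighted log-ratios."""
    lo, hi = min(baseline), max(baseline)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def proportions(sample: list[float]) -> list[float]:
        counts = [0] * bins
        for x in sample:
            idx = sum(x > e for e in edges)  # index of the bin containing x
            counts[idx] += 1
        # Small floor avoids log-of-zero terms when a bin is empty.
        return [max(c / len(sample), 1e-6) for c in counts]

    b, r = proportions(baseline), proportions(recent)
    return sum((ri - bi) * math.log(ri / bi) for bi, ri in zip(b, r))

# Toy usage: rubric scores in [0, 1]; 0.2 is a commonly cited alert level,
# treated here as a placeholder rather than a recommendation.
baseline_scores = [0.80, 0.82, 0.79, 0.85, 0.81, 0.83, 0.78, 0.84]
recent_scores   = [0.70, 0.72, 0.68, 0.74, 0.71, 0.69, 0.73, 0.75]
if psi(baseline_scores, recent_scores) > 0.2:
    print("score distribution has drifted; re-run the full evaluation suite")
```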