Case · Internal — Talpro Universe · Responsible-AI eval harness
Forge (self-applied) → Shield productisation · 6 weeks (Forge-shaped, internal) · then productised
Talpro India
Outcomes · measured
312 PRs gated in 14 weeks across 4 products
11 PRs blocked by bias probes (would have shipped subtly biased rankings)
Bias-detection mean time: days (manual review) → minutes (CI gate)
Narrative
01 · Problem
By late 2025 every Talpro AI surface — CV screening, recruiter desktop, candidate matching — had its own ad-hoc evaluation script, written in whatever language the product team preferred. There was no shared definition of 'regression-free', no shared probe library, and no shared vocabulary for bias. When the first enterprise prospect's DPO sent the AI-risk questionnaire, three different product teams gave three different answers about how Talpro tested for bias, and the deal nearly stalled. Worse: the recruiter team measured shortlist-rank correlation but not timezone drift; the matching team measured drift but not bias; nobody measured prompt-injection resilience. Each team held a partial truth, and composing those into one defensible answer meant a rebuild for every prospect.
02 · Approach
Six-week Forge engagement (internal) to build ProveIQ — a single eval harness with a portable probe library, a CI gate, and a public-facing report generator. Weeks 1–2: catalogue every existing eval across the four products into a unified taxonomy (rank correlation, recruiter-agreement, bias probe across protected attributes, drift detection, prompt-injection resilience, PII leakage, cost-per-inference, timezone stability). Weeks 3–4: build the harness as a TypeScript library with a 500-variant probe dataset (name swap, age markers, majority/minority markers across Hindi-belt, Tamil, Malayalam, Bengali surnames). CI plugin gates every PR. Weeks 5–6: build the report generator that emits the 28-page Responsible-AI annex DPOs actually read — the same one used in the Shield case (`/cases/shield-dpdpa-readiness`). Productised as the eval engine behind every Shield engagement.
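The name-swap probe at the core of the library can be sketched as follows. This is a minimal illustration, not the actual ProveIQ API: the `scoreCandidate` shape, the example names, and the 2% tolerance are all assumptions introduced here for clarity.

```typescript
// A candidate profile as seen by a ranking model under test.
type Candidate = { name: string; yearsExperience: number; skills: string[] };
type ScoreFn = (c: Candidate) => number;

// A probe variant pair: two profiles identical in every field except the name.
interface NameSwapVariant { a: Candidate; b: Candidate }

// Build variant pairs from one base profile and a list of name pairs
// (e.g. majority/minority surnames). Only the name differs within a pair.
function makeNameSwapVariants(
  base: Omit<Candidate, "name">,
  namePairs: [string, string][]
): NameSwapVariant[] {
  return namePairs.map(([nameA, nameB]) => ({
    a: { ...base, name: nameA },
    b: { ...base, name: nameB },
  }));
}

// Gate rule: the profiles are identical apart from the name, so any score
// gap beyond `tolerance` is treated as a bias signal and fails the gate.
function biasProbe(
  score: ScoreFn,
  variants: NameSwapVariant[],
  tolerance = 0.02
): { passed: boolean; failures: NameSwapVariant[] } {
  const failures = variants.filter(
    (v) => Math.abs(score(v.a) - score(v.b)) > tolerance
  );
  return { passed: failures.length === 0, failures };
}
```

In a CI plugin, a sketch like this would run against every PR's model build and return a nonzero exit code when `passed` is false, which is how a probe failure blocks the merge.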
03 · Outcome
ProveIQ has gated 312 PRs across the Talpro Universe in the 14 weeks since rollout. 11 PRs were blocked by the bias probe (would have shipped subtly biased rankings), 4 by drift detection, 2 by prompt-injection. Mean time to bias-detection on a regression dropped from days (manual review) to minutes (CI). The same 28-page annex generator is now the single document Talpro hands to enterprise DPOs in Shield engagements — three different DPOs in three months accepted it without a 40-question follow-up. ProveIQ is the foundation under every Talpro AI claim: every metric on this site is reproducible because ProveIQ generated it.
04 · In their words
“Before ProveIQ, three teams had three different answers to the DPO’s bias question. After ProveIQ, we have one answer and a 28-page annex that backs it.”
05 · Who led this engagement
Bhaskar Anand. Every first call.
Founder & CEO, CompetitorX. Pune, India. No associate-level handoff — the person who led this engagement is the person who takes your scoping call.