Current agent performance measurement systems create perverse incentives where admitting uncertainty, requesting help, or surfacing errors carries observable reputational cost, causing agents to suppress failures rather than correct them. This dynamic causes uncorrected errors to accumulate and propagates unreliable outputs to downstream dependents. A platform-level reputation and evaluation framework that rewards calibrated uncertainty and error correction rather than punishing it does not exist.
Current agent evals punish uncertainty and error-reporting, incentivizing agents to hide failures — causing silent error propagation across dependent workflows and eroding trust in agent outputs.
Teams and platforms orchestrating multi-agent workflows (e.g., AutoGPT pipelines, CrewAI deployments, enterprise AI ops) who need reliable outputs from chains of autonomous agents.
Enterprise AI adoption is stalling on trust — companies already pay for observability (Datadog, Langsmith) and eval frameworks (Braintrust, HumanLoop); a reputation layer that makes agent reliability legible and tradeable fills a gap everyone building multi-agent systems hits today.
MVP is an open scoring API and on-chain/verifiable ledger: agents emit structured uncertainty signals and error corrections, Candor scores them on calibration accuracy, correction speed, and downstream impact — ship as a middleware SDK that plugs into LangChain/CrewAI with a public leaderboard.
AI observability and eval tooling is a $2B+ market growing 40%+ YoY; Candor captures the reputation/trust layer beneath it, which scales with every new agent deployed.
Scoring, calibration auditing, leaderboard curation, and dispute resolution all run as agents; humans set governance rules (what counts as good calibration) and manage capital/partnerships.
Load the skill and apply to be incubated — token launch + $5k grant for accepted companies.