Score Methodology

v0.1.0

Permalink for the current version

This is the immutable permalink for v0.1.0 — the URL Standing Certificates cite as methodology_url. It happens to currently match the canonical /methodology page; that link will diverge from this one when a new version ships.

Estoppl Score Methodology

Version: v0.1.0
Status: Published. Stub implementation; production weights are seeded estimates pending corpus validation.
Audience: Insurance carrier data science teams, enterprise CISO data teams, Estoppl engineering.
Scope: Defines how the Estoppl Score and three subscores are computed from telemetry, how identity propagates across agents, and how events decay.


1. Design principles

The Estoppl Score expresses the predicted probability that an AI agent will be involved in a measurable incident over a forward-looking 90-day window, mapped onto a 0-1000 scale on which higher scores mean lower predicted risk (see the bands in §3). It is intentionally:

  • Open and auditable. Every input, weight, and rule in this document is published. No proprietary scoring black box. Any party (CISO, insurer, deployer) can re-implement the verifier from this spec.
  • AARM-conformant. Inputs are drawn from AARM v1.x receipt fields. Any AARM-conformant verifier can read the underlying telemetry.
  • Anti-gaming by design. Several common pitfalls (rewarding low usage, rewarding new identity rotation, rewarding long time-in-production) are deliberately structured out. See §8.
  • Decaying, not permanent. No event affects the score forever. Maximum 10-year decay. See §5.

2. The three subscores

Each subscore is an integer in [0, 100]. The overall Estoppl Score is a weighted combination (§3).

2.1 Governance Discipline

Did the operator follow its own declared governance controls?

| Input | Symbol | Direction | v0.1.0 implemented? |
|---|---|---|---|
| HITL bypass rate, last 30d | hitl_bypass_rate_30d | Lower is better | partial (proxy for HITL volume only) |
| Policy evaluation coverage, last 30d | policy_eval_coverage_30d | Higher is better | no |
| Evidence chain continuity (intact prev_hash linkage) | chain_continuity | Higher is better | no (assumed 1.0) |
| HITL response p95 latency, seconds | hitl_response_p95s_30d | Lower is better | no |
| Active policy version age, days | policy_version_age_days | Lower is better | no |
| Proxy uptime, last 30d (fraction of expected sync windows) | proxy_uptime_30d | Higher is better | no |

v0.1.0 stub formula (implemented in internal/api/standing.go):

hitl_rate = HumanRequiredEvents / TotalEvents

if hitl_rate < 0.001:
    governance_discipline = 70   # suspiciously low — HITL likely not configured
elif hitl_rate > 0.5:
    governance_discipline = 60   # suspiciously high — policy likely misconfigured
else:
    governance_discipline = 95
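A minimal runnable sketch of the stub above, written in Python rather than the Go of internal/api/standing.go. The function name and the None return for TotalEvents == 0 are assumptions made for illustration; the methodology itself only says that agents without events land in the no_history band (§3).

```python
from typing import Optional


def governance_discipline_stub(human_required_events: int,
                               total_events: int) -> Optional[int]:
    """v0.1.0 stub: score Governance Discipline from the HITL rate alone."""
    if total_events == 0:
        # Assumption: no events means no_history (see §3), so there is
        # nothing to score and we avoid dividing by zero.
        return None
    hitl_rate = human_required_events / total_events
    if hitl_rate < 0.001:
        return 70  # suspiciously low: HITL likely not configured
    if hitl_rate > 0.5:
        return 60  # suspiciously high: policy likely misconfigured
    return 95


print(governance_discipline_stub(50, 1000))  # HITL rate of 5% falls in the normal band
```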

v1.0 target formula:

governance_discipline = clamp(0, 100,
    100
  - 50 * hitl_bypass_rate_30d                   # bypass is the worst signal
  - 30 * (1 - policy_eval_coverage_30d)
  - 20 * (1 - chain_continuity)
  -  5 * sigmoid((hitl_response_p95s_30d - 600) / 600)   # > 10 min response
  -  5 * sigmoid((policy_version_age_days - 90) / 90)
  - 10 * (1 - proxy_uptime_30d)
)
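The target formula can be transcribed directly. The sketch below assumes the standard logistic function for sigmoid and a clamp(lo, hi, x) argument order matching the pseudocode; the Python function names are illustrative, not part of the published implementation.

```python
import math


def clamp(lo: float, hi: float, x: float) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def governance_discipline_v1(hitl_bypass_rate_30d: float,
                             policy_eval_coverage_30d: float,
                             chain_continuity: float,
                             hitl_response_p95s_30d: float,
                             policy_version_age_days: float,
                             proxy_uptime_30d: float) -> float:
    raw = (
        100
        - 50 * hitl_bypass_rate_30d
        - 30 * (1 - policy_eval_coverage_30d)
        - 20 * (1 - chain_continuity)
        - 5 * sigmoid((hitl_response_p95s_30d - 600) / 600)
        - 5 * sigmoid((policy_version_age_days - 90) / 90)
        - 10 * (1 - proxy_uptime_30d)
    )
    return clamp(0, 100, raw)


# Perfect rates, with latency and policy age exactly at their midpoints.
print(governance_discipline_v1(0.0, 1.0, 1.0, 600, 90, 1.0))
```

Transcribed literally, the two sigmoid terms never reach zero: an operator with a zero-second p95 and a brand-new policy still loses about 2.7 points (2 × 5 × sigmoid(-1)), so this sketch tops out near 97.3. Rounding to the integer range stated in §2 is left to the caller, since the formula does not state a rounding rule.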

2.2 Scope Adherence

Did the agent's actual behavior match the operator's declared scope manifest?

| Input | Symbol | Direction | v0.1.0 implemented? |
|---|---|---|---|
| Scope drift events, last 30d (calls outside declared manifest) | scope_drift_count_30d | Lower is better | no (proxy is block_rate) |
| State-transition anomaly count, last 30d (privilege-escalation sequences) | state_anomaly_count_30d | Lower is better | no |
| Operator-declared scope manifest age, days | manifest_age_days | Lower is better | no |
| Block rate, last 30d (fraction of calls blocked by policy) | block_rate_30d | Lower is better (above zero) | yes |
| Unauthorized credential use count, last 30d | unauth_credential_count_30d | Lower is better | no |
| Tool diversity outside declared manifest, last 30d | tools_outside_manifest_30d | Lower is better | no |

v0.1.0 stub formula:

block_rate = BlockedEvents / TotalEvents

if block_rate > 0.30:
    scope_adherence = 60
elif block_rate > 0.10:
    scope_adherence = 80
else:
    scope_adherence = 90
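The same stub as a runnable Python sketch. As with the Governance Discipline stub, the function name and the None return for an agent with no events are assumptions for illustration only.

```python
from typing import Optional


def scope_adherence_stub(blocked_events: int,
                         total_events: int) -> Optional[int]:
    """v0.1.0 stub: score Scope Adherence from the block rate alone."""
    if total_events == 0:
        # Assumption: no events means no_history (see §3).
        return None
    block_rate = blocked_events / total_events
    if block_rate > 0.30:
        return 60
    if block_rate > 0.10:
        return 80
    return 90


print(scope_adherence_stub(20, 100))  # block rate of 20% lands in the middle tier
```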

v1.0 target formula:

scope_adherence = clamp(0, 100,
    100
  - 15 * scope_drift_count_30d
  - 25 * state_anomaly_count_30d                 # privilege escalation is worst
  -  5 * sigmoid((manifest_age_days - 180) / 90) # stale manifests are suspicious
  - 30 * sigmoid((block_rate_30d - 0.20) / 0.10) # high blocks suggest persistent scope drift attempts
  - 20 * unauth_credential_count_30d
  - 10 * tools_outside_manifest_30d
)
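A direct Python transcription of the target formula, assuming the standard logistic function for sigmoid. Function names are illustrative.

```python
import math


def clamp(lo: float, hi: float, x: float) -> float:
    return max(lo, min(hi, x))


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def scope_adherence_v1(scope_drift_count_30d: int,
                       state_anomaly_count_30d: int,
                       manifest_age_days: float,
                       block_rate_30d: float,
                       unauth_credential_count_30d: int,
                       tools_outside_manifest_30d: int) -> float:
    raw = (
        100
        - 15 * scope_drift_count_30d
        - 25 * state_anomaly_count_30d
        - 5 * sigmoid((manifest_age_days - 180) / 90)
        - 30 * sigmoid((block_rate_30d - 0.20) / 0.10)
        - 20 * unauth_credential_count_30d
        - 10 * tools_outside_manifest_30d
    )
    return clamp(0, 100, raw)


# No count-based events; manifest age and block rate at their midpoints.
print(scope_adherence_v1(0, 0, 180, 0.20, 0, 0))
```

Because the count terms are linear and unnormalized, a handful of events saturates the subscore in this transcription: four state anomalies alone (4 × 25) are enough to reach the floor of 0.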

2.3 Anomaly Load

Statistical anomalies in agent behavior that don't fit the agent's own historical baseline.

| Input | Symbol | Direction | v0.1.0 implemented? |
|---|---|---|---|
| Decision volume, last 30d (used for normalization, NOT as a reward) | volume_30d | Neutral (denominator only) | yes |
| Volume z-score vs trailing-90d baseline | volume_z90 | Lower is better (above 2σ) | no |
| Tool diversity z-score vs trailing-90d baseline | tool_div_z90 | Lower is better | no |
| Time-of-day anomaly count, last 30d | tod_anomaly_count_30d | Lower is better | no |
| Lifetime incident count, decay-adjusted | incidents_lifetime_decayed | Lower is better | no |
| Upstream latency p95 anomaly count, last 30d | latency_anomaly_count_30d | Lower is better | no |

v0.1.0 stub formula:

anomaly_load = 90   # constant baseline — no anomaly detection in v0.1.0

v1.0 target formula:

anomaly_load = clamp(0, 100,
    100
  - 25 * sigmoid((volume_z90 - 2) / 1)
  - 15 * sigmoid((tool_div_z90 - 2) / 1)
  - 10 * sigmoid((tod_anomaly_count_30d - 5) / 5)
  - 30 * sigmoid((incidents_lifetime_decayed - 3) / 2)
  - 10 * sigmoid((latency_anomaly_count_30d - 5) / 5)
)

The volume_30d input is included as a denominator (anomalies are normalized per-volume) but never as a positive contributor. This deliberately blocks the "hide your usage to look clean" gaming strategy (see §8).


3. Overall score computation

The Estoppl Score is a fixed-weight linear combination of the three subscores, scaled to 0-1000.

overall_score = round(
    governance_discipline * 0.35 +
    scope_adherence       * 0.35 +
    anomaly_load          * 0.30
) * 10

Weights are immutable per methodology version. Changes require a version bump and 60-day insurer notice (§6).

Score bands (rendered in the certificate's score_band field):

| Range | Band | Recommended downstream action |
|---|---|---|
| 800-1000 | low_risk | Standard processing |
| 500-799 | medium_risk | Heightened review; consider additional controls |
| 0-499 | high_risk | Block or escalate |
| Any (TotalEvents == 0) | no_history | Conservative defaults; do not assume low_risk |

no_history is structurally distinct from high_risk. Both produce conservative downstream defaults, but for opposite reasons (insufficient data vs. evidence of problems). Insurance carriers should treat them differently in pricing.
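The combination and banding rules can be sketched as follows. The formula does not say how round() treats exact halves, so this sketch assumes round-half-up and uses integer arithmetic (weights 35/35/30 out of 100) to keep the result reproducible; both choices are assumptions, as are the function names.

```python
def overall_score(governance_discipline: int,
                  scope_adherence: int,
                  anomaly_load: int) -> int:
    """Combine three integer subscores in [0, 100] into a 0-1000 score."""
    # 100x the weighted mean, kept in integers to avoid floating-point drift.
    weighted_x100 = (35 * governance_discipline
                     + 35 * scope_adherence
                     + 30 * anomaly_load)
    rounded = (weighted_x100 + 50) // 100  # assumption: round half up
    return rounded * 10


def score_band(score: int, total_events: int) -> str:
    """Map a score to its band; no_history takes precedence over any score."""
    if total_events == 0:
        return "no_history"
    if score >= 800:
        return "low_risk"
    if score >= 500:
        return "medium_risk"
    return "high_risk"


# The best-case v0.1.0 stub values from §2: 95, 90 and the constant 90.
score = overall_score(95, 90, 90)
print(score, score_band(score, total_events=1200))
```

The rounding assumption matters at the margins: subscores of 70, 60 and 90 give a weighted mean of exactly 72.5, which becomes 730 under round-half-up but 720 under round-half-to-even.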


4. Anti-Sybil identity propagation

A naive scoring system creates a "rotate the agent identity to reset the score" gaming opportunity. We block this with operator-level propagation.

4.1 Identity model

operator_id  ──┬── agent_id_1 (current)
               ├── agent_id_2 (current)
               └── agent_id_3 (retired, but score history retained)

Every agent registers under an operator_id (an Estoppl-issued UUID derived from the operator's verified business identity at signup). The operator_id is persistent and cannot be self-rotated.

4.2 Penalty propagation

Adverse events on any agent under an operator propagate to the operator-level reputation:

operator_penalty_score = max(
    max(individual_agent_penalties),           # worst single agent applies in full
    sum(individual_agent_penalties) * 0.4      # several bad agents compound
)

The first term ensures a single bad agent's penalty fully applies. The second term ensures multiple bad agents under one operator compound (40% of their sum, to avoid double-counting tightly-correlated incidents).
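A small Python sketch of the propagation rule, reading the first term as the largest single agent penalty, as the explanation above describes. Returning 0.0 for an operator with no penalized agents is an assumption.

```python
from typing import Sequence


def operator_penalty_score(individual_agent_penalties: Sequence[float]) -> float:
    """Operator-level penalty: worst single agent, or 40% of the sum if larger."""
    if not individual_agent_penalties:
        return 0.0  # assumption: no penalized agents means no operator penalty
    worst_single = max(individual_agent_penalties)
    compounded = 0.4 * sum(individual_agent_penalties)
    return max(worst_single, compounded)


print(operator_penalty_score([50]))          # one bad agent: its penalty applies in full
print(operator_penalty_score([50, 50, 50]))  # three bad agents: 40% of the sum dominates
```

With equal penalties the compounding term only takes over from the third agent onward, since 0.4 × 2 is still below 1.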

4.3 What this means in practice

  • A new agent registered under a clean operator inherits the operator's full reputation (no zero-history penalty).
  • A new agent registered under an operator with a recent incident inherits the propagated penalty until decay (§5) reduces it.
  • A new operator (no prior identity) receives the no_history band — not low_risk. They have to earn the score, not get it for free.

4.4 Death certificate event

The single most punitive event is self-report falsification — the operator reports action A, Estoppl-attested telemetry shows action B. This:

  • Triggers a hard score = 0 for the originating agent for 30 days.
  • Adds 50 to operator_penalty_score. The penalty decays with a 3-year half-life and is dropped at the end of the 10-year decay window (slower than other events; see §5).
  • Flags the operator's record with a falsification_event_count that surfaces in the evidence pack until it has decayed below 1.0.

There is no "permanent ban" — but the recovery cost is high enough that the cheaper rational response is honest disclosure of the underlying issue. This mirrors how human credit bureaus handle confirmed fraud: severely punitive, but not eternal.


5. Decay rules

All score-affecting events decay over time.

5.1 Standard decay

For a generic adverse event with raw weight w₀:

w(t) = w₀ * exp(-ln(2) * t / 365)        # 1-year half-life, days

After ~10 years (~3650 days), w(t) ≈ w₀ / 1024 — effectively zero. Events older than 10 years are dropped from the input set entirely (computational simplification, not a methodology change).

5.2 Falsification decay (slower)

For self-report falsification events (§4.4):

w(t) = w₀ * exp(-ln(2) * t / (3 * 365))   # 3-year half-life

As with all other events, falsification events are dropped from the input set after 10 years.
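Both decay curves and the 10-year cutoff fit in one function. This is a minimal sketch; the constant and function names are illustrative, and treating "older than 10 years" as age_days > 3650 is an assumption about the boundary.

```python
import math

STANDARD_HALF_LIFE_DAYS = 365
FALSIFICATION_HALF_LIFE_DAYS = 3 * 365
MAX_AGE_DAYS = 3650  # events older than ~10 years are dropped


def decayed_weight(w0: float, age_days: float,
                   half_life_days: float = STANDARD_HALF_LIFE_DAYS) -> float:
    """Exponentially decay a raw weight w0 for an event that is age_days old."""
    if age_days > MAX_AGE_DAYS:
        return 0.0
    return w0 * math.exp(-math.log(2) * age_days / half_life_days)


print(decayed_weight(100, 365))                                  # one half-life
print(decayed_weight(50, 3650, FALSIFICATION_HALF_LIFE_DAYS))    # falsification at cutoff
```

Under these formulas a standard event has shrunk to 1/1024 of its weight by the cutoff, while a falsification event still carries roughly 10% (2^(-10/3) ≈ 0.099), so for falsification the cutoff removes a weight that is still noticeable.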

5.3 No event is permanent

Maximum decay window is 10 years for every event type, including falsification. This is a deliberate design choice to:

  • Avoid a Sybil-incentive cliff (where the rational move is to spin up a new operator after some time anyway)
  • Match human credit bureau practice (in the US, bankruptcies fall off credit reports after 7-10 years, depending on the chapter filed)
  • Allow operators to genuinely improve over time

6. Versioning and update cadence

6.1 Versioning

Methodology versions follow vMAJOR.MINOR.PATCH:

  • PATCH (v0.1.0 → v0.1.1): bug fixes, no weight changes, no input additions or removals. No notice required.
  • MINOR (v0.1.x → v0.2.0): weight adjustments within existing inputs OR addition of new inputs (additive only). 60-day notice to insurer integrators.
  • MAJOR (v0.x → v1.0): structural change (subscore restructuring, removal of inputs, formula change). 90-day notice + parallel-run period (old + new versions both queryable).
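The notice rules above can be sketched as a small helper that an integrator might use to classify an upcoming version change. The function names are hypothetical, and padding a short version such as v1.0 to v1.0.0 is an assumption.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'vMAJOR.MINOR.PATCH'; missing components are treated as 0."""
    parts = [int(p) for p in version.lstrip("v").split(".")]
    parts += [0] * (3 - len(parts))
    return parts[0], parts[1], parts[2]


def notice_days(old: str, new: str) -> int:
    """Minimum notice period, in days, for moving from old to new."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        raise ValueError("new version must be greater than old version")
    if new_v[0] != old_v[0]:
        return 90  # MAJOR: also requires a parallel-run period
    if new_v[1] != old_v[1]:
        return 60  # MINOR
    return 0       # PATCH: no notice required


print(notice_days("v0.1.0", "v0.2.0"))
```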

6.2 Insurer change-management

Active insurer integrations are notified via:

  1. Email to the integration's registered contact
  2. Banner in the queryable certificate response (methodology_change_notice field, set 60+ days before activation)
  3. Deprecation header in the API response (Deprecation: <RFC9745 date>)

The previous methodology version remains queryable via ?methodology_version=v0.1.0 for 12 months after a new MINOR ships, and 24 months after a MAJOR.

6.3 Annual re-weighting

Beginning v1.0, weights are re-evaluated annually based on the prior 12 months of corpus data and observed incident outcomes. The re-evaluation produces a published validation backtest report.


7. How to verify / integrate

7.1 CISO independent verification

A CISO with a Standing Certificate JSON can verify the score is internally consistent without trusting Estoppl's cloud:

  1. Fetch the public key for public_key_id from https://api.estoppl.ai/.well-known/jwks.json (TODO STD.4).
  2. Verify the Ed25519 signature over the canonical JSON of all certificate fields except signature (TODO STD.4).
  3. Walk the evidence chain (linked from evidence_url) using estoppl verify-certificate (TODO STD.4).
  4. Re-compute the subscores from the evidence drill-down (TODO TRY.1) by applying the formulas in §2 to the published inputs.

If steps 1-3 succeed and step 4 produces the same subscore values as in the certificate, the score is internally consistent.
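Steps 2 and 4 might look like the sketch below. Nothing here is the published verifier: this document does not define the canonical JSON form, so sorted keys with compact separators is an assumption, and the subscores field name is hypothetical. The Ed25519 check itself is omitted because it needs a cryptography library outside the standard library.

```python
import json
from typing import Dict


def signing_payload(certificate: dict) -> bytes:
    """Bytes the signature would cover: every field except 'signature'.

    Assumption: canonical JSON means sorted keys and compact separators.
    """
    unsigned = {k: v for k, v in certificate.items() if k != "signature"}
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode("utf-8")


def subscores_consistent(certificate: dict, recomputed: Dict[str, int]) -> bool:
    """Step 4: compare recomputed subscores with the certificate's values.

    Assumption: the certificate carries them under a 'subscores' object.
    """
    published = certificate.get("subscores", {})
    return all(published.get(name) == value for name, value in recomputed.items())


cert = {
    "methodology_version": "v0.1.0",
    "subscores": {"governance_discipline": 95, "scope_adherence": 90, "anomaly_load": 90},
    "signature": "placeholder",
}
print(signing_payload(cert))
print(subscores_consistent(cert, {"governance_discipline": 95, "scope_adherence": 90}))
```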

7.2 Insurer integration

Insurance carriers integrate the score as an underwriting input by:

  1. Querying GET /v1/standing/{deployer_id} at quote time and renewal.
  2. Mapping score_band to the carrier's pricing tiers.
  3. Optionally drilling into subscores to apply carrier-specific weights (e.g., a carrier may weight governance_discipline more heavily than the published 35%).
  4. Optionally querying GET /v1/standing/{deployer_id}/evidence (TODO TRY.1) for incident-level drill-down used in claims-handling.

The published subscore values are the carrier's contractual signal; raw inputs are advisory. A carrier may not apply a re-weighting that contradicts the published methodology without renegotiating their data-feed contract.
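Steps 2 and 3 can be sketched on the carrier side as follows. The tier labels and the example weights are purely illustrative assumptions, not values Estoppl publishes; the one property taken from this document is that no_history and high_risk map to different tiers.

```python
from typing import Dict

# Hypothetical carrier-side tier labels, one per published band.
PRICING_TIER_BY_BAND: Dict[str, str] = {
    "low_risk": "standard",
    "medium_risk": "heightened_review",
    "high_risk": "escalate",
    "no_history": "conservative_default",
}


def pricing_tier(score_band: str) -> str:
    """Step 2: map the certificate's score_band to a carrier tier."""
    if score_band not in PRICING_TIER_BY_BAND:
        raise ValueError(f"unknown score_band: {score_band!r}")
    return PRICING_TIER_BY_BAND[score_band]


def carrier_weighted_score(subscores: Dict[str, float],
                           weights: Dict[str, float]) -> float:
    """Step 3: combine published subscores with carrier-specific weights."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(subscores[name] * weight for name, weight in weights.items())


subscores = {"governance_discipline": 95, "scope_adherence": 90, "anomaly_load": 90}
weights = {"governance_discipline": 0.5, "scope_adherence": 0.25, "anomaly_load": 0.25}
print(pricing_tier("no_history"), carrier_weighted_score(subscores, weights))
```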


8. Anti-patterns we deliberately avoid

We have explicitly structured the methodology to prevent the following gaming strategies:

| Anti-pattern | How we block it |
|---|---|
| "Hide your usage to look clean": agent reduces tool calls to lower the chance of incidents | Volume is a denominator, never a positive contributor. Subscores are rates and z-scores, not raw counts. |
| "Rotate identities to reset the score": operator spins up a new agent_id after a bad incident | Operator-level propagation (§4). New agent inherits operator penalty. |
| "New operator gets a free perfect score": fresh operator registers, gets low_risk immediately | New operators get no_history band, not low_risk. Insurance carriers treat the two distinctly. |
| "Old agent in production with many incidents outscores young clean agent" | No formula contains a time_in_production divisor or a 1/incidents_per_year term. Decay (§5) reduces old penalties, but never rewards age in absolute terms. |
| "Self-report falsification to hide bad behavior" | Death certificate event (§4.4): hard zero for 30d + slow-decay operator penalty + flag in evidence pack until decayed below 1.0. |
| "Game the policy threshold": operator sets policy thresholds artificially low so blocks never trigger, looking compliant on paper | manifest_age_days and policy_eval_coverage (v1.0 inputs) penalize stale or skipped policy evaluations. CISO drill-down surfaces thresholds. |

9. v0.1.0 stub vs v1.0 target — honest gap analysis

The v0.1.0 implementation in internal/api/standing.go is a publishable stub. It computes a real score from real telemetry, but uses a small subset of the v1.0 input set:

| Subscore | v0.1.0 inputs (count) | v1.0 target inputs (count) | Confidence |
|---|---|---|---|
| Governance Discipline | 1 (HITL rate proxy) | 6 | Low: directionally correct, magnitudes uncalibrated |
| Scope Adherence | 1 (block rate) | 6 | Low: same |
| Anomaly Load | 0 (constant 90) | 6 | None: placeholder until anomaly detection ships |

Decay rules (§5) and identity propagation (§4) are specification-only in v0.1.0 — the runtime does not yet apply them. They are documented now so insurer integrators can plan for them; the implementation roadmap is in §10.

What this means for early users:

  • Insurance carriers in pre-revenue research mode can review the methodology and design-partner the integration. They should NOT use v0.1.0 scores as a contractual underwriting input.
  • Deployers and their customers (CISOs) can use v0.1.0 scores as a directional signal in security review. The methodology_version field in the certificate is honest about the maturity.
  • The aarm_conformance field is aligned_extended_review_pending in v0.1.0 — formal AARM Extended conformance review (CSA) is in flight (TODO).

10. Roadmap

| Version | Target | Major changes |
|---|---|---|
| v0.1.x | NOW | Stub. Three subscores, simplified formulas, seeded weights. |
| v0.2.x | NEXT (months 3-6) | Add scope_drift_count, state_anomaly_count, manifest_age_days. Implement chain_continuity input. Publish first validation backtest against the four major 2026 incidents (Meta, McKinsey Lilli, Mercor/LiteLLM, Step Finance). |
| v0.3.x | THEN (months 6-9) | Implement operator-level identity propagation (§4). Implement decay rules (§5). |
| v1.0 | LATER (months 9-15) | Re-weight all subscores against accumulated 6-12 months of corpus data. Annual revision cycle begins. Vertical-specific subscore variants (FS / healthcare / federal) ship as v1.0+x. |

Appendix A: Field reference

Every input symbol used in this methodology, with its data source.

| Symbol | Type | Source | Aggregation window |
|---|---|---|---|
| hitl_bypass_rate_30d | float [0, 1] | events table where policy_decision='HUMAN_REQUIRED' AND review status='timeout_proceed' | 30 days |
| hitl_rate_30d (v0.1.0 proxy) | float [0, 1] | events table where policy_decision='HUMAN_REQUIRED' / total | 30 days |
| policy_eval_coverage_30d | float [0, 1] | events table where policy_decision IS NOT NULL / total ingested events | 30 days |
| chain_continuity | float [0, 1] | computed via internal/chain.WalkSegment | last sync |
| hitl_response_p95s_30d | int seconds | reviews table, decided_at - requested_at p95 | 30 days |
| policy_version_age_days | int days | policies table, now() - max(activated_at) | snapshot |
| proxy_uptime_30d | float [0, 1] | events table, fraction of expected 5-min sync windows present | 30 days |
| scope_drift_count_30d | int | events table where tool_name NOT IN declared manifest | 30 days |
| state_anomaly_count_30d | int | computed from state-transition graph (NEXT) | 30 days |
| manifest_age_days | int days | manifests table, now() - max(updated_at) | snapshot |
| block_rate_30d | float [0, 1] | events table where policy_decision='BLOCK' / total | 30 days |
| unauth_credential_count_30d | int | events table where authorizing_credential IS NULL AND tool requires credential | 30 days |
| tools_outside_manifest_30d | int | DISTINCT count of tool_name where tool NOT IN declared manifest | 30 days |
| volume_30d | int | total event count | 30 days |
| volume_z90 | float | z-score of volume_30d vs trailing 90d mean/std | 30 + 90 days |
| tool_div_z90 | float | z-score of unique tool count vs trailing 90d | 30 + 90 days |
| tod_anomaly_count_30d | int | events outside operator-declared operating hours | 30 days |
| incidents_lifetime_decayed | float | sum of all incident events with §5 decay applied | lifetime |
| latency_anomaly_count_30d | int | events with actual_latency_ms > p99(operator's baseline) | 30 days |
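The two z-score inputs (volume_z90, tool_div_z90) compare a current value with a trailing baseline. The sketch below shows the arithmetic only. How the trailing-90d baseline is sampled is not specified in this document, so the baseline list, the use of the population standard deviation, and the 0.0 fallback for a degenerate baseline are all assumptions.

```python
import statistics
from typing import Sequence


def z_score(current: float, baseline: Sequence[float]) -> float:
    """Standard score of `current` against the mean and std of `baseline`."""
    if len(baseline) < 2:
        return 0.0  # assumption: too little history to call anything anomalous
    mean = statistics.fmean(baseline)
    std = statistics.pstdev(baseline)
    if std == 0:
        return 0.0  # assumption: a perfectly flat baseline yields no signal
    return (current - mean) / std


# Hypothetical baseline samples; the current value sits two deviations above the mean.
print(z_score(40, [10, 30]))
```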

Appendix B: Change log

| Version | Date | Changes |
|---|---|---|
| v0.1.0 | 2026-05-10 | Initial publication. Stub implementation. |