Reference

Accountability scoring methodology

Every accountability score on this site is computed by deterministic rules — no learned model. Below is a visual, interactive walk-through of those rules; the precise specification follows it.

The four dimensions

Four separate scores — never blended into one number

A government can be diligent but undelivering, or fund things well without fulfilling its mandates. Squashing four very different signals into one “score out of 10” hides exactly the information that matters.

33%

Delivery

Are promises being kept?

Scored on: Government · party · body · officeholder

  • Government has 66 live Programme-for-Government commitments.
  • Coverage: 100% of those are backed by evidence.
  • Each commitment is weighted by significance (flagship × 3, standard × 1, minor × 0.5).

Right now: 33% of government commitments delivered (weighted)

52%

Diligence

Do officeholders show up to vote?

Scored on: Every officeholder with in-term divisions

  • 254 officeholders scored.
  • A vote is “participated in” if a ta / nil / staon vote was recorded.
  • Officeholders are never penalised for divisions outside their term.

Right now: 52% average participation across all officeholders

31%

Mandate fulfilment

Are bodies fulfilling their statutory mandates?

Scored on: Every body with mandate-linked commitments

  • 15 bodies scored.
  • Only counts commitments tied (via mandateId) to a statutory mandate.
  • Uses the same status × significance weighting as Delivery.

Right now: 31% average fulfilment across scored bodies

n/a

Fiscal stewardship

Does spend track allocation?

Scored on: Every body with at least one budget vote

  • 19 bodies have budget votes.
  • Linear penalty: 0% off allocation → 1.0, 20% or more off → 0.0.
  • No outturn is published yet, so coverage is 0 today: an honest “not yet verifiable”, not a zero verdict.

Right now: n/a actual-vs-allocated stewardship

Delivery dimension

How a delivery rate is built up from individual promises

Three ingredients: a status for each commitment, a significance weight, and a time-awareness rule that keeps promises-not-yet-due out of the failure column.

Step 1 · Status maps to a value 0–1

| Status | Value | Meaning |
|---|---|---|
| delivered | 1.0 | Promise kept in full. |
| partially-delivered | 0.5 | Some but not all of the promise has shipped. |
| in-progress | 0.3 | Active work demonstrably under way. |
| stalled | 0.1 | Started, then went quiet. |
| broken | 0.0 | Reversed or contradicted. |
| abandoned | 0.0 | Dropped from the plan. |
| promised | excluded (see on-track rate) | Made but not yet due → excluded from the delivery rate. Counts toward onTrackRate instead. |

Step 2 · Significance sets the weight

| Significance | Weight |
|---|---|
| flagship | × 3 |
| standard | × 1 |
| minor | × 0.5 |

A flagship promise moves the dial three times as much as a standard one. Missing a minor tweak hurts much less than missing a flagship.

Try it

Interactive delivery calculator

Adjust the statuses below to see how the delivery rate moves. The maths is weighted: value × significance, divided by the total significance weight.

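| Commitment | Status | Significance | Value | Weight | Contribution |
|---|---|---|---|---|---|
| 1 | in-progress | flagship | 0.3 | × 3 | 0.90 |
| 2 | partially-delivered | flagship | 0.5 | × 3 | 1.50 |
| 3 | delivered | standard | 1.0 | × 1 | 1.00 |
| 4 | stalled | minor | 0.1 | × 0.5 | 0.05 |
| 5 | promised | | excluded | | |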

4 counted commitments · 1 promised (excluded)

46%

Calculated delivery rate

deliveryRate = 3.45 ÷ 7.50 = 0.460

Counted commitments are all non-promised entries. Promised ones (still in their grace period) drop into the separate on-track rate: 100%.

Diligence dimension

Did this officeholder turn up to vote?

Every recorded division during an officeholder's term either has a vote row (they participated) or doesn't (they were absent). Term scoping means nobody is punished for votes before they took office or after they left.

Worked example

Six divisions in one term

Timeline, from term start to today: ta, absent, staon, nil, ta, absent

Legend: ta / nil / staon = participation (value 1); no vote row = absent (value 0)

participationRate = 4 ÷ 6 = 0.67 (67%)

Divisions held before they took office or after they left never count. Out-of-term votes are simply not in the denominator.

Every officeholder page on the accountability ledger lists their full division-by-division record with each individual vote linked.

Mandate-fulfilment dimension

Is each body delivering on its statutory mandates?

Uses the same status × significance maths as Delivery, but only on commitments tied to a body's statutory mandates — a much narrower lens.

How the join works

Commitment → mandate → body

[Diagram: Commitments (commitment 1, commitment 2, commitment 3) → via mandateId → Mandates (mandate 1, mandate 2) → via bodyId → Body]

Only commitments whose mandateId resolves to a real mandate are included. The score belongs to the mandate's body — which can differ from the commitment's directly-responsible body.

Browse any body page on the accountability ledger to walk this join with real records.

Fiscal-stewardship dimension

Does spend track the budget it was allocated?

Once a fiscal year closes, every budget vote can be compared to its outturn. The further the spend strays from allocation — either way — the lower the stewardship value.

Try it

Variance → stewardship value

Drag the slider to see how a single vote scores against its allocation. The curve is linear from 1.0 (perfect) down to 0.0 at 20% off, in either direction.

[Plot: stewardship value (0.0 to 1.0) against variance from allocation (0% to 30%), with a direction control]

outturn = €535m

value = max(0, 1 − 0.07 ÷ 0.20) = 0.65

The 20% tolerance band is the single tunable parameter of this dimension. Either side of perfect — overspending or underspending — loses points equally.

Coverage

Every rate ships with its data completeness

A high rate built on thin data is a much weaker signal than the same rate built on complete data. The site never blends those together.

Why coverage is reported alongside every rate

Same headline number, very different confidence

Body A — high coverage

Rate (delivery): 78% · Coverage: 92%

78% delivery, 92% of commitments evidence-backed. Trustworthy.

Body B — low coverage

Rate (delivery): 78% · Coverage: 15%

78% delivery, but only 15% evidence-backed. The rate isn't wrong — it's just standing on very thin data.

Coverage is never folded into the rate — it sits beside it so a confident-looking number on thin data is always visible. A score with 0% coverage is rendered as not yet verifiable rather than as a zero verdict.

Snapshot as of 2026-05-23 · methodology version delivery-2.0.0

Full methodology reference (source markdown)

The source of this document is data/accountability/scoring-methodology.md. The interactive section above renders the same rules visually.

Accountability scoring methodology

Methodology version: `accountability-2.0.0` (dataset envelope). Per-dimension stamps: `delivery-2.0.0`, others at accountability-2.0.0.

This document describes the accountability scoring dimensions produced by scripts/compute-scores.mjs and written to data/accountability/scores.json.

Every score carries its own methodologyVersion. The Delivery dimension is versioned independently of the dataset envelope: a bump to Delivery does not silently restamp scores from other dimensions, and historical scores keep the methodology stamp that produced them.

There are four dimensions: Delivery, Diligence, Mandate-fulfilment and Fiscal-stewardship. Each is computed and reported on its own. They are never blended into a single composite number.

No black box, fully decomposable

Every scorer is a deterministic rules function, not a learned model. The same inputs always produce the same outputs, and the output order is stable. Every AccountabilityScore carries a breakdown[] array — one row per underlying record — so any rate can be expanded back into the exact records, weights and values that produced it. No record-level data is hidden inside an aggregate.

Every score also carries a `coverage` figure — data completeness — and it is never omitted. Coverage is reported independently of the headline rate so a high rate built on thin data is always visible.


Delivery dimension

Weighted commitment delivery for an entity. Methodology version `delivery-2.0.0`. The pure delivery-1.0.0 rate is preserved alongside the new outcome-aware rate; the gap between them is the absorbed-reform signal.

Status → delivery value

Each commitment's CommitmentStatus maps to a delivery value in [0,1]:

| Status | Value |
|---|---|
| delivered | 1.0 |
| partially-delivered | 0.5 |
| in-progress | 0.3 |
| stalled | 0.1 |
| broken | 0.0 |
| abandoned | 0.0 |
| promised | excluded (see Time-awareness) |

Significance weighting

Each commitment is weighted by its significance field:

| Significance | Weight |
|---|---|
| flagship | 3 |
| standard | 1 |
| minor | 0.5 |

If significance is absent, the commitment defaults to `standard` (weight 1).

deliveryRate is the weight-weighted mean of the delivery values of all counted (due) commitments:

deliveryRate = Σ(value · weight) / Σ(weight)   over counted commitments
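As a minimal sketch of the formula above (not the code in scripts/compute-scores.mjs), the weighted mean can be written as follows. The `Commitment` shape, the constant names and the choice to return `null` when nothing is counted are assumptions made for illustration; the status values and weights are the ones in the tables above. The sample rows reproduce the four counted commitments and one promised commitment from the interactive calculator earlier on this page.

```typescript
type CommitmentStatus =
  | "delivered" | "partially-delivered" | "in-progress"
  | "stalled" | "broken" | "abandoned" | "promised";
type Significance = "flagship" | "standard" | "minor";

// Hypothetical minimal record shape; real commitment files carry more fields.
interface Commitment {
  status: CommitmentStatus;
  significance?: Significance;
}

// Status → delivery value ("promised" is never counted, so it has no value).
const STATUS_VALUE: Record<Exclude<CommitmentStatus, "promised">, number> = {
  delivered: 1.0,
  "partially-delivered": 0.5,
  "in-progress": 0.3,
  stalled: 0.1,
  broken: 0.0,
  abandoned: 0.0,
};

const SIGNIFICANCE_WEIGHT: Record<Significance, number> = {
  flagship: 3,
  standard: 1,
  minor: 0.5,
};

// Weight-weighted mean over counted commitments.
// Returning null for "nothing counted" is an assumption of this sketch.
function deliveryRate(commitments: Commitment[]): number | null {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const c of commitments) {
    if (c.status === "promised") continue; // excluded from the rate
    const weight = SIGNIFICANCE_WEIGHT[c.significance ?? "standard"];
    weightedSum += STATUS_VALUE[c.status] * weight;
    totalWeight += weight;
  }
  return totalWeight === 0 ? null : weightedSum / totalWeight;
}

const calculatorRows: Commitment[] = [
  { status: "in-progress", significance: "flagship" },
  { status: "partially-delivered", significance: "flagship" },
  { status: "delivered", significance: "standard" },
  { status: "stalled", significance: "minor" },
  { status: "promised" },
];

// 3.45 / 7.50, i.e. about 0.46 (floating-point arithmetic, so compare with a tolerance).
const calculatorRate = deliveryRate(calculatorRows);
```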

Time-awareness

A promised commitment that is not yet due is not a failure. It is excluded from deliveryRate (its ScoreContribution has counted: false) and counted instead toward onTrackRate:

onTrackRate = (promised commitments still on track) / (all promised commitments)

A promised commitment is "on track" unless its expectedDeliveryBy date has already passed relative to the score's asOf date. With no expectedDeliveryBy it is treated as on track.
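The on-track rule can be sketched as below. This is an illustration only: the `PromisedCommitment` shape and the ISO `YYYY-MM-DD` date format are assumptions, and the sketch treats a commitment due exactly on the `asOf` date as still on track, which the text above does not specify.

```typescript
// Hypothetical minimal shape for a commitment whose status is "promised".
interface PromisedCommitment {
  id: string;
  expectedDeliveryBy?: string; // assumed ISO date, e.g. "2027-12-31"
}

function isOnTrack(c: PromisedCommitment, asOf: string): boolean {
  // No expected date: treated as on track.
  if (c.expectedDeliveryBy === undefined) return true;
  // ISO "YYYY-MM-DD" strings sort chronologically, so string comparison suffices.
  // Assumption: a due date equal to asOf has not yet "passed".
  return c.expectedDeliveryBy >= asOf;
}

// Returning null when there are no promised commitments is an assumption.
function onTrackRate(promised: PromisedCommitment[], asOf: string): number | null {
  if (promised.length === 0) return null;
  const onTrack = promised.filter((c) => isOnTrack(c, asOf)).length;
  return onTrack / promised.length;
}

// Illustrative data: one undated, one due in the future, one overdue.
const exampleOnTrackRate = onTrackRate(
  [
    { id: "a" },
    { id: "b", expectedDeliveryBy: "2027-01-01" },
    { id: "c", expectedDeliveryBy: "2025-01-01" },
  ],
  "2026-05-23",
); // two of three are on track
```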

Delivery coverage

coverage = (commitments backed by ≥1 high/medium-confidence evidence source)
           / (all of the entity's commitments)

Evidence sources with confidence of low (or absent) do not count.
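A minimal sketch of the coverage ratio follows. The field names `sources` and `url` and the decision to report 0 for an entity with no commitments are assumptions for illustration; only the high/medium rule comes from the definition above.

```typescript
type Confidence = "high" | "medium" | "low";

// Hypothetical shapes; the real records may name these fields differently.
interface EvidenceSource {
  url: string;
  confidence?: Confidence;
}
interface CommitmentWithEvidence {
  id: string;
  sources: EvidenceSource[];
}

// A commitment is evidence-backed when at least one source is high or medium confidence.
function isEvidenceBacked(c: CommitmentWithEvidence): boolean {
  return c.sources.some((s) => s.confidence === "high" || s.confidence === "medium");
}

// Coverage is taken over all of the entity's commitments, promised ones included.
function deliveryCoverage(commitments: CommitmentWithEvidence[]): number {
  if (commitments.length === 0) return 0; // assumption: nothing to verify reads as 0
  return commitments.filter(isEvidenceBacked).length / commitments.length;
}

const exampleCoverage = deliveryCoverage([
  { id: "a", sources: [{ url: "https://example.org/a", confidence: "high" }] },
  { id: "b", sources: [{ url: "https://example.org/b", confidence: "low" }] },
  { id: "c", sources: [{ url: "https://example.org/c" }] }, // absent confidence does not count
  { id: "d", sources: [] },
]); // 1 of 4 commitments is evidence-backed
```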

Derived fields: numeric target and deadline

hasNumericTarget and hasDeadline are used for downstream analysis. When a commitment file does not set them they are derived in-memory from the commitment text (the committed JSON is not modified):

  • `hasNumericTarget`: true when the title/description contains a number followed by a unit keyword (homes, units, beds, jobs, MW, staff, …), a percentage, a monetary figure (€/$/£), or a bare 3+ digit number.
  • `hasDeadline`: true when expectedDeliveryBy is set, or the text mentions a target year/quarter (by 2030, by the end of 2027, Q3, mid-2026) or a relative window (within 5 years).

These are simple, transparent heuristics; they do not feed the delivery rate.
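To make the heuristics concrete, here is a rough sketch using regular expressions. It is not the production rule set: the patterns are guesses at one reasonable reading of the description, and the keyword list contains only the units named above (the source list is longer and is not reproduced here).

```typescript
// Only the unit keywords named in the text; the real list is longer.
const UNIT_KEYWORDS = ["homes", "units", "beds", "jobs", "MW", "staff"];

const NUMBER = "\\d[\\d,.]*";
const numberWithUnit = new RegExp(`${NUMBER}\\s*(?:${UNIT_KEYWORDS.join("|")})\\b`, "i");
const percentage = /\d[\d,.]*\s*%/;
const monetaryFigure = /[€$£]\s*\d/;
const bareLongNumber = /\b\d{3,}\b/; // any standalone number with 3+ digits

function hasNumericTarget(text: string): boolean {
  return [numberWithUnit, percentage, monetaryFigure, bareLongNumber].some((re) => re.test(text));
}

// Assumed patterns for the deadline phrasings listed in the text.
const deadlinePatterns = [
  /\bby\s+(?:the\s+end\s+of\s+)?(?:19|20)\d{2}\b/i, // "by 2030", "by the end of 2027"
  /\bQ[1-4]\b/i, // "Q3"
  /\bmid-(?:19|20)\d{2}\b/i, // "mid-2026"
  /\bwithin\s+\d+\s+years?\b/i, // "within 5 years"
];

function hasDeadline(text: string, expectedDeliveryBy?: string): boolean {
  return expectedDeliveryBy !== undefined || deadlinePatterns.some((re) => re.test(text));
}
```

Note that under the bare-number rule as described, a year such as 2030 also satisfies `hasNumericTarget`; whether the real implementation filters years out is not stated.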

Delivery entities scored

One AccountabilityScore (dimension: "delivery") is emitted per:

  • `government` — all Programme for Government commitments, collectively.
  • `body` — grouped by each PfG commitment's responsibleBodyId.
  • `officeholder` — grouped by each PfG commitment's responsibleOfficeholderId.
  • `party` — grouped by each GE2024 manifesto commitment's partyId.

Delivery 2.0.0 — outcome-aware adjustment

The Delivery dimension exposes two rates side by side:

  • deliveryRate: pure delivery-1.0.0 value (no outcome adjustment). Preserved exactly so existing readers stay stable and historical comparison is apples-to-apples.
  • outcomeAdjustedRate: delivery-2.0.0 value with the rule below applied.

The gap between the two rates is the absorbed-reform signal. If a government delivers a lot of delivered commitments whose substantive outcomeStatus is outcome-unchanged, the two rates diverge: deliveryRate rewards the paperwork, outcomeAdjustedRate discounts it. This is the "delivered vs delivered-and-worked" distinction from [docs/power-and-blockers.md](../../docs/power-and-blockers.md) section "Absorbed reform — outcome status separate from delivery status".

When the adjustment fires

Only on rows whose CommitmentStatus is delivered or partially-delivered. Every other status is untouched (the adjustment is a no-op for promised, in-progress, stalled, broken, abandoned).

Status × outcomeStatus → value table

| Delivery status | outcomeStatus | Value applied to row | Note on row |
|---|---|---|---|
| delivered | undefined / not-applicable | 1.0 (unchanged) | none |
| delivered | outcome-improved | 1.0 (unchanged) | none |
| delivered | outcome-unchanged | 0.5 (half value) | "outcome-unchanged: value halved per delivery-2.0.0 …" |
| delivered | outcome-worsened | 0.0 (zero) | "outcome-worsened: value zeroed per delivery-2.0.0 …" |
| delivered | contested | 0.5 (half value) | "contested: value halved per delivery-2.0.0 …" |
| partially-delivered | undefined / not-applicable | 0.5 (unchanged) | none |
| partially-delivered | outcome-improved | 0.5 (unchanged) | none |
| partially-delivered | outcome-unchanged | 0.25 (half of 0.5) | "outcome-unchanged: value halved per delivery-2.0.0 …" |
| partially-delivered | outcome-worsened | 0.0 | "outcome-worsened: value zeroed per delivery-2.0.0 …" |
| partially-delivered | contested | 0.25 | "contested: value halved per delivery-2.0.0 …" |
| every other status | (any) | unchanged (delivery-1.0.0 mapping) | none |

Why undefined is treated as "no adjustment", not as outcome-unchanged

Absence of outcome data is not absence of outcome. We do not assume the world stood still simply because nobody has yet sourced the outcome metric. Penalising unfilled outcome fields would incentivise leaving them blank, and would conflate "we have not measured" with "we measured no change".

Instead, the absence is surfaced honestly through coverage (the share of commitments backed by ≥1 high/medium-confidence source). The Delivery dimension keeps the same coverage definition; a separate outcome-coverage signal can be layered in additively later without changing the rates above.

not-applicable is treated the same as undefined for the same reason: an administrative commitment with no measurable outcome should not be punished for the lack of one.

Relationship between deliveryRate and outcomeAdjustedRate

deliveryRate         = Σ(pureValue   · weight) / Σ(weight)   over counted commitments
outcomeAdjustedRate  = Σ(adjustedVal · weight) / Σ(weight)   over counted commitments

Both rates use the same counted set (time-aware promised exclusion is identical) and the same significance weights. The only difference is the per-row value: outcomeAdjustedRate uses the table above, deliveryRate uses the pure delivery-1.0.0 mapping.
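The two rates can be illustrated with a small sketch that shares one weighted-mean helper and swaps only the per-row value. The `CountedCommitment` shape and the function names are hypothetical; the halving and zeroing rules follow the Status × outcomeStatus table.

```typescript
type CountedStatus =
  | "delivered" | "partially-delivered" | "in-progress"
  | "stalled" | "broken" | "abandoned";
type OutcomeStatus =
  | "outcome-improved" | "outcome-unchanged" | "outcome-worsened"
  | "contested" | "not-applicable";

// Hypothetical shape for an already-counted (non-promised) commitment.
interface CountedCommitment {
  status: CountedStatus;
  weight: number; // significance weight: 3, 1 or 0.5
  outcomeStatus?: OutcomeStatus;
}

const PURE_VALUE: Record<CountedStatus, number> = {
  delivered: 1.0,
  "partially-delivered": 0.5,
  "in-progress": 0.3,
  stalled: 0.1,
  broken: 0.0,
  abandoned: 0.0,
};

function pureValue(c: CountedCommitment): number {
  return PURE_VALUE[c.status];
}

// The adjustment fires only for delivered / partially-delivered rows.
function adjustedValue(c: CountedCommitment): number {
  const pure = PURE_VALUE[c.status];
  if (c.status !== "delivered" && c.status !== "partially-delivered") return pure;
  switch (c.outcomeStatus) {
    case "outcome-unchanged":
    case "contested":
      return pure / 2; // halved
    case "outcome-worsened":
      return 0; // zeroed
    default:
      return pure; // undefined, not-applicable, outcome-improved: no adjustment
  }
}

function weightedMean(
  rows: CountedCommitment[],
  valueOf: (c: CountedCommitment) => number,
): number | null {
  const totalWeight = rows.reduce((sum, c) => sum + c.weight, 0);
  if (totalWeight === 0) return null; // assumption: no counted rows means no rate
  return rows.reduce((sum, c) => sum + valueOf(c) * c.weight, 0) / totalWeight;
}

// Illustrative data: two standard-weight delivered commitments, one of which changed nothing.
const sampleRows: CountedCommitment[] = [
  { status: "delivered", weight: 1, outcomeStatus: "outcome-improved" },
  { status: "delivered", weight: 1, outcomeStatus: "outcome-unchanged" },
];
const sampleDeliveryRate = weightedMean(sampleRows, pureValue); // 1.0
const sampleOutcomeAdjustedRate = weightedMean(sampleRows, adjustedValue); // 0.75
```

In this example the 0.25 gap between the two rates is the absorbed-reform signal described above.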

When no commitment in scope carries an actionable outcomeStatus (everything is undefined or not-applicable), the two rates are identical by construction. The new field activates as commitments are tagged in future PRs.

Decomposability

Every breakdown row reflects the outcome-adjusted (delivery-2.0.0) value, so outcomeAdjustedRate decomposes back to the exact records that produced it. When the adjustment fires, the row records:

  • outcomeStatusApplied: the outcomeStatus value the scorer read.
  • note: a short human-readable explanation of what changed and why (e.g. "outcome-unchanged: value halved per delivery-2.0.0 (absorbed-reform haircut)").

If a row has no outcomeStatusApplied, no adjustment was considered (the commitment had no outcomeStatus). If a row has outcomeStatusApplied but no adjustment note, the outcome status existed but did not change the value (e.g. outcome-improved or not-applicable).

The pure deliveryRate field is computed in a parallel pass over the same source data and is not decomposed in breakdown[]; the breakdown is the audit trail for the new headline (outcomeAdjustedRate). To reproduce deliveryRate from the breakdown, replace each row's adjusted value with the unadjusted mapping for its status (e.g. delivered → 1.0 regardless of outcomeStatusApplied) and re-weight.


Diligence dimension

Per-officeholder parliamentary participation. Measures how reliably an officeholder turns up to recorded votes (divisions).

Inputs

divisions.json (every recorded division) joined to member-votes.json (one row per officeholder per division they were present for). Absence is represented by the lack of a member-vote row, not by an explicit record.

Term scoping

A division counts toward an officeholder only if its date falls within one of the officeholder's terms (a from/to window; an open to means the officeholder is still in office). An officeholder is never penalised for divisions held before they took office or after they left.

Participation value

For each in-term division, the officeholder's vote is one of ta, nil, staon or absent. A vote of ta/nil/staon is participation (value 1); absent (no member-vote row) is value 0.

participationRate = (divisions voted in: ta/nil/staon)
                     / (divisions held during the officeholder's term)

Every in-term division is counted; the score decomposes by division in breakdown[] (a DivisionContribution per division).
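Term scoping and the participation ratio can be sketched together as below. The `Term` and `Division` shapes, the ISO date strings and the inclusive term boundaries are assumptions; member-vote rows are reduced here to the set of division IDs for which the officeholder has a row. The sample data mirrors the six-division worked example earlier on this page, plus one division held before the term began.

```typescript
// Hypothetical shapes; dates are assumed to be ISO "YYYY-MM-DD" strings.
interface Term {
  from: string;
  to?: string; // open-ended means still in office
}
interface Division {
  id: string;
  date: string;
}

// Assumption: term boundaries are inclusive on both ends.
function inTerm(date: string, terms: Term[]): boolean {
  return terms.some((t) => t.from <= date && (t.to === undefined || date <= t.to));
}

// votedDivisionIds holds the divisions with a ta / nil / staon member-vote row.
function participationRate(
  divisions: Division[],
  terms: Term[],
  votedDivisionIds: ReadonlySet<string>,
): number | null {
  const inTermDivisions = divisions.filter((d) => inTerm(d.date, terms));
  if (inTermDivisions.length === 0) return null; // not scored
  const voted = inTermDivisions.filter((d) => votedDivisionIds.has(d.id)).length;
  return voted / inTermDivisions.length;
}

// Illustrative data only.
const exampleDivisions: Division[] = [
  { id: "d0", date: "2024-06-01" }, // before the term: never in the denominator
  { id: "d1", date: "2025-01-15" }, // ta
  { id: "d2", date: "2025-02-15" }, // absent
  { id: "d3", date: "2025-03-15" }, // staon
  { id: "d4", date: "2025-04-15" }, // nil
  { id: "d5", date: "2025-05-15" }, // ta
  { id: "d6", date: "2025-06-15" }, // absent
];
const exampleTerms: Term[] = [{ from: "2024-12-01" }];
const exampleVotes = new Set(["d1", "d3", "d4", "d5"]);

const exampleParticipation = participationRate(exampleDivisions, exampleTerms, exampleVotes); // 4 / 6
```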

Diligence coverage

coverage = (term divisions with a member-vote row of any kind)
           / (all term divisions)

With the current data every member-vote row is a present vote, so coverage equals participationRate; the field is kept distinct so that if upstream ever records explicit absent rows, coverage and participation diverge correctly.

Diligence entities scored

One score (dimension: "diligence") per officeholder who had at least one in-term division. Officeholders whose terms cover no division are not scored.


Mandate-fulfilment dimension

Per body: how well the commitments linked to that body's statutory mandates are being delivered.

Inputs and join

commitments.json → mandateId → mandates.json → bodyId. Only PfG commitments that carry a non-null mandateId resolving to a known mandate are included. The body of the score is the mandate's bodyId, which may differ from the commitment's responsibleBodyId.

Fulfilment value

fulfilmentRate reuses the Delivery status→value mapping and significance weights exactly (see Delivery above), applied only to the body's mandate-linked commitments:

fulfilmentRate = Σ(value · weight) / Σ(weight)   over counted mandate-linked commitments

promised commitments are excluded from the rate (counted: false), consistent with the Delivery dimension.
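The join can be sketched as a lookup followed by a group-by. The record shapes below are hypothetical minimal versions of the entries in commitments.json and mandates.json, and the IDs are made up for illustration. Each resulting group would then be scored with the same status × significance weighting as Delivery.

```typescript
// Hypothetical minimal shapes for the two joined files.
interface Mandate {
  id: string;
  bodyId: string;
}
interface PfgCommitment {
  id: string;
  mandateId?: string | null;
  responsibleBodyId?: string;
}

// Groups mandate-linked commitments under the mandate's body, not the responsible body.
function groupByMandateBody(
  commitments: PfgCommitment[],
  mandates: Mandate[],
): Map<string, PfgCommitment[]> {
  const mandateById = new Map<string, Mandate>();
  for (const m of mandates) mandateById.set(m.id, m);

  const byBody = new Map<string, PfgCommitment[]>();
  for (const c of commitments) {
    if (c.mandateId === undefined || c.mandateId === null) continue; // not mandate-linked
    const mandate = mandateById.get(c.mandateId);
    if (mandate === undefined) continue; // mandateId does not resolve to a known mandate
    const group = byBody.get(mandate.bodyId) ?? [];
    group.push(c);
    byBody.set(mandate.bodyId, group);
  }
  return byBody;
}

// Illustrative data only.
const exampleGroups = groupByMandateBody(
  [
    { id: "c1", mandateId: "m1", responsibleBodyId: "body-b" }, // scored under body-a
    { id: "c2", mandateId: null, responsibleBodyId: "body-a" }, // excluded
    { id: "c3", mandateId: "m-unknown", responsibleBodyId: "body-a" }, // excluded
  ],
  [{ id: "m1", bodyId: "body-a" }],
);
```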

Mandate-fulfilment coverage

coverage = (mandate-linked commitments backed by ≥1 high/medium-confidence source)
           / (all of the body's mandate-linked commitments)

mandateCount reports how many distinct mandates of the body have at least one linked commitment behind the score.

Mandate-fulfilment entities scored

One score (dimension: "mandate-fulfilment") per body that has at least one mandate-linked commitment.


Fiscal-stewardship dimension

Per body: how closely actual spend tracks the budget that was allocated.

Inputs

spending.json budget votes ({ votes, programmes, subheads }). Each BudgetVote carries a grossAllocation (always present) and, once a fiscal year closes, an outturn (actual spend — usually absent). Scores decompose by budget vote (BudgetContribution per vote).

Variance → stewardship value

For a vote with a published outturn:

variance = |grossAllocation − outturn| / grossAllocation
value    = max(0, 1 − variance / 0.20)

A vote spent exactly to allocation scores 1.0; a vote 20% or more off allocation (over or under) scores 0.0; in between the value falls linearly. The 20% tolerance band is the single tunable parameter of this dimension.

stewardshipRate is the allocation-weighted mean of the values of votes that have an outturn, so larger votes dominate the body's score:

stewardshipRate = Σ(value · allocation) / Σ(allocation)   over votes with an outturn
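The two formulas above translate into a short sketch. The function names and the trimmed `BudgetVote` shape are assumptions, and the sketch assumes grossAllocation is positive. The first sample vote uses an assumed €500m allocation, which is consistent with the €535m outturn and 7% variance shown in the slider example earlier on this page.

```typescript
// Hypothetical trimmed shape; figures here are in € millions.
interface BudgetVote {
  grossAllocation: number; // assumed > 0
  outturn?: number | null; // absent until the fiscal year closes
}

const TOLERANCE = 0.2; // the single tunable parameter of this dimension

function stewardshipValue(grossAllocation: number, outturn: number): number {
  const variance = Math.abs(grossAllocation - outturn) / grossAllocation;
  return Math.max(0, 1 - variance / TOLERANCE);
}

// Allocation-weighted mean over votes with an outturn; 0 when there are none,
// matching the "stewardshipRate: 0" reported when no outturn is published.
function stewardshipRate(votes: BudgetVote[]): number {
  let weightedSum = 0;
  let totalAllocation = 0;
  for (const v of votes) {
    if (v.outturn === undefined || v.outturn === null) continue;
    weightedSum += stewardshipValue(v.grossAllocation, v.outturn) * v.grossAllocation;
    totalAllocation += v.grossAllocation;
  }
  return totalAllocation === 0 ? 0 : weightedSum / totalAllocation;
}

const sliderValue = stewardshipValue(500, 535); // 7% over: about 0.65
const sampleStewardshipRate = stewardshipRate([
  { grossAllocation: 500, outturn: 535 }, // value about 0.65
  { grossAllocation: 100, outturn: 130 }, // 30% over: value 0
  { grossAllocation: 400 }, // no outturn: excluded from the rate
]); // (0.65 · 500 + 0 · 100) / 600, about 0.54
```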

Honest handling of missing outturn

Most votes have no `outturn` yet. Outturn is never fabricated. A vote with no published outturn is counted: false, contributes value: 0, carries variance: null / outturn: null, and is excluded from stewardshipRate. Instead it lowers coverage, so a body whose budget is mostly unverifiable shows a low coverage rather than a misleadingly confident rate.

Fiscal-stewardship coverage

Coverage is the allocation share of the body's budget that is backed by a published outturn:

coverage = Σ(allocation of votes with an outturn) / Σ(allocation of all votes)

outturnVoteCount reports how many of the body's votes have an outturn. With the current data no outturn is published, so every fiscal-stewardship score has coverage: 0, stewardshipRate: 0 and outturnVoteCount: 0 — an honest "not yet verifiable" signal, not a zero-performance verdict.
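A minimal sketch of the coverage calculation follows, showing why today's data yields coverage 0 rather than a performance verdict. The `BudgetVote` shape and the function name are illustrative assumptions.

```typescript
// Hypothetical trimmed shape; figures here are in € millions.
interface BudgetVote {
  id: string;
  grossAllocation: number;
  outturn?: number | null;
}

// Allocation share of the body's budget that is backed by a published outturn.
function fiscalCoverage(votes: BudgetVote[]): { coverage: number; outturnVoteCount: number } {
  let coveredAllocation = 0;
  let totalAllocation = 0;
  let outturnVoteCount = 0;
  for (const v of votes) {
    totalAllocation += v.grossAllocation;
    if (v.outturn !== undefined && v.outturn !== null) {
      coveredAllocation += v.grossAllocation;
      outturnVoteCount += 1;
    }
  }
  const coverage = totalAllocation === 0 ? 0 : coveredAllocation / totalAllocation;
  return { coverage, outturnVoteCount };
}

// The situation described above: no outturn published for any vote.
const coverageToday = fiscalCoverage([
  { id: "vote-1", grossAllocation: 300 },
  { id: "vote-2", grossAllocation: 700, outturn: null },
]); // { coverage: 0, outturnVoteCount: 0 }

// A hypothetical later state, once one vote's outturn is published.
const coverageLater = fiscalCoverage([
  { id: "vote-1", grossAllocation: 300, outturn: 310 },
  { id: "vote-2", grossAllocation: 700 },
]); // { coverage: 0.3, outturnVoteCount: 1 }
```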

Fiscal-stewardship entities scored

One score (dimension: "fiscal-stewardship") per body that has at least one budget vote in spending.json.


Score shape

AccountabilityScore is a discriminated union on dimension. Every member shares a common envelope (entityType, entityId, coverage, methodologyVersion, asOf); each dimension then adds its own metric fields and its own typed breakdown[]:

  • delivery: deliveryRate, onTrackRate, commitmentCount, ScoreContribution[]
  • diligence: participationRate, divisionCount, votedCount, DivisionContribution[]
  • mandate-fulfilment: fulfilmentRate, mandateCount, commitmentCount, ScoreContribution[]
  • fiscal-stewardship: stewardshipRate, voteCount, outturnVoteCount, BudgetContribution[]

The delivery member is byte-compatible with methodology version delivery-1.0.0, so existing Delivery scores and their consumers are unaffected.
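One way to picture the union is the TypeScript sketch below. The envelope and metric field names come from the list above; the field types, the contents of the three contribution row types, and the `headlineRate` helper are assumptions rather than the project's actual declarations.

```typescript
interface ScoreEnvelope {
  entityType: string;
  entityId: string;
  coverage: number;
  methodologyVersion: string;
  asOf: string;
}

// Row shapes are hypothetical beyond the fields the text mentions.
interface ScoreContribution { counted: boolean; value: number; outcomeStatusApplied?: string; note?: string; }
interface DivisionContribution { divisionId: string; value: number; }
interface BudgetContribution { counted: boolean; value: number; variance: number | null; outturn: number | null; }

interface DeliveryScore extends ScoreEnvelope {
  dimension: "delivery";
  deliveryRate: number;
  onTrackRate: number;
  commitmentCount: number;
  breakdown: ScoreContribution[];
}
interface DiligenceScore extends ScoreEnvelope {
  dimension: "diligence";
  participationRate: number;
  divisionCount: number;
  votedCount: number;
  breakdown: DivisionContribution[];
}
interface MandateFulfilmentScore extends ScoreEnvelope {
  dimension: "mandate-fulfilment";
  fulfilmentRate: number;
  mandateCount: number;
  commitmentCount: number;
  breakdown: ScoreContribution[];
}
interface FiscalStewardshipScore extends ScoreEnvelope {
  dimension: "fiscal-stewardship";
  stewardshipRate: number;
  voteCount: number;
  outturnVoteCount: number;
  breakdown: BudgetContribution[];
}

type AccountabilityScore =
  | DeliveryScore | DiligenceScore | MandateFulfilmentScore | FiscalStewardshipScore;

// Switching on `dimension` narrows the union, so each branch sees only its own fields.
function headlineRate(score: AccountabilityScore): number {
  switch (score.dimension) {
    case "delivery": return score.deliveryRate;
    case "diligence": return score.participationRate;
    case "mandate-fulfilment": return score.fulfilmentRate;
    case "fiscal-stewardship": return score.stewardshipRate;
  }
}

// Illustrative record only; the entity ID is made up.
const sampleScore: AccountabilityScore = {
  dimension: "diligence",
  entityType: "officeholder",
  entityId: "example-officeholder",
  coverage: 4 / 6,
  methodologyVersion: "accountability-2.0.0",
  asOf: "2026-05-23",
  participationRate: 4 / 6,
  divisionCount: 6,
  votedCount: 4,
  breakdown: [],
};
```

Because the rates are never blended, a consumer would use a helper like `headlineRate` only to pick the one rate that belongs to a given score, never to combine dimensions.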

Versioning

methodologyVersion is stamped on the dataset (accountability-2.0.0) and on every score (delivery-2.0.0 for Delivery scores, accountability-2.0.0 for the other dimensions). Bump the relevant stamp whenever a dimension's status/variance/participation mapping, weights, scoping rule or coverage definition changes, so historical scores remain interpretable against the rules that produced them.


Summaries layer

Methodology version: summaries-1.0.0 (independent of the scoring methodology above). Lives in data/accountability/summaries.json, regenerated by pnpm data:summaries. Read at build time and embedded into every prerendered entity page.

What a summary is

A small set of plain-language bullet points (3 to 7) that describe one entity, plus three structured impact analyses:

  • Direct impact: one-hop relationships derived from the entity's own foreign keys (commitment.responsibleBodyId, officeholder terms[].bodyId, edges on the typed-edge layer, etc.).
  • Indirect impact: two-hop traversal from each direct target, deduped and ranked, capped at ten entries to keep the panel readable.
  • Leverage points: cross-references into the systems layer, surfacing the Meadows leverage level for every system step the entity participates in.

How bullets are produced

Two backends, depending on entity kind:

  • Officeholders (1202 records): template-based. Bullets are assembled deterministically from structured fields (current role, party, level, constituency, civil-service grade, term count, Diligence score, owned commitments). No model is involved.
  • Bills, parties, bodies, mandates, commitments, divisions: generated from the source records by a language model run locally during the build (a free-tier OpenRouter model, default google/gemma-4-31b-it:free). This pass runs once on a contributor's machine; the resulting JSON is committed and Vercel deployments read the static file. There is no per-visit model call.

Sourcing rule (anti-invention)

Every bullet must cite at least one Source whose URL appears verbatim on the entity's underlying records. The generator validates this on the way out and the data validator (pnpm data:validate) rechecks it on the way in. A bullet that cites a URL outside the entity's record set fails validation and the dataset is rejected. There is no path by which a fabricated citation can land in production.
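The check described above could look roughly like this. It is a sketch, not the validator behind pnpm data:validate: the `Bullet` shape, the function name and the error strings are hypothetical, and the URLs are placeholders.

```typescript
// Hypothetical shape: each bullet lists the URLs of the sources it cites.
interface Bullet {
  text: string;
  sourceUrls: string[];
}

// Returns a list of problems; an empty list means the summary passes.
function validateBullets(bullets: Bullet[], recordUrls: ReadonlySet<string>): string[] {
  const errors: string[] = [];
  bullets.forEach((bullet, index) => {
    if (bullet.sourceUrls.length === 0) {
      errors.push(`bullet ${index}: cites no source`);
    }
    for (const url of bullet.sourceUrls) {
      // Verbatim match only: no normalisation of case, trailing slashes or query strings.
      if (!recordUrls.has(url)) {
        errors.push(`bullet ${index}: ${url} does not appear on the entity's records`);
      }
    }
  });
  return errors;
}

// Illustrative data only.
const recordUrls = new Set(["https://example.org/record-1", "https://example.org/record-2"]);
const validationErrors = validateBullets(
  [
    { text: "Grounded claim.", sourceUrls: ["https://example.org/record-1"] },
    { text: "Uncited claim.", sourceUrls: [] },
    { text: "Outside citation.", sourceUrls: ["https://example.org/elsewhere"] },
  ],
  recordUrls,
); // two problems: bullet 1 has no source, bullet 2 cites a URL outside the record set
```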

Caching

The generator hashes each entity's record plus its direct-impact set. On a re-run, an entity whose hash is unchanged and whose previous summary is at the current methodology version is reused verbatim from the existing summaries.json. Changes to the records trigger a regeneration.
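The reuse decision can be sketched as below. The hash function is a stand-in (32-bit FNV-1a over a JSON serialisation), chosen only to keep the example dependency-free; the text does not say which hash or serialisation the generator uses, and the `CachedSummary` shape is hypothetical.

```typescript
// Hypothetical cache entry shape.
interface CachedSummary {
  inputHash: string;
  methodologyVersion: string;
  bullets: string[];
}

// Stand-in hash: 32-bit FNV-1a. A real generator would more likely use a cryptographic digest.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// Hashes the entity's record plus its direct-impact set.
// Assumption: the record is serialised with a stable key order; IDs are sorted here
// so that the order of the direct-impact set does not affect the hash.
function inputHash(record: unknown, directImpactIds: string[]): string {
  return fnv1a(JSON.stringify({ record, directImpact: [...directImpactIds].sort() }));
}

// Reuse only when both the inputs and the methodology version are unchanged.
function canReuse(
  previous: CachedSummary | undefined,
  currentHash: string,
  currentVersion: string,
): boolean {
  return (
    previous !== undefined &&
    previous.inputHash === currentHash &&
    previous.methodologyVersion === currentVersion
  );
}

// Illustrative data only.
const exampleRecord = { id: "example-body", name: "Example Body" };
const previousSummary: CachedSummary = {
  inputHash: inputHash(exampleRecord, ["b", "a"]),
  methodologyVersion: "summaries-1.0.0",
  bullets: ["..."],
};
const unchanged = canReuse(previousSummary, inputHash(exampleRecord, ["a", "b"]), "summaries-1.0.0");
const afterEdit = canReuse(
  previousSummary,
  inputHash({ ...exampleRecord, name: "Renamed Body" }, ["a", "b"]),
  "summaries-1.0.0",
);
```

Here `unchanged` is true and `afterEdit` is false: editing the record changes the hash and triggers regeneration.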

Source of truth

Summaries are derivative. The underlying records are the ground truth. A summary may be incomplete or out of date relative to the records it cites; in all cases, follow the [source] links on each bullet to the primary documents. If a summary contradicts the records, the records win.