fix(pipeline): restore multi-year KS4 data

Two bugs prevented historical secondary school data from loading:

1. stg_ees_ks4.sql filtered on breakdown_topic = 'Total' only, but EES
   releases prior to 2023/24 use breakdown_topic = 'All pupils' (matching
   the KS2 convention), so every row from those older years was silently
   filtered out. Fix: accept both values with an IN clause.

2. get_all_releases() in tap-uk-ees fetched only the first page of the
   EES releases API. It now follows all pages via the paging.totalPages
   field, so no historical release is missed when more than 20 exist.

After re-running the annual EES pipeline, secondary school comparison
charts should show data across all available years (2018/19 onwards).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tudor Sitaru
2026-04-16 09:18:55 +01:00
parent 8154a59014
commit 3401654ab9
2 changed files with 33 additions and 12 deletions
@@ -40,17 +40,36 @@ def _slug_to_time_period(slug: str) -> str | None:
 def get_all_releases(publication_slug: str) -> list[dict]:
-    """Return all releases for a publication as dicts with 'id' and 'time_period'."""
-    url = f"{CONTENT_API_BASE}/publications/{publication_slug}/releases"
-    resp = requests.get(url, timeout=TIMEOUT)
-    resp.raise_for_status()
-    data = resp.json()
-    # API returns either a plain list or a paginated object with a "results" key
-    releases = data if isinstance(data, list) else data.get("results", [])
+    """Return all releases for a publication as dicts with 'id' and 'time_period'.
+
+    The EES content API paginates with a 'paging' envelope when there are many
+    releases. This function follows all pages so no historical release is missed.
+    """
     result = []
-    for r in releases:
-        time_period = _slug_to_time_period(r.get("slug", ""))
-        result.append({"id": r["id"], "time_period": time_period})
+    page = 1
+    while True:
+        url = f"{CONTENT_API_BASE}/publications/{publication_slug}/releases?page={page}&pageSize=20"
+        resp = requests.get(url, timeout=TIMEOUT)
+        resp.raise_for_status()
+        data = resp.json()
+        # API returns either a plain list or a paginated object with a "results" key
+        if isinstance(data, list):
+            releases = data
+            total_pages = 1
+        else:
+            releases = data.get("results", [])
+            paging = data.get("paging", {})
+            total_pages = paging.get("totalPages", 1)
+        for r in releases:
+            time_period = _slug_to_time_period(r.get("slug", ""))
+            result.append({"id": r["id"], "time_period": time_period})
+        if page >= total_pages:
+            break
+        page += 1
     return result
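The page-following loop in this hunk can be exercised offline. The following is a minimal sketch, not the tap's actual code: `collect_all_pages` and `fake_fetch` are hypothetical names, the HTTP call is replaced by an in-memory stand-in, and the response shape (a "results" list plus a "paging" object with "totalPages") is assumed from the diff above rather than from the EES API documentation.

```python
from typing import Any, Callable


def collect_all_pages(fetch_page: Callable[[int], Any]) -> list[dict]:
    """Collect items from every page (hypothetical helper mirroring the loop above)."""
    items: list[dict] = []
    page = 1
    while True:
        data = fetch_page(page)
        if isinstance(data, list):
            # A plain list is treated as a complete, unpaginated response.
            results, total_pages = data, 1
        else:
            results = data.get("results", [])
            total_pages = data.get("paging", {}).get("totalPages", 1)
        items.extend(results)
        if page >= total_pages:
            break
        page += 1
    return items


def fake_fetch(page: int) -> dict:
    """Stand-in for the HTTP request: 45 made-up releases, 20 per page."""
    all_items = [{"id": f"release-{i}"} for i in range(45)]
    start = (page - 1) * 20
    return {
        "results": all_items[start:start + 20],
        "paging": {"page": page, "pageSize": 20, "totalPages": 3},
    }


releases = collect_all_pages(fake_fetch)
print(len(releases))  # 45: all three pages were followed
```

Passing the fetcher in as a callable keeps the loop testable without network access; reading only the first page of the same fake data would return 20 items.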
@@ -3,9 +3,11 @@
 -- Staging model: KS4 attainment data from EES
 -- KS4 performance data is long-format with breakdown dimensions (breakdown_topic,
 -- breakdown, sex). Unlike KS2 which has a subject dimension, KS4 metrics are
--- already in separate columns — we just filter to the 'All pupils' breakdown.
+-- already in separate columns — we just filter to the all-pupils total row.
 -- EES uses 'z' (not applicable) and 'c' (confidential) as suppression codes —
 -- safe_numeric handles both by treating any non-numeric string as NULL.
+-- NOTE: older EES releases (pre-2023/24) use breakdown_topic = 'All pupils';
+-- the 2023/24 release switched to breakdown_topic = 'Total'. Both are included.
 with performance as (
     select * from {{ source('raw', 'ees_ks4_performance') }}
@@ -46,7 +48,7 @@ all_pupils as (
         {{ safe_numeric('gcse_91_percent') }} as gcse_grade_91_pct
     from performance
-    where breakdown_topic = 'Total'
+    where breakdown_topic in ('Total', 'All pupils')
       and breakdown = 'Total'
       and sex = 'Total'
 ),
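The effect of the filter change can be reproduced on a toy table. This is a rough sketch using Python's built-in sqlite3 rather than the project's dbt model: the `time_period` column, its values, and the sample rows are invented for illustration, and only the three breakdown columns and the two filter expressions are taken from the diff.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table performance "
    "(time_period text, breakdown_topic text, breakdown text, sex text)"
)
# Made-up rows: one older-style release and one 2023/24-style release.
conn.executemany(
    "insert into performance values (?, ?, ?, ?)",
    [
        ("202223", "All pupils", "Total", "Total"),
        ("202324", "Total", "Total", "Total"),
        ("202324", "Total", "Total", "Girls"),
    ],
)


def years_kept(topic_filter: str) -> list[str]:
    """Return the distinct time periods that survive the given topic filter."""
    sql = (
        "select distinct time_period from performance "
        f"where {topic_filter} and breakdown = 'Total' and sex = 'Total' "
        "order by time_period"
    )
    return [row[0] for row in conn.execute(sql)]


old_years = years_kept("breakdown_topic = 'Total'")
new_years = years_kept("breakdown_topic in ('Total', 'All pupils')")
print(old_years)  # ['202324']: the older release is dropped
print(new_years)  # ['202223', '202324']: both conventions are kept
```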