feat: ingest official DfE KS2 national averages from EES data catalogue

Replaces computed means from our school dataset with the published DfE
national headline figures for the KS2 chart reference line.

- tap-uk-ees: new EESKs2NationalStream fetches the stable EES data-catalogue
  CSV (one row per year, England national total, AllSchools filter)
- dbt staging: stg_ees_ks2_national normalises columns, casts to float,
  filters to academic years from 2016/17 onward (EES time_period >= 201617)
- dbt mart: fact_ks2_national_averages — one row per year, official figures
- backend/models: Ks2NationalAverage SQLAlchemy model
- backend/app: /api/national-averages queries the mart for KS2 by_year;
  secondary by_year stays computed (no DfE KS4 national dataset yet)
- DAG: extract_ks2_national task added to school_data_annual_ees,
  runs in parallel with the main EES extract
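The endpoint change can be pictured as a small assembly step: KS2 per-year figures are read from the official mart, and secondary per-year figures are still computed from the school dataset. The sketch below assumes that shape. `Ks2NationalRow`, `build_national_averages`, the `primary`/`secondary` keys and the `attainment8` key are all hypothetical illustrations, not the real backend model or response schema, and the numbers are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ks2NationalRow:
    # Hypothetical mirror of one fact_ks2_national_averages row (subset of columns).
    year: int
    rwm_expected_pct: Optional[float]
    reading_avg_score: Optional[float]


def build_national_averages(
    ks2_rows: list[Ks2NationalRow],
    secondary_by_year: dict[int, dict[str, float]],
) -> dict:
    """Assemble a hypothetical /api/national-averages payload.

    KS2 figures come straight from the official mart rows; secondary figures
    arrive already computed from the school dataset.
    """
    ks2_by_year = {
        row.year: {
            "rwm_expected_pct": row.rwm_expected_pct,
            "reading_avg_score": row.reading_avg_score,
        }
        # Sort so the chart receives years in chronological order.
        for row in sorted(ks2_rows, key=lambda r: r.year)
    }
    return {
        "primary": {"by_year": ks2_by_year},
        "secondary": {"by_year": dict(sorted(secondary_by_year.items()))},
    }


# Placeholder values only, not DfE figures.
rows = [Ks2NationalRow(202223, 1.0, 2.0), Ks2NationalRow(201819, 3.0, 4.0)]
payload = build_national_averages(rows, {202223: {"attainment8": 5.0}})
```

Keeping the two sources separate in the payload makes it easy to swap the secondary branch over to official figures later without touching the KS2 path.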

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tudor Sitaru
2026-04-09 14:40:33 +01:00
parent a3cfffa4d0
commit dc66e22d4d
8 changed files with 236 additions and 12 deletions
@@ -111,6 +111,12 @@ models:
      - name: urn
        tests: [not_null]
  - name: fact_ks2_national_averages
    description: Official DfE KS2 national headline averages — one row per academic year
    columns:
      - name: year
        tests: [not_null, unique]
  - name: fact_deprivation
    description: IDACI deprivation index — one row per URN
    columns:
@@ -0,0 +1,25 @@
{{ config(materialized='table') }}
-- Mart: Official DfE KS2 national headline averages — one row per academic year.
-- These are the published England-wide figures, not computed means from our school dataset.
-- Used by the /api/national-averages endpoint to provide accurate per-year reference lines
-- on the school history chart and for hero stat comparisons.
select
    year,
    rwm_expected_pct,
    rwm_high_pct,
    reading_expected_pct,
    reading_high_pct,
    reading_avg_score,
    writing_expected_pct,
    writing_gd_pct,
    maths_expected_pct,
    maths_high_pct,
    maths_avg_score,
    gps_expected_pct,
    gps_high_pct,
    gps_avg_score,
    science_expected_pct
from {{ ref('stg_ees_ks2_national') }}
order by year
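On the chart side, a per-year reference line only needs a lookup by year, and a hero stat needs the latest year that actually has a figure. The sketch below shows one way to do both, assuming the mart rows have been loaded into a `{year: value}` dict; `align_reference_line` and `latest_national_value` are hypothetical helpers, not functions from this commit. Years with no official row (such as the cancelled 2019/20 and 2020/21 assessments) map to `None` rather than being interpolated.

```python
from typing import Dict, List, Optional, Tuple


def align_reference_line(
    school_years: List[int],
    national_by_year: Dict[int, Optional[float]],
) -> List[Optional[float]]:
    """National value for each year on a school's chart; missing years -> None."""
    return [national_by_year.get(year) for year in school_years]


def latest_national_value(
    national_by_year: Dict[int, Optional[float]],
) -> Optional[Tuple[int, float]]:
    """Most recent (year, value) pair with a non-null value, or None if there is none."""
    candidates = [
        (year, value) for year, value in national_by_year.items() if value is not None
    ]
    if not candidates:
        return None
    # Compare on the year only; EES year codes like 202223 sort chronologically.
    return max(candidates, key=lambda pair: pair[0])
```

Returning `None` for gap years lets a charting library draw a break in the line instead of an invented value.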
@@ -45,6 +45,9 @@ sources:
      - name: ees_admissions
        description: Primary and secondary school admissions data
      - name: ees_ks2_national
        description: KS2 national headline averages from DfE EES data catalogue — one row per academic year
      # Phonics: no school-level data on EES (only national/LA level)
      - name: parent_view
@@ -0,0 +1,34 @@
{{ config(materialized='table') }}
-- Staging model: DfE KS2 national headline averages
-- Source: EES data catalogue CSV (one row per academic year, England national total)
-- COVID years 2019/20 and 2020/21 are naturally absent — DfE did not publish figures
-- because national assessments were cancelled. Those years produce no rows here.
-- 'x' (not applicable) and suppressed values are coerced to NULL by safe_numeric.
select
    cast(trim(time_period) as integer) as year,
    {{ safe_numeric('rwm_expected_pct') }} as rwm_expected_pct,
    {{ safe_numeric('rwm_high_pct') }} as rwm_high_pct,
    {{ safe_numeric('reading_expected_pct') }} as reading_expected_pct,
    {{ safe_numeric('reading_high_pct') }} as reading_high_pct,
    {{ safe_numeric('reading_avg_score') }} as reading_avg_score,
    {{ safe_numeric('writing_expected_pct') }} as writing_expected_pct,
    {{ safe_numeric('writing_gd_pct') }} as writing_gd_pct,
    {{ safe_numeric('maths_expected_pct') }} as maths_expected_pct,
    {{ safe_numeric('maths_high_pct') }} as maths_high_pct,
    {{ safe_numeric('maths_avg_score') }} as maths_avg_score,
    {{ safe_numeric('gps_expected_pct') }} as gps_expected_pct,
    {{ safe_numeric('gps_high_pct') }} as gps_high_pct,
    {{ safe_numeric('gps_avg_score') }} as gps_avg_score,
    {{ safe_numeric('science_expected_pct') }} as science_expected_pct
from {{ source('raw', 'ees_ks2_national') }}
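where trim(time_period) ~ '^[0-9]+$'
  -- CASE guards the cast: Postgres does not guarantee AND operands are evaluated in order
  and case when trim(time_period) ~ '^[0-9]+$'
           then cast(trim(time_period) as integer) end >= 201617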
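For readers less familiar with the dbt side, the staging logic can be sketched in plain Python: trim `time_period`, keep only all-digit codes from 201617 onward, and coerce every measure so that `'x'` and other non-numeric tokens become `None`. This is an assumption-laden analogue, not the pipeline's code. The real `safe_numeric` macro body is not part of this diff, and `parse_year`, `academic_year_label` and `stage_rows` are hypothetical names.

```python
import math
import re
from typing import Dict, Iterable, List, Optional

_DIGITS = re.compile(r"[0-9]+")


def safe_numeric(value: Optional[str]) -> Optional[float]:
    """Hypothetical analogue of the safe_numeric macro: non-numeric or non-finite -> None."""
    if value is None:
        return None
    try:
        number = float(value.strip())
    except ValueError:
        return None  # e.g. 'x' or other suppression codes
    return number if math.isfinite(number) else None


def parse_year(time_period: Optional[str]) -> Optional[int]:
    """Trimmed, all-digit EES time_period (e.g. '201617') -> int; anything else -> None."""
    text = (time_period or "").strip()
    return int(text) if _DIGITS.fullmatch(text) else None


def academic_year_label(year: int) -> str:
    """Six-digit EES academic-year code to a display label, e.g. 201617 -> '2016/17'."""
    text = str(year)
    return f"{text[:4]}/{text[4:]}"


def stage_rows(
    raw_rows: Iterable[Dict[str, str]],
    columns: List[str],
    min_year: int = 201617,
) -> List[dict]:
    """Keep rows from min_year onward and coerce each listed column to float or None."""
    staged = []
    for raw in raw_rows:
        year = parse_year(raw.get("time_period"))
        if year is None or year < min_year:
            continue
        row = {"year": year}
        for column in columns:
            row[column] = safe_numeric(raw.get(column))
        staged.append(row)
    return staged
```

As in the SQL, the cancelled COVID years need no special handling: if the source CSV has no row for them, nothing is emitted.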