feat(pipeline): add legacy KS4 backfill (2015/16–2018/19)
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 12s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 52s
Build and Push Docker Images / Trigger Portainer Update (push) Has been cancelled
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Has been cancelled
Mirrors the existing legacy KS2 pattern to fill the gap before EES hosted KS4 data.

Four files changed:

- tap-uk-ees: LegacyKS4Stream downloads each year's DfE Compare School Performance ZIP, extracts england_ks4final.csv, maps 416 legacy columns to Singer fields, strips % suffixes. Registered in discover_streams(). TapUKEES.config_jsonschema gains legacy_ks4_urls setting.
- stg_legacy_ks4.sql: safe_numeric casts + NULL placeholders for columns not present in legacy format (ebacc_avg_score, gcse_grade_91_pct, prior_attainment_avg, sen_pct).
- int_ks4_with_lineage.sql: adds all_ks4 CTE unioning stg_ees_ks4 and stg_legacy_ks4, matching the int_ks2_with_lineage pattern.
- _stg_sources.yml + meltano.yml: source declaration and setting definition for legacy_ks4. URLs configured per-year once provided.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
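The tap itself is not shown on this page, so the following is only a rough sketch of the steps the message describes for LegacyKS4Stream: open the ZIP, find england_ks4final.csv, rename legacy columns, and strip % suffixes. The three entries in COLUMN_MAP, the helper names, and the year format are assumptions made for illustration; the real stream maps 416 columns and may be structured quite differently.

```python
import csv
import io
import zipfile

# Hypothetical subset of the legacy-to-Singer mapping (the real tap maps 416 columns).
COLUMN_MAP = {
    "URN": "urn",
    "ATT8SCR": "attainment_8_score",
    "PTEBACC_E": "ebacc_entry_pct",
}

CSV_NAME = "england_ks4final.csv"


def strip_pct(value):
    """Trim whitespace and drop a trailing % sign from a raw cell."""
    value = (value or "").strip()
    return value[:-1].strip() if value.endswith("%") else value


def read_legacy_rows(zip_bytes, year):
    """Yield one renamed record per row of the KS4 CSV inside the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Match on the file name only, in case the CSV sits inside a folder.
        member = next(n for n in archive.namelist() if n.endswith(CSV_NAME))
        with archive.open(member) as raw:
            reader = csv.DictReader(io.TextIOWrapper(raw, encoding="utf-8-sig"))
            for row in reader:
                record = {
                    new: strip_pct(row[old])
                    for old, new in COLUMN_MAP.items()
                    if old in row
                }
                record["year"] = year
                yield record


def build_demo_zip():
    """Create a tiny in-memory archive so the sketch runs without a download."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr(CSV_NAME, "URN,ATT8SCR,PTEBACC_E\n100001,48.2,37%\n")
    return buf.getvalue()


rows = list(read_legacy_rows(build_demo_zip(), "201819"))
```

Values stay as strings here, which fits the split of work in the message: the tap only renames and cleans, and the numeric casts happen later in stg_legacy_ks4.sql.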
@@ -1,6 +1,13 @@
 -- Intermediate model: KS4 data chained across academy conversions
+-- Unions EES (2023/24 onwards) and legacy (2015/16–2018/19) school-level data
 
-with current_ks4 as (
+with all_ks4 as (
+    select * from {{ ref('stg_ees_ks4') }}
+    union all
+    select * from {{ ref('stg_legacy_ks4') }}
+),
+
+current_ks4 as (
     select
         urn as current_urn,
         urn as source_urn,
@@ -11,8 +18,8 @@ with current_ks4 as (
         english_maths_strong_pass_pct, english_maths_standard_pass_pct,
         ebacc_entry_pct, ebacc_strong_pass_pct, ebacc_standard_pass_pct, ebacc_avg_score,
         gcse_grade_91_pct,
-        sen_pct, sen_ehcp_pct, sen_support_pct
-    from {{ ref('stg_ees_ks4') }}
+        sen_pct, sen_support_pct, sen_ehcp_pct
+    from all_ks4
 ),
 
 predecessor_ks4 as (
@@ -27,12 +34,12 @@ predecessor_ks4 as (
         ks4.english_maths_strong_pass_pct, ks4.english_maths_standard_pass_pct,
         ks4.ebacc_entry_pct, ks4.ebacc_strong_pass_pct, ks4.ebacc_standard_pass_pct, ks4.ebacc_avg_score,
         ks4.gcse_grade_91_pct,
-        ks4.sen_pct, ks4.sen_ehcp_pct, ks4.sen_support_pct
-    from {{ ref('stg_ees_ks4') }} ks4
+        ks4.sen_pct, ks4.sen_support_pct, ks4.sen_ehcp_pct
+    from all_ks4 ks4
     inner join {{ ref('int_school_lineage') }} lin
         on ks4.urn = lin.predecessor_urn
     where not exists (
-        select 1 from {{ ref('stg_ees_ks4') }} curr
+        select 1 from all_ks4 curr
         where curr.urn = lin.current_urn
           and curr.year = ks4.year
     )
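To make the chaining in int_ks4_with_lineage.sql easier to follow, here is a small Python sketch of the same idea: every row counts for its own URN, and a predecessor's row is also attributed to the successor school unless the successor already has a row for that year (the not exists clause). The parts of the model outside the hunks above are not visible, so the final combination of current_ks4 and predecessor_ks4, the function name, and the sample URNs and year codes are all assumptions.

```python
def chain_ks4(all_ks4, lineage):
    """Attribute KS4 rows to current schools.

    all_ks4: list of dicts with at least "urn" and "year".
    lineage: list of (current_urn, predecessor_urn) pairs.
    """
    # Every (urn, year) that exists in the unioned EES + legacy data.
    present = {(row["urn"], row["year"]) for row in all_ks4}

    # current_ks4: each row stands for itself.
    chained = [
        {**row, "current_urn": row["urn"], "source_urn": row["urn"]}
        for row in all_ks4
    ]

    # predecessor_ks4: carry a predecessor's row forward to the current URN,
    # but only when the current school has no row of its own for that year.
    for row in all_ks4:
        for current_urn, predecessor_urn in lineage:
            if row["urn"] != predecessor_urn:
                continue
            if (current_urn, row["year"]) in present:
                continue
            chained.append(
                {**row, "current_urn": current_urn, "source_urn": row["urn"]}
            )
    return chained


# Hypothetical data: school 100 converted to academy 200.
all_ks4 = [
    {"urn": 100, "year": 201516, "attainment_8_score": 45.0},
    {"urn": 100, "year": 201819, "attainment_8_score": 47.0},
    {"urn": 200, "year": 201819, "attainment_8_score": 50.0},
]
lineage = [(200, 100)]

chained = chain_ks4(all_ks4, lineage)
history_for_200 = sorted(
    (r["year"], r["source_urn"]) for r in chained if r["current_urn"] == 200
)
```

In this example the academy picks up the predecessor's 2015/16 row but keeps its own 2018/19 row, which is why the anti-join has to read from all_ks4 rather than stg_ees_ks4 once legacy years are included.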
@@ -39,6 +39,9 @@ sources:
       - name: ees_ks4_info
         description: KS4 school information (wide format — context/demographics per school)
 
+      - name: legacy_ks4
+        description: Pre-EES KS4 school-level data (2015/16–2018/19) from DfE Compare School Performance ZIPs
+
       - name: ees_census
         description: School census pupil characteristics
 
@@ -0,0 +1,49 @@
+{{ config(materialized='table') }}
+
+-- Staging model: Legacy KS4 data from pre-EES DfE performance tables
+-- Covers 2015/16 – 2018/19; EES provides 2023/24 onwards.
+-- The tap already maps old column names and strips % suffixes;
+-- this model just applies safe_numeric casts and adds NULL placeholders
+-- for columns not available in the legacy format.
+
+select
+    cast(trim(urn) as integer)                                as urn,
+    cast(trim(year) as integer)                               as year,
+
+    {{ safe_numeric('total_pupils') }}::integer               as total_pupils,
+    {{ safe_numeric('total_pupils') }}::integer               as eligible_pupils,
+    null::numeric                                             as prior_attainment_avg,
+
+    -- Attainment 8
+    {{ safe_numeric('attainment_8_score') }}                  as attainment_8_score,
+
+    -- Progress 8
+    {{ safe_numeric('progress_8_score') }}                    as progress_8_score,
+    {{ safe_numeric('progress_8_lower_ci') }}                 as progress_8_lower_ci,
+    {{ safe_numeric('progress_8_upper_ci') }}                 as progress_8_upper_ci,
+    {{ safe_numeric('progress_8_english') }}                  as progress_8_english,
+    {{ safe_numeric('progress_8_maths') }}                    as progress_8_maths,
+    {{ safe_numeric('progress_8_ebacc') }}                    as progress_8_ebacc,
+    {{ safe_numeric('progress_8_open') }}                     as progress_8_open,
+
+    -- English & Maths pass rates
+    {{ safe_numeric('english_maths_strong_pass_pct') }}       as english_maths_strong_pass_pct,
+    {{ safe_numeric('english_maths_standard_pass_pct') }}     as english_maths_standard_pass_pct,
+
+    -- EBacc
+    {{ safe_numeric('ebacc_entry_pct') }}                     as ebacc_entry_pct,
+    {{ safe_numeric('ebacc_strong_pass_pct') }}               as ebacc_strong_pass_pct,
+    {{ safe_numeric('ebacc_standard_pass_pct') }}             as ebacc_standard_pass_pct,
+    null::numeric                                             as ebacc_avg_score,
+
+    -- GCSE grade 9-1 (not published in legacy format)
+    null::numeric                                             as gcse_grade_91_pct,
+
+    -- SEN
+    null::numeric                                             as sen_pct,
+    {{ safe_numeric('sen_support_pct') }}                     as sen_support_pct,
+    {{ safe_numeric('sen_ehcp_pct') }}                        as sen_ehcp_pct
+
+from {{ source('raw', 'legacy_ks4') }}
+where urn is not null
+  and urn ~ '^[0-9]+$'
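As a rough illustration of what stg_legacy_ks4.sql does per row, the sketch below mirrors its three behaviours in Python: drop rows whose URN is not purely digits, cast values defensively, and emit NULL placeholders for columns the legacy format lacks. The safe_numeric macro is not part of this diff, so the assumption here is that it returns NULL for anything that is not a finite number; the function names, the sample suppression marker, and the reduced column set are likewise illustrative only.

```python
import re
from decimal import Decimal, InvalidOperation

# Columns the staging model fills with NULL because the legacy files lack them.
NULL_PLACEHOLDERS = (
    "prior_attainment_avg",
    "ebacc_avg_score",
    "gcse_grade_91_pct",
    "sen_pct",
)


def safe_numeric(value):
    """Assumed macro behaviour: numeric text becomes a Decimal, anything else None."""
    if value is None:
        return None
    try:
        number = Decimal(value.strip())
    except InvalidOperation:
        return None
    # Reject NaN and infinities, which Decimal would otherwise accept.
    return number if number.is_finite() else None


def stage_row(raw):
    """Return the staged row, or None when the where clause would filter it out."""
    urn = raw.get("urn")
    # Mirrors: where urn is not null and urn ~ '^[0-9]+$'
    # The check runs on the untrimmed value, as in the SQL.
    if urn is None or re.fullmatch(r"[0-9]+", urn) is None:
        return None
    staged = {
        "urn": int(urn),
        "year": int(raw["year"].strip()),
        "attainment_8_score": safe_numeric(raw.get("attainment_8_score")),
        "progress_8_score": safe_numeric(raw.get("progress_8_score")),
    }
    staged.update({name: None for name in NULL_PLACEHOLDERS})
    return staged


good = stage_row(
    {"urn": "100001", "year": "201819", "attainment_8_score": "48.2", "progress_8_score": "SUPP"}
)
summary_row = stage_row({"urn": "NAT", "year": "201819"})
```

One thing the sketch makes visible: because the filter tests the raw urn before trimming, a URN with stray whitespace would be dropped rather than trimmed and kept.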