school_compare

Author	SHA1	Message	Date
tudor	668e234eb2	feat(census): add demographic columns to EES census tap and staging models. tap-uk-ees: EESCensusStream now declares 27 data columns (FSM %, EAL %, ethnicity breakdowns, pupil counts) with clean Singer field names mapped from the verbose CSV column names (e.g. '% of pupils known to be eligible for free school meals' → fsm_pct) via a new _column_renames mechanism on the base stream class. stg_ees_census: materialised as table, applies safe_numeric to all percentage/count columns, filters to numeric URNs. int_pupil_chars_merged + fact_pupil_characteristics: pass all columns through from staging (previously stubs with only 3 columns). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 14:07:48 +00:00
tudor	5d8b319451	fix(dbt): stub rc_* columns as NULL in stg_ofsted_inspections. tap-uk-ofsted schema only declares OEIF columns; rc_* (Report Card) columns were never emitted so they don't exist in raw.ofsted_inspections. Replace column references with NULL::text until the actual CSV column names for the post-Nov 2025 Report Card framework are confirmed and added to the tap schema. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 12:50:58 +00:00
tudor	77f75fb6e5	fix(dbt): deduplicate predecessor KS2 rows and downgrade orphan test to warn - int_ks2_with_lineage: use DISTINCT ON (current_urn, year) in predecessor_ks2 to handle schools with multiple predecessors that both have KS2 data for the same year (e.g. two schools that merged). Keeps the predecessor with most pupils. - dbt_project.yml: downgrade assert_no_orphaned_facts to warn severity; the 10 orphaned URNs are closed schools in EES data not present in GIAS/dim_school, and they don't surface in the backend, which joins on dim_school anyway. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 12:16:36 +00:00
tudor	b41e6c250e	fix(dbt): filter non-numeric URNs and trim whitespace in EES staging models - Filter school_urn/time_period to '^[0-9]+$' to exclude "n/a" and other non-numeric values that caused integer cast failures in fact_admissions - Add trim() to all school_urn/time_period casts to prevent whitespace variants producing duplicate urn+year rows in fact_ks2_performance Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 12:00:30 +00:00
tudor	6e720feca4	perf(dbt): collapse stg_ees_ks2 to single-pass pivot. Previous version scanned ees_ks2_attainment (1.2M rows) 5 times via separate CTEs (all_pupils, gender_boys, gender_girls, disadv, not_disadv) plus 5 LEFT JOINs. Rewritten as one GROUP BY with conditional aggregation: single scan, no self-joins. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 11:42:40 +00:00
tudor	ae9fd26eba	perf(dbt): materialize stg_ees_ks2 and stg_ees_ks4 as tables. KS2 attainment has 1.2M rows in long format. As a view, the pivot was re-executed inline for every downstream model (intermediate → fact), causing fact_ks2_performance CREATE TABLE to run for 18+ minutes. Materializing as tables means the pivot runs once during staging, and downstream models read from a pre-computed ~16k-row result. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 11:20:20 +00:00
tudor	33b395d2bd	fix(dbt): apply safe_numeric macro to fix EES suppression code 'c' errors. Replace nullif(col, 'z') casts with safe_numeric macro across KS2, KS4, and admissions staging models. The regex-based macro treats any non-numeric string (z, c, x, q, u, etc.) as NULL without needing an explicit list. Also fix FSM_eligible_percent column quoting in stg_ees_admissions: target-postgres stores mixed-case column names quoted, so unquoted references were being folded to fsm_eligible_percent by PostgreSQL. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 10:41:27 +00:00
tudor	ca351e9d73	feat: migrate backend to marts schema, update EES tap for verified datasets. Pipeline: - EES tap: split KS4 into performance + info streams, fix admissions filename (SchoolLevel keyword match), fix census filename (yearly suffix), remove phonics (no school-level data on EES), change endswith → in for matching - stg_ees_ks4: rewrite to filter long-format data and extract Attainment 8, Progress 8, EBacc, English/Maths metrics; join KS4 info for context - stg_ees_admissions: map real CSV columns (total_number_places_offered, etc.) - stg_ees_census: update source reference, stub with TODO for data columns - Remove stg_ees_phonics, fact_phonics (no school-level EES data) - Add ees_ks4_performance + ees_ks4_info sources, remove ees_ks4 + ees_phonics - Update int_ks4_with_lineage + fact_ks4_performance with new KS4 columns - Annual EES DAG: remove stg_ees_phonics+ from selector. Backend: - models.py: replace all models to point at marts.* tables with schema='marts' (DimSchool, DimLocation, KS2Performance, FactOfstedInspection, etc.) - data_loader.py: rewrite load_school_data_as_dataframe() using raw SQL joining dim_school + dim_location + fact_ks2_performance; update get_supplementary_data() - database.py: remove migration machinery, keep only connection setup - app.py: remove check_and_migrate_if_needed, remove /api/admin/reimport-ks2 endpoints (pipeline handles all imports) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 09:29:27 +00:00
tudor	d82e36e7b2	feat(ees): rewrite EES tap and KS2 models for actual data structure - Fix publication slugs (KS4, Phonics, Admissions were wrong) - Split KS2 into two streams: ees_ks2_attainment (long format) and ees_ks2_info (wide format context data) - Target specific filenames instead of keyword matching - Handle school_urn vs urn column naming - Pivot KS2 attainment from long to wide format in dbt staging - Add all ~40 KS2 columns the backend needs (GPS, absence, gender, disadvantaged breakdowns, context demographics) - Pass through all columns in int_ks2_with_lineage and fact_ks2 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 23:08:50 +00:00
tudor	719f06e480	fix(pipeline): make total_pupils non-optional for Typesense, add lat/lng to dim_location - Remove optional flag from total_pupils (Typesense requires default sorting field to be non-optional) - Add latitude/longitude columns to dim_location computed from PostGIS geom, for direct use by backend and Typesense sync Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 22:45:02 +00:00
tudor	03256fed41	fix(dbt): add search_path to profile so PostGIS functions resolve in all schemas. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 21:45:53 +00:00
tudor	b7cc01f26f	fix(dbt): schema-qualify PostGIS functions in dim_location. PostGIS extension lives in public schema; marts schema can't resolve unqualified ST_MakePoint/ST_Transform calls. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 21:45:03 +00:00
tudor	28ba2fd0a6	fix(dbt): cast easting/northing to double precision for ST_MakePoint. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 21:29:16 +00:00
tudor	54df58746e	feat(pipeline): use GIAS easting/northing for all geocoding, drop postcode step. GIAS grid references are the actual school location, far more accurate than postcode centroids. Remove geocode_postcodes.py from the daily DAG and the postcode-not-null filter from dim_location. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 21:18:59 +00:00
tudor	d3e655abdb	fix(dbt): compute geom from easting/northing in dim_location. Convert GIAS British National Grid coordinates (EPSG:27700) to WGS84 (EPSG:4326) directly in the dbt model. The geocode script backfills schools missing easting/northing via Postcodes.io. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 21:17:08 +00:00
tudor	d25e333826	fix(dbt): remove invalid relationship test on map_school_lineage. Lineage map includes predecessor URNs for closed schools, which are correctly excluded from dim_school (status = 'Open'). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 20:59:29 +00:00
tudor	7f82088d53	fix(pipeline): use to_date for DD-MM-YYYY GIAS dates, exclude EES models from daily DAG. GIAS CSV dates are in DD-MM-YYYY format, so use to_date() instead of cast(). Exclude int_ks2_with_lineage+ and int_ks4_with_lineage+ from daily DAG selector since they depend on EES data not yet loaded. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 20:51:40 +00:00
tudor	e7b1ab9f37	fix(pipeline): expand GIAS schema, handle empty strings, scope DAG selectors - Declare all 34 columns needed by dbt in GIAS tap schema (target-postgres only persists columns present in the Singer schema message) - Use nullif() for empty-string-to-integer/date casts in staging models - Scope daily DAG dbt build to GIAS models only (stg_gias_establishments+ stg_gias_links+) to avoid errors on unloaded sources - Scope annual EES DAG similarly; remove redundant dbt test steps - Make dim_school gracefully handle missing int_ofsted_latest table Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 20:43:24 +00:00
tudor	24cfb83144	fix(dbt): fix GIAS source column quoting and remove tests on unloaded sources. GIAS tap emits uppercase URN column, so add quote: true so that dbt source tests reference "URN" instead of urn. Remove source-level tests from tables not yet loaded (ofsted, ees, parent_view, fbit, idaci) to prevent relation-not-found errors during dbt build. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 20:25:56 +00:00
tudor	97d975114a	feat(pipeline): implement parent-view, fbit, idaci Singer taps + align staging/mart models. Port extraction logic from integrator scripts into Singer SDK taps: - tap-uk-parent-view: scrapes Ofsted open data portal, parses survey responses (14 questions) - tap-uk-fbit: queries FBIT API per-URN with rate limiting, computes per-pupil spend - tap-uk-idaci: downloads IoD2019 XLSX, batch-resolves postcodes→LSOAs via postcodes.io. Update dbt models to match actual tap output schemas: - stg_idaci now includes URN (tap does the postcode→LSOA→school join) - stg_parent_view expanded from 8 to 13 question columns - fact_deprivation simplified (no longer needs postcode→LSOA join in dbt) - fact_parent_view expanded to include all 13 question metrics Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 10:38:07 +00:00
tudor	8f02b5125e	feat(pipeline): add Meltano + dbt + Airflow ELT pipeline scaffold. Replaces the hand-rolled integrator with a production-grade ELT pipeline using Meltano (Singer taps), dbt Core (medallion architecture), and Apache Airflow (orchestration). Adds Typesense for search and PostGIS for geospatial queries. - 6 custom Singer taps (GIAS, EES, Ofsted, Parent View, FBIT, IDACI) - dbt project: 12 staging, 5 intermediate, 12 mart models - 3 Airflow DAGs (daily/monthly/annual schedules) - Typesense sync + batch geocoding scripts - docker-compose: add Airflow, Typesense; upgrade to PostGIS - Portainer stack definition matching live deployment topology Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 08:37:53 +00:00
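
Commit 33b395d2bd describes safe_numeric as a regex-based macro that turns any non-numeric string (z, c, x, q, u, etc.) into NULL without an explicit list of suppression codes. The macro body is not part of this log, so the following is only a rough Python analogue of that idea. The function name is borrowed from the commit, but the exact pattern (optional minus sign, digits, optional decimal part) is an assumption and may differ from what the repository uses.

```python
import re
from typing import Optional

# Assumed pattern: optional minus sign, digits, optional decimal part.
# The real dbt macro may accept a different set of numeric forms.
_NUMERIC = re.compile(r"-?\d+(\.\d+)?")


def safe_numeric(value: Optional[str]) -> Optional[float]:
    """Return the value as a float, or None for anything non-numeric."""
    if value is None:
        return None
    text = value.strip()
    # fullmatch rejects partial matches such as "12abc" or "c".
    if _NUMERIC.fullmatch(text):
        return float(text)
    return None


# Suppression codes and blanks fall through to None without being listed.
examples = ["12.5", "c", "z", " 42 ", ""]
cleaned = [safe_numeric(v) for v in examples]
```

The appeal of this approach over nullif(col, 'z') is that a new suppression code appearing in a later EES release needs no code change.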

21 Commits
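
Commit 6e720feca4 replaces five CTE scans and five LEFT JOINs with one GROUP BY using conditional aggregation. As a small illustration of that technique (not the actual stg_ees_ks2 model), the sketch below pivots a long-format table to wide format in a single pass using an in-memory SQLite database. The table name ks2_long, its columns, and the sample values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical long-format table: one row per school per pupil breakdown.
conn.execute(
    "CREATE TABLE ks2_long (school_urn TEXT, breakdown TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO ks2_long VALUES (?, ?, ?)",
    [
        ("100001", "all_pupils", 65.0),
        ("100001", "boys", 61.0),
        ("100001", "girls", 69.0),
        ("100002", "all_pupils", 72.0),
    ],
)

# One scan, one GROUP BY: each CASE picks out the rows for one breakdown,
# and MAX collapses them to a single value per school (NULL if absent).
PIVOT_SQL = """
SELECT
    school_urn,
    MAX(CASE WHEN breakdown = 'all_pupils' THEN value END) AS all_pct,
    MAX(CASE WHEN breakdown = 'boys' THEN value END) AS boys_pct,
    MAX(CASE WHEN breakdown = 'girls' THEN value END) AS girls_pct
FROM ks2_long
GROUP BY school_urn
ORDER BY school_urn
"""

rows = conn.execute(PIVOT_SQL).fetchall()
```

A school with no row for a given breakdown gets NULL in that column, which matches what a LEFT JOIN against a per-breakdown CTE would have produced.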