school_compare

Author	SHA1	Message	Date
Tudor Sitaru	2b757e556d	fix(legacy-ks2): strip % suffix from percentage values Old DfE CSVs encode percentages as "57%" not "57". The safe_numeric macro rejects non-numeric strings, so strip the suffix before emitting. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-01 13:07:51 +01:00
Tudor Sitaru	fba8e74b72	refactor(legacy-ks2): use explicit year→URL mapping instead of base URL pattern The file hosting uses non-deterministic URLs, so replace legacy_ks2_base_url + legacy_ks2_years with a single legacy_ks2_urls object mapping year codes to download URLs. Configure the 4 pre-COVID years in meltano.yml. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:44:11 +01:00
Tudor Sitaru	6d4962639c	feat(legacy-ks2): add stream for pre-COVID KS2 data (2015-2019) - Add LegacyKS2Stream to tap-uk-ees: downloads old DfE england_ks2final.csv files from a configurable base URL, maps 318-column wide format to the same schema as stg_ees_ks2 output - Add stg_legacy_ks2.sql staging model with safe_numeric casts - Add legacy_ks2 source to _stg_sources.yml - Update int_ks2_with_lineage.sql to union EES + legacy data - Configurable via legacy_ks2_base_url and legacy_ks2_years tap settings Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 14:36:41 +01:00
Tudor Sitaru	fc011c6547	fix(tap-uk-ees): case-insensitive URN column matching for older census files Older census CSVs use 'URN' (uppercase) while the stream expects 'urn'. Normalise the column name before filtering and emitting records. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 22:36:16 +01:00
Tudor Sitaru	752abd69a5	fix(tap-uk-ees): inject time_period from release slug when absent in CSV Older census (and other) files don't include a time_period column. Derive it from the release slug (e.g. '2022-23' → '202223') and inject it into records so the required Singer schema field is always present. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 21:59:24 +01:00
Tudor Sitaru	570c2b689e	fix(tap-uk-ees): handle plain list response from releases endpoint Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 21:47:14 +01:00
Tudor Sitaru	9a1572ea20	feat(tap-uk-ees): fetch all historical releases, not just latest Add get_all_release_ids() to paginate /publications/{slug}/releases and iterate over every release in get_records(). Add latest_only config flag (default false) to restore single-release behaviour for daily runs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 21:37:26 +01:00
tudor	250d1f7c77	fix(tap-uk-idaci): add openpyxl dependency for Excel file parsing Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-28 15:00:00 +00:00
tudor	26aa3c2d70	fix(tap-uk-ofsted): fix header row detection matching 'urn' inside 'turn' The preamble row in Ofsted CSVs contains 'turn off all filters' which matched 'urn' in line.lower(), so header_idx was set to 0 instead of the real header row. Use a regex that matches URN only as a CSV field. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 17:05:03 +00:00
tudor	e56a63c59c	debug(tap-uk-ofsted): log CSV column names to diagnose 0-record extraction Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 15:47:32 +00:00
tudor	668e234eb2	feat(census): add demographic columns to EES census tap and staging models tap-uk-ees: EESCensusStream now declares 27 data columns (FSM %, EAL %, ethnicity breakdowns, pupil counts) with clean Singer field names mapped from the verbose CSV column names (e.g. '% of pupils known to be eligible for free school meals' → fsm_pct) via a new _column_renames mechanism on the base stream class. stg_ees_census: materialised as table, applies safe_numeric to all percentage/count columns, filters to numeric URNs. int_pupil_chars_merged + fact_pupil_characteristics: pass all columns through from staging (previously stubs with only 3 columns). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 14:07:48 +00:00
tudor	8e8d1bd8c5	fix(ees-tap): filter out rows with null URN before emitting The admissions school-level file contains some rows with null school_urn (LA/category aggregates that survive the geographic_level filter). These cause a not-null constraint violation at target-postgres. Drop any row where the URN column is null or empty before yielding records. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 10:13:17 +00:00
tudor	c7357336e3	fix(ees-tap): fix BOM handling for admissions CSV Admissions file is UTF-8 with BOM, not Latin-1. Reading as latin-1 decoded the BOM bytes as 'ï»¿' which wasn't stripped. Change admissions encoding to utf-8-sig (strips BOM automatically). Also update the manual BOM strip fallback to handle the latin-1 decoded form. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 10:03:17 +00:00
tudor	b8ecc5c58b	fix(ees-tap): strip UTF-8 BOM from CSV column names Some DfE supporting-files CSVs have a UTF-8 BOM on the first column, causing it to be named '\ufefftime_period' instead of 'time_period'. This trips Singer schema validation ('time_period' is a required property). Strip the BOM from all column names after read_csv. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 09:54:15 +00:00
tudor	f4f0257447	fix(ees-tap): add latin-1 encoding for census/admissions, default utf-8 for others DfE supporting-files CSVs (spc_school_level_underlying_data, AppsandOffers SchoolLevel) are Latin-1 encoded. Add _encoding class attribute to base stream class and override to 'latin-1' for census and admissions streams. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 09:41:40 +00:00
tudor	ca351e9d73	feat: migrate backend to marts schema, update EES tap for verified datasets Pipeline: - EES tap: split KS4 into performance + info streams, fix admissions filename (SchoolLevel keyword match), fix census filename (yearly suffix), remove phonics (no school-level data on EES), change endswith → in for matching - stg_ees_ks4: rewrite to filter long-format data and extract Attainment 8, Progress 8, EBacc, English/Maths metrics; join KS4 info for context - stg_ees_admissions: map real CSV columns (total_number_places_offered, etc.) - stg_ees_census: update source reference, stub with TODO for data columns - Remove stg_ees_phonics, fact_phonics (no school-level EES data) - Add ees_ks4_performance + ees_ks4_info sources, remove ees_ks4 + ees_phonics - Update int_ks4_with_lineage + fact_ks4_performance with new KS4 columns - Annual EES DAG: remove stg_ees_phonics+ from selector Backend: - models.py: replace all models to point at marts.* tables with schema='marts' (DimSchool, DimLocation, KS2Performance, FactOfstedInspection, etc.) - data_loader.py: rewrite load_school_data_as_dataframe() using raw SQL joining dim_school + dim_location + fact_ks2_performance; update get_supplementary_data() - database.py: remove migration machinery, keep only connection setup - app.py: remove check_and_migrate_if_needed, remove /api/admin/reimport-ks2 endpoints (pipeline handles all imports) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 09:29:27 +00:00
tudor	d82e36e7b2	feat(ees): rewrite EES tap and KS2 models for actual data structure - Fix publication slugs (KS4, Phonics, Admissions were wrong) - Split KS2 into two streams: ees_ks2_attainment (long format) and ees_ks2_info (wide format context data) - Target specific filenames instead of keyword matching - Handle school_urn vs urn column naming - Pivot KS2 attainment from long to wide format in dbt staging - Add all ~40 KS2 columns the backend needs (GPS, absence, gender, disadvantaged breakdowns, context demographics) - Pass through all columns in int_ks2_with_lineage and fact_ks2 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 23:08:50 +00:00
tudor	e7b1ab9f37	fix(pipeline): expand GIAS schema, handle empty strings, scope DAG selectors - Declare all 34 columns needed by dbt in GIAS tap schema (target-postgres only persists columns present in the Singer schema message) - Use nullif() for empty-string-to-integer/date casts in staging models - Scope daily DAG dbt build to GIAS models only (stg_gias_establishments+ stg_gias_links+) to avoid errors on unloaded sources - Scope annual EES DAG similarly; remove redundant dbt test steps - Make dim_school gracefully handle missing int_ofsted_latest table Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 20:43:24 +00:00
tudor	0062a5eabe	fix(tap-gias): declare numeric CSV columns as StringType CSV is read with dtype=str so all values arrive as strings. Declaring LA (code) and EstablishmentNumber as IntegerType caused schema validation failures in target-postgres. Use StringType for all columns except URN (which is explicitly cast to int for the primary key). Type casting happens in dbt staging models. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 14:03:26 +00:00
tudor	cd75fc4c24	fix(taps): align with integrator resilience patterns Port critical patterns from the working integrator into Singer taps: - GIAS: add 404 fallback to yesterday's date, increase timeout to 300s, use latin-1 encoding, use dated URL for links (static URL returns 500) - FBIT: add GIAS date fallback, increase timeout, fix encoding to latin-1 - IDACI: use dated GIAS URL with fallback instead of undated static URL, fix encoding to latin-1, increase timeout to 300s - Ofsted: try utf-8-sig then fall back to latin-1 encoding Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 11:13:38 +00:00
tudor	97d975114a	feat(pipeline): implement parent-view, fbit, idaci Singer taps + align staging/mart models Port extraction logic from integrator scripts into Singer SDK taps: - tap-uk-parent-view: scrapes Ofsted open data portal, parses survey responses (14 questions) - tap-uk-fbit: queries FBIT API per-URN with rate limiting, computes per-pupil spend - tap-uk-idaci: downloads IoD2019 XLSX, batch-resolves postcodes→LSOAs via postcodes.io Update dbt models to match actual tap output schemas: - stg_idaci now includes URN (tap does the postcode→LSOA→school join) - stg_parent_view expanded from 8 to 13 question columns - fact_deprivation simplified (no longer needs postcode→LSOA join in dbt) - fact_parent_view expanded to include all 13 question metrics Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 10:38:07 +00:00
tudor	deb4024731	chore(pipeline): bump all dependencies to latest stable versions - Airflow 2.11 → 3.1 (BashOperator moved to providers-standard) - Meltano 3.5 → 4.1 (meltano.yml version 2, meltanolabs target-postgres) - dbt-postgres 1.9 → 1.10 - singer-sdk 0.39 → 0.53 (all 6 taps) - Typesense Docker 27.1 → 30.1 - Typesense Python client >=2.0 - Python base image 3.12 → 3.13 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 09:18:11 +00:00
tudor	8f02b5125e	feat(pipeline): add Meltano + dbt + Airflow ELT pipeline scaffold Replaces the hand-rolled integrator with a production-grade ELT pipeline using Meltano (Singer taps), dbt Core (medallion architecture), and Apache Airflow (orchestration). Adds Typesense for search and PostGIS for geospatial queries. - 6 custom Singer taps (GIAS, EES, Ofsted, Parent View, FBIT, IDACI) - dbt project: 12 staging, 5 intermediate, 12 mart models - 3 Airflow DAGs (daily/monthly/annual schedules) - Typesense sync + batch geocoding scripts - docker-compose: add Airflow, Typesense; upgrade to PostGIS - Portainer stack definition matching live deployment topology Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 08:37:53 +00:00

23 Commits
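
Several of the fixes listed above (2b757e556d, 752abd69a5, 26aa3c2d70, c7357336e3, b8ecc5c58b) describe small text-normalisation steps applied to DfE and Ofsted CSVs. The sketch below illustrates those ideas in plain Python. It is not the repository's code: every function name and regex here is hypothetical and was written only from the commit messages, and the slug pattern assumes release slugs contain a 'YYYY-YY' fragment as in the '2022-23' example.

```python
import re
from typing import Optional, Sequence

# Hypothetical pattern: "urn" only when it is a whole CSV field,
# optionally quoted, so the 'urn' inside 'turn' does not match.
_URN_FIELD = re.compile(r'(?:^|,)\s*"?urn"?\s*(?:,|$)', re.IGNORECASE)

# Assumed slug shape: contains an academic year such as '2022-23'.
_SLUG_YEARS = re.compile(r"(\d{4})-(\d{2})")

# A UTF-8 BOM decoded correctly, and the same three bytes mis-decoded as latin-1.
_BOM_FORMS = ("\ufeff", "\u00ef\u00bb\u00bf")


def strip_percent(value: Optional[str]) -> Optional[str]:
    """Turn '57%' into '57'; leave other strings (and None) unchanged."""
    if value is None:
        return None
    text = value.strip()
    return text[:-1].strip() if text.endswith("%") else text


def time_period_from_slug(slug: str) -> Optional[str]:
    """Derive a time_period such as '202223' from a slug containing '2022-23'."""
    match = _SLUG_YEARS.search(slug)
    return match.group(1) + match.group(2) if match else None


def find_header_index(lines: Sequence[str]) -> Optional[int]:
    """Return the index of the first line that has URN as a CSV field."""
    for index, line in enumerate(lines):
        if _URN_FIELD.search(line):
            return index
    return None


def clean_column_name(name: str) -> str:
    """Drop a leading BOM (either decoded form) from a column name."""
    for bom in _BOM_FORMS:
        if name.startswith(bom):
            name = name[len(bom):]
    return name.strip()
```

Reading a file with Python's utf-8-sig codec removes a leading BOM at decode time, which is the primary fix described in c7357336e3; clean_column_name only mirrors the manual fallback for files that were already decoded with another encoding.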