Commit Graph

10 Commits

Author SHA1 Message Date
b8ecc5c58b fix(ees-tap): strip UTF-8 BOM from CSV column names
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m12s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m42s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
Some DfE supporting-files CSVs have a UTF-8 BOM on the first column,
causing it to be named '\ufefftime_period' instead of 'time_period'.
This trips Singer schema validation ('time_period' is a required property).
Strip the BOM from all column names after read_csv.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:54:15 +00:00
f4f0257447 fix(ees-tap): add latin-1 encoding for census/admissions, default utf-8 for others
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 52s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m8s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m40s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
DfE supporting-files CSVs (spc_school_level_underlying_data, AppsandOffers
SchoolLevel) are Latin-1 encoded. Add _encoding class attribute to base
stream class and override to 'latin-1' for census and admissions streams.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:41:40 +00:00
ca351e9d73 feat: migrate backend to marts schema, update EES tap for verified datasets
Pipeline:
- EES tap: split KS4 into performance + info streams, fix admissions filename
  (SchoolLevel keyword match), fix census filename (yearly suffix), remove
  phonics (no school-level data on EES), change endswith → in for matching
- stg_ees_ks4: rewrite to filter long-format data and extract Attainment 8,
  Progress 8, EBacc, English/Maths metrics; join KS4 info for context
- stg_ees_admissions: map real CSV columns (total_number_places_offered, etc.)
- stg_ees_census: update source reference, stub with TODO for data columns
- Remove stg_ees_phonics, fact_phonics (no school-level EES data)
- Add ees_ks4_performance + ees_ks4_info sources, remove ees_ks4 + ees_phonics
- Update int_ks4_with_lineage + fact_ks4_performance with new KS4 columns
- Annual EES DAG: remove stg_ees_phonics+ from selector

Backend:
- models.py: replace all models to point at marts.* tables with schema='marts'
  (DimSchool, DimLocation, KS2Performance, FactOfstedInspection, etc.)
- data_loader.py: rewrite load_school_data_as_dataframe() using raw SQL joining
  dim_school + dim_location + fact_ks2_performance; update get_supplementary_data()
- database.py: remove migration machinery, keep only connection setup
- app.py: remove check_and_migrate_if_needed, remove /api/admin/reimport-ks2
  endpoints (pipeline handles all imports)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:29:27 +00:00
d82e36e7b2 feat(ees): rewrite EES tap and KS2 models for actual data structure
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 31s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m8s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m45s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
- Fix publication slugs (KS4, Phonics, Admissions were wrong)
- Split KS2 into two streams: ees_ks2_attainment (long format) and
  ees_ks2_info (wide format context data)
- Target specific filenames instead of keyword matching
- Handle school_urn vs urn column naming
- Pivot KS2 attainment from long to wide format in dbt staging
- Add all ~40 KS2 columns the backend needs (GPS, absence, gender,
  disadvantaged breakdowns, context demographics)
- Pass through all columns in int_ks2_with_lineage and fact_ks2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 23:08:50 +00:00
e7b1ab9f37 fix(pipeline): expand GIAS schema, handle empty strings, scope DAG selectors
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m8s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 34s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m39s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
- Declare all 34 columns needed by dbt in GIAS tap schema (target-postgres
  only persists columns present in the Singer schema message)
- Use nullif() for empty-string-to-integer/date casts in staging models
- Scope daily DAG dbt build to GIAS models only (stg_gias_establishments+
  stg_gias_links+) to avoid errors on unloaded sources
- Scope annual EES DAG similarly; remove redundant dbt test steps
- Make dim_school gracefully handle missing int_ofsted_latest table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 20:43:24 +00:00
0062a5eabe fix(tap-gias): declare numeric CSV columns as StringType
Some checks failed
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 35s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Integrator (push) Failing after 30s
Build and Push Docker Images / Build Kestra Init (push) Failing after 30s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Failing after 29s
Build and Push Docker Images / Trigger Portainer Update (push) Has been skipped
CSV is read with dtype=str so all values arrive as strings. Declaring
LA (code) and EstablishmentNumber as IntegerType caused schema
validation failures in target-postgres. Use StringType for all columns
except URN (which is explicitly cast to int for the primary key).
Type casting happens in dbt staging models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 14:03:26 +00:00
cd75fc4c24 fix(taps): align with integrator resilience patterns
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m7s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Port critical patterns from the working integrator into Singer taps:
- GIAS: add 404 fallback to yesterday's date, increase timeout to 300s,
  use latin-1 encoding, use dated URL for links (static URL returns 500)
- FBIT: add GIAS date fallback, increase timeout, fix encoding to latin-1
- IDACI: use dated GIAS URL with fallback instead of undated static URL,
  fix encoding to latin-1, increase timeout to 300s
- Ofsted: try utf-8-sig then fall back to latin-1 encoding

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 11:13:38 +00:00
97d975114a feat(pipeline): implement parent-view, fbit, idaci Singer taps + align staging/mart models
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m6s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Port extraction logic from integrator scripts into Singer SDK taps:
- tap-uk-parent-view: scrapes Ofsted open data portal, parses survey responses (14 questions)
- tap-uk-fbit: queries FBIT API per-URN with rate limiting, computes per-pupil spend
- tap-uk-idaci: downloads IoD2019 XLSX, batch-resolves postcodes→LSOAs via postcodes.io

Update dbt models to match actual tap output schemas:
- stg_idaci now includes URN (tap does the postcode→LSOA→school join)
- stg_parent_view expanded from 8 to 13 question columns
- fact_deprivation simplified (no longer needs postcode→LSOA join in dbt)
- fact_parent_view expanded to include all 13 question metrics

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 10:38:07 +00:00
deb4024731 chore(pipeline): bump all dependencies to latest stable versions
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m4s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m45s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
- Airflow 2.11 → 3.1 (BashOperator moved to providers-standard)
- Meltano 3.5 → 4.1 (meltano.yml version 2, meltanolabs target-postgres)
- dbt-postgres 1.9 → 1.10
- singer-sdk 0.39 → 0.53 (all 6 taps)
- Typesense Docker 27.1 → 30.1
- Typesense Python client >=2.0
- Python base image 3.12 → 3.13

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 09:18:11 +00:00
8f02b5125e feat(pipeline): add Meltano + dbt + Airflow ELT pipeline scaffold
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 35s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m9s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Replaces the hand-rolled integrator with a production-grade ELT pipeline
using Meltano (Singer taps), dbt Core (medallion architecture), and
Apache Airflow (orchestration). Adds Typesense for search and PostGIS
for geospatial queries.

- 6 custom Singer taps (GIAS, EES, Ofsted, Parent View, FBIT, IDACI)
- dbt project: 12 staging, 5 intermediate, 12 mart models
- 3 Airflow DAGs (daily/monthly/annual schedules)
- Typesense sync + batch geocoding scripts
- docker-compose: add Airflow, Typesense; upgrade to PostGIS
- Portainer stack definition matching live deployment topology

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 08:37:53 +00:00