Commit Graph

49 Commits

Author SHA1 Message Date
250d1f7c77 fix(tap-uk-idaci): add openpyxl dependency for Excel file parsing
Some checks failed
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 49s
Build and Push Docker Images / Build Frontend (Next.js) (push) Failing after 1m2s
Build and Push Docker Images / Trigger Portainer Update (push) Has been cancelled
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 15:00:00 +00:00
1629a8f994 feat(pipeline): add DAGs for Parent View and IDACI deprivation
Some checks failed
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m4s
Build and Push Docker Images / Trigger Portainer Update (push) Has been cancelled
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Has been cancelled
- school_data_monthly_parent_view: runs 1st of month, extracts Ofsted
  Parent View and builds fact_parent_view
- school_data_annual_idaci: manual trigger, extracts IDACI deprivation
  index and builds fact_deprivation

Both tables were missing, causing safe_query to fail and leave the
PostgreSQL transaction in an aborted state, silently killing all
subsequent supplementary data queries including fact_admissions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 22:08:12 +00:00
7724fe3503 fix(stg_ofsted_inspections): correctly filter NULL string inspection dates
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m25s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
The string 'NULL' is not SQL NULL, so the WHERE in the renamed CTE
passed those rows through. Filter on the raw value using nullif in the
CTE and on the computed date in the outer SELECT.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 18:21:30 +00:00
1d56eebe87 fix(stg_ofsted_inspections): filter out rows with no inspection date
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m24s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Schools in the MI file that have never been inspected have a null
inspection_date after parsing. Exclude them — they are not inspection
records.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 17:55:11 +00:00
10720400fd fix(stg_ofsted_inspections): parse DD/MM/YYYY date format from Ofsted CSV
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m3s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m28s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 17:34:34 +00:00
05cb22f1a5 fix(stg_ofsted_inspections): handle NULL strings from Ofsted CSV
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m9s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m26s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Use nullif+trim for date cast and safe_numeric for integer grades to
handle literal 'NULL' strings present in the new Report Card format CSV.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 17:23:46 +00:00
26aa3c2d70 fix(tap-uk-ofsted): fix header row detection matching 'urn' inside 'turn'
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 33s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m40s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
The preamble row in Ofsted CSVs contains 'turn off all filters' which
matched 'urn' in line.lower(), so header_idx was set to 0 instead of
the real header row. Use a regex that matches URN only as a CSV field.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 17:05:03 +00:00
e56a63c59c debug(tap-uk-ofsted): log CSV column names to diagnose 0-record extraction
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 31s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m4s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m40s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 15:47:32 +00:00
668e234eb2 feat(census): add demographic columns to EES census tap and staging models
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m39s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
tap-uk-ees: EESCensusStream now declares 27 data columns (FSM %, EAL %,
ethnicity breakdowns, pupil counts) with clean Singer field names mapped
from the verbose CSV column names (e.g. '% of pupils known to be eligible
for free school meals' → fsm_pct) via a new _column_renames mechanism on
the base stream class.

stg_ees_census: materialised as table, applies safe_numeric to all
percentage/count columns, filters to numeric URNs.

int_pupil_chars_merged + fact_pupil_characteristics: pass all columns
through from staging (previously stubs with only 3 columns).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 14:07:48 +00:00
4b02ab3d8a feat: wire Typesense search into backend, fix sync performance data bug
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 1m1s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m25s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
sync_typesense.py:
- Fix query string replacement: was matching 'ST_X(l.geom) as lng' but
  QUERY_BASE uses 'l.longitude as lng' — KS2/KS4 lateral joins were
  silently dropped on every sync run

backend:
- Add typesense_url/typesense_api_key settings to config.py
- Add search_schools_typesense() to data_loader.py — queries Typesense
  'schools' alias, returns URNs in relevance order with typo tolerance;
  falls back to empty list if Typesense is unavailable
- /api/schools: replace pandas str.contains with Typesense search;
  results are filtered from the DataFrame and returned in relevance order;
  graceful fallback to substring match if Typesense is down

requirements.txt: add typesense==0.21.0, numpy==1.26.4

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 13:23:32 +00:00
5d8b319451 fix(dbt): stub rc_* columns as NULL in stg_ofsted_inspections
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 33s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m10s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m23s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
tap-uk-ofsted schema only declares OEIF columns; rc_* (Report Card)
columns were never emitted so they don't exist in raw.ofsted_inspections.
Replace column references with NULL::text until the actual CSV column
names for the post-Nov 2025 Report Card framework are confirmed and
added to the tap schema.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 12:50:58 +00:00
77f75fb6e5 fix(dbt): deduplicate predecessor KS2 rows and downgrade orphan test to warn
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m11s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m31s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
- int_ks2_with_lineage: use DISTINCT ON (current_urn, year) in predecessor_ks2
  to handle schools with multiple predecessors that both have KS2 data for the
  same year (e.g. two schools that merged). Keeps the predecessor with most pupils.
- dbt_project.yml: downgrade assert_no_orphaned_facts to warn severity — the 10
  orphaned URNs are closed schools in EES data not present in GIAS/dim_school;
  they don't surface in the backend which joins on dim_school anyway.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 12:16:36 +00:00
b41e6c250e fix(dbt): filter non-numeric URNs and trim whitespace in EES staging models
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m9s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m30s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
- Filter school_urn/time_period to '^[0-9]+$' to exclude "n/a" and other
  non-numeric values that caused integer cast failures in fact_admissions
- Add trim() to all school_urn/time_period casts to prevent whitespace
  variants producing duplicate urn+year rows in fact_ks2_performance

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 12:00:30 +00:00
6e720feca4 perf(dbt): collapse stg_ees_ks2 to single-pass pivot
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 33s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m31s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Previous version scanned ees_ks2_attainment (1.2M rows) 5 times via
separate CTEs (all_pupils, gender_boys, gender_girls, disadv, not_disadv)
plus 5 LEFT JOINs. Rewritten as one GROUP BY with conditional aggregation
— single scan, no self-joins.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 11:42:40 +00:00
ae9fd26eba perf(dbt): materialize stg_ees_ks2 and stg_ees_ks4 as tables
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m10s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m30s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
KS2 attainment has 1.2M rows in long format. As a view, the pivot was
re-executed inline for every downstream model (intermediate → fact),
causing fact_ks2_performance CREATE TABLE to run for 18+ minutes.

Materializing as tables means the pivot runs once during staging, and
downstream models read from a pre-computed ~16k-row result.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 11:20:20 +00:00
33b395d2bd fix(dbt): apply safe_numeric macro to fix EES suppression code 'c' errors
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 33s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m14s
Build and Push Docker Images / Build Integrator (push) Successful in 58s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m25s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
Replace nullif(col, 'z') casts with safe_numeric macro across KS2, KS4,
and admissions staging models. The regex-based macro treats any non-numeric
string (z, c, x, q, u, etc.) as NULL without needing an explicit list.

Also fix FSM_eligible_percent column quoting in stg_ees_admissions — target-
postgres stores mixed-case column names quoted, so unquoted references were
being folded to fsm_eligible_percent by PostgreSQL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 10:41:27 +00:00
8e8d1bd8c5 fix(ees-tap): filter out rows with null URN before emitting
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m10s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m47s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
The admissions school-level file contains some rows with null school_urn
(LA/category aggregates that survive the geographic_level filter). These
cause a not-null constraint violation at target-postgres. Drop any row
where the URN column is null or empty before yielding records.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 10:13:17 +00:00
c7357336e3 fix(ees-tap): fix BOM handling for admissions CSV
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 33s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m6s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m40s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Admissions file is UTF-8 with BOM, not Latin-1. Reading as latin-1
decoded the BOM bytes as '' which wasn't stripped. Change admissions
encoding to utf-8-sig (strips BOM automatically). Also update the manual
BOM strip fallback to handle the latin-1 decoded form.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 10:03:17 +00:00
b8ecc5c58b fix(ees-tap): strip UTF-8 BOM from CSV column names
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m12s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m42s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
Some DfE supporting-files CSVs have a UTF-8 BOM on the first column,
causing it to be named '\ufefftime_period' instead of 'time_period'.
This trips Singer schema validation ('time_period' is a required property).
Strip the BOM from all column names after read_csv.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:54:15 +00:00
f4f0257447 fix(ees-tap): add latin-1 encoding for census/admissions, default utf-8 for others
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 52s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m8s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m40s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
DfE supporting-files CSVs (spc_school_level_underlying_data, AppsandOffers
SchoolLevel) are Latin-1 encoded. Add _encoding class attribute to base
stream class and override to 'latin-1' for census and admissions streams.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:41:40 +00:00
ca351e9d73 feat: migrate backend to marts schema, update EES tap for verified datasets
Pipeline:
- EES tap: split KS4 into performance + info streams, fix admissions filename
  (SchoolLevel keyword match), fix census filename (yearly suffix), remove
  phonics (no school-level data on EES), change endswith → in for matching
- stg_ees_ks4: rewrite to filter long-format data and extract Attainment 8,
  Progress 8, EBacc, English/Maths metrics; join KS4 info for context
- stg_ees_admissions: map real CSV columns (total_number_places_offered, etc.)
- stg_ees_census: update source reference, stub with TODO for data columns
- Remove stg_ees_phonics, fact_phonics (no school-level EES data)
- Add ees_ks4_performance + ees_ks4_info sources, remove ees_ks4 + ees_phonics
- Update int_ks4_with_lineage + fact_ks4_performance with new KS4 columns
- Annual EES DAG: remove stg_ees_phonics+ from selector

Backend:
- models.py: replace all models to point at marts.* tables with schema='marts'
  (DimSchool, DimLocation, KS2Performance, FactOfstedInspection, etc.)
- data_loader.py: rewrite load_school_data_as_dataframe() using raw SQL joining
  dim_school + dim_location + fact_ks2_performance; update get_supplementary_data()
- database.py: remove migration machinery, keep only connection setup
- app.py: remove check_and_migrate_if_needed, remove /api/admin/reimport-ks2
  endpoints (pipeline handles all imports)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:29:27 +00:00
d82e36e7b2 feat(ees): rewrite EES tap and KS2 models for actual data structure
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 31s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m8s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m45s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
- Fix publication slugs (KS4, Phonics, Admissions were wrong)
- Split KS2 into two streams: ees_ks2_attainment (long format) and
  ees_ks2_info (wide format context data)
- Target specific filenames instead of keyword matching
- Handle school_urn vs urn column naming
- Pivot KS2 attainment from long to wide format in dbt staging
- Add all ~40 KS2 columns the backend needs (GPS, absence, gender,
  disadvantaged breakdowns, context demographics)
- Pass through all columns in int_ks2_with_lineage and fact_ks2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 23:08:50 +00:00
719f06e480 fix(pipeline): make total_pupils non-optional for Typesense, add lat/lng to dim_location
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m3s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m29s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
- Remove optional flag from total_pupils (Typesense requires default
  sorting field to be non-optional)
- Add latitude/longitude columns to dim_location computed from PostGIS
  geom, for direct use by backend and Typesense sync

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 22:45:02 +00:00
5e44d88d23 fix(sync): use numeric default_sorting_field, dynamic KS2/KS4 joins, populate geopoints
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m28s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
- Typesense requires numeric default_sorting_field — use total_pupils
- Dynamically include KS2/KS4 joins only if those tables exist
- Extract lat/lng from PostGIS geom and populate Typesense geopoint field

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 22:16:21 +00:00
72cbbf7778 fix(dbt): simplify search_path to just public for PostGIS
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m30s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 21:47:01 +00:00
03256fed41 fix(dbt): add search_path to profile so PostGIS functions resolve in all schemas
Some checks failed
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Integrator (push) Has been cancelled
Build and Push Docker Images / Build Kestra Init (push) Has been cancelled
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Has been cancelled
Build and Push Docker Images / Trigger Portainer Update (push) Has been cancelled
Build and Push Docker Images / Build Frontend (Next.js) (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 21:45:53 +00:00
b7cc01f26f fix(dbt): schema-qualify PostGIS functions in dim_location
Some checks failed
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 33s
Build and Push Docker Images / Build Integrator (push) Has been cancelled
Build and Push Docker Images / Build Kestra Init (push) Has been cancelled
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Has been cancelled
Build and Push Docker Images / Trigger Portainer Update (push) Has been cancelled
Build and Push Docker Images / Build Frontend (Next.js) (push) Has been cancelled
PostGIS extension lives in public schema; marts schema can't resolve
unqualified ST_MakePoint/ST_Transform calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 21:45:03 +00:00
28ba2fd0a6 fix(dbt): cast easting/northing to double precision for ST_MakePoint
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m28s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 21:29:16 +00:00
54df58746e feat(pipeline): use GIAS easting/northing for all geocoding, drop postcode step
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m25s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
GIAS grid references are the actual school location — far more accurate
than postcode centroids. Remove geocode_postcodes.py from the daily DAG
and the postcode-not-null filter from dim_location.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 21:18:59 +00:00
d3e655abdb fix(dbt): compute geom from easting/northing in dim_location
Some checks failed
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m2s
Build and Push Docker Images / Build Kestra Init (push) Has been cancelled
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Has been cancelled
Build and Push Docker Images / Trigger Portainer Update (push) Has been cancelled
Build and Push Docker Images / Build Integrator (push) Has been cancelled
Convert GIAS British National Grid coordinates (EPSG:27700) to WGS84
(EPSG:4326) directly in the dbt model. The geocode script backfills
schools missing easting/northing via Postcodes.io.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 21:17:08 +00:00
45f3e4d9fc fix(dbt): override generate_schema_name to use direct schema names
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m28s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
dbt default prepends the profile schema as prefix (public_staging,
public_marts). Override to use custom schema names directly (staging,
marts) so scripts can reference marts.dim_location correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 21:09:23 +00:00
d25e333826 fix(dbt): remove invalid relationship test on map_school_lineage
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m25s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Lineage map includes predecessor URNs for closed schools, which are
correctly excluded from dim_school (status = 'Open').

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 20:59:29 +00:00
7f82088d53 fix(pipeline): use to_date for DD-MM-YYYY GIAS dates, exclude EES models from daily DAG
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m4s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m30s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
GIAS CSV dates are DD-MM-YYYY format — use to_date() instead of cast().
Exclude int_ks2_with_lineage+ and int_ks4_with_lineage+ from daily DAG
selector since they depend on EES data not yet loaded.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 20:51:40 +00:00
e7b1ab9f37 fix(pipeline): expand GIAS schema, handle empty strings, scope DAG selectors
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m8s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 34s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m39s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
- Declare all 34 columns needed by dbt in GIAS tap schema (target-postgres
  only persists columns present in the Singer schema message)
- Use nullif() for empty-string-to-integer/date casts in staging models
- Scope daily DAG dbt build to GIAS models only (stg_gias_establishments+
  stg_gias_links+) to avoid errors on unloaded sources
- Scope annual EES DAG similarly; remove redundant dbt test steps
- Make dim_school gracefully handle missing int_ofsted_latest table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 20:43:24 +00:00
24cfb83144 fix(dbt): fix GIAS source column quoting and remove tests on unloaded sources
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 2m39s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m8s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m27s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
GIAS tap emits uppercase URN column — add quote: true so dbt source tests
reference "URN" instead of urn. Remove source-level tests from tables not yet
loaded (ofsted, ees, parent_view, fbit, idaci) to prevent relation-not-found
errors during dbt build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 20:25:56 +00:00
0062a5eabe fix(tap-gias): declare numeric CSV columns as StringType
Some checks failed
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 35s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Integrator (push) Failing after 30s
Build and Push Docker Images / Build Kestra Init (push) Failing after 30s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Failing after 29s
Build and Push Docker Images / Trigger Portainer Update (push) Has been skipped
CSV is read with dtype=str so all values arrive as strings. Declaring
LA (code) and EstablishmentNumber as IntegerType caused schema
validation failures in target-postgres. Use StringType for all columns
except URN (which is explicitly cast to int for the primary key).
Type casting happens in dbt staging models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 14:03:26 +00:00
84261f6125 fix(meltano): set default_environment, remove deprecated version field
Some checks failed
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Kestra Init (push) Has been cancelled
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Has been cancelled
Build and Push Docker Images / Trigger Portainer Update (push) Has been cancelled
Build and Push Docker Images / Build Integrator (push) Has been cancelled
Meltano 4.x requires an environment to be specified. Set production as
the default. Also remove the deprecated 'version: 2' field.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 14:01:31 +00:00
9eae6bffae fix(meltano): use 'database' not 'dbname' for meltanolabs target-postgres
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 35s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m6s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m35s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
The meltanolabs target-postgres variant expects 'database' as the
config key, not 'dbname' (which was the pipelinewise variant's key).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 13:53:49 +00:00
c576bba06a fix(meltano): remove catalog capability and switch elt to run
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m9s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m26s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
The `catalog` capability forced Meltano to run --discover and generate
a catalog file (tap.properties.json) before each extraction. This fails
because our Singer SDK taps emit schemas inline and don't need external
catalog files. Removing the capability makes Meltano invoke taps
directly without catalog generation.

Also switch from deprecated `meltano elt` to `meltano run` for
Meltano 4.x compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 13:45:23 +00:00
1c77a6c593 fix(pipeline): run meltano install in Dockerfile to generate catalogs
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m13s
Build and Push Docker Images / Build Integrator (push) Successful in 58s
Build and Push Docker Images / Build Kestra Init (push) Successful in 33s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m31s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Meltano elt requires catalog files (tap.properties.json) to exist.
These are generated by `meltano install` which discovers tap schemas
and installs the target-postgres loader. Without this step, `meltano
elt` fails with "catalog file is missing".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 12:28:59 +00:00
cd75fc4c24 fix(taps): align with integrator resilience patterns
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m7s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Port critical patterns from the working integrator into Singer taps:
- GIAS: add 404 fallback to yesterday's date, increase timeout to 300s,
  use latin-1 encoding, use dated URL for links (static URL returns 500)
- FBIT: add GIAS date fallback, increase timeout, fix encoding to latin-1
- IDACI: use dated GIAS URL with fallback instead of undated static URL,
  fix encoding to latin-1, increase timeout to 300s
- Ofsted: try utf-8-sig then fall back to latin-1 encoding

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 11:13:38 +00:00
b6a487776b fix(airflow): set DAGS_FOLDER in image env and reserialize on init
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 32s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
- Add AIRFLOW__CORE__DAGS_FOLDER env var in Dockerfile so it's always set
- Run `airflow dags reserialize` after `db migrate` in init container so
  DAGs appear immediately without waiting for scheduler scan interval

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 11:05:41 +00:00
e815f597ab fix(dags): use global bin paths and add BashOperator import fallback
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m9s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 49s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
- MELTANO_BIN/DBT_BIN pointed to .venv/bin/ but Dockerfile installs globally
- Add try/except for BashOperator import to handle both Airflow 3 provider
  path and legacy path, preventing silent DAG import failures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 10:47:18 +00:00
97d975114a feat(pipeline): implement parent-view, fbit, idaci Singer taps + align staging/mart models
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m6s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Port extraction logic from integrator scripts into Singer SDK taps:
- tap-uk-parent-view: scrapes Ofsted open data portal, parses survey responses (14 questions)
- tap-uk-fbit: queries FBIT API per-URN with rate limiting, computes per-pupil spend
- tap-uk-idaci: downloads IoD2019 XLSX, batch-resolves postcodes→LSOAs via postcodes.io

Update dbt models to match actual tap output schemas:
- stg_idaci now includes URN (tap does the postcode→LSOA→school join)
- stg_parent_view expanded from 8 to 13 question columns
- fact_deprivation simplified (no longer needs postcode→LSOA join in dbt)
- fact_parent_view expanded to include all 13 question metrics

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 10:38:07 +00:00
914de17d15 fix(pipeline): install curl in pipeline image for healthchecks
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m46s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 09:52:06 +00:00
a7904b627d fix(pipeline): migrate to Airflow 3 API server and SimpleAuthManager
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m12s
Build and Push Docker Images / Build Integrator (push) Successful in 58s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 31s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Airflow 3 replaced `airflow webserver` with `airflow api-server` and
removed the `airflow users` CLI. Auth is now via SimpleAuthManager
configured through AIRFLOW__CORE__SIMPLE_AUTH_MANAGER_USERS env var.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 09:32:08 +00:00
deb4024731 chore(pipeline): bump all dependencies to latest stable versions
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m4s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m45s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
- Airflow 2.11 → 3.1 (BashOperator moved to providers-standard)
- Meltano 3.5 → 4.1 (meltano.yml version 2, meltanolabs target-postgres)
- dbt-postgres 1.9 → 1.10
- singer-sdk 0.39 → 0.53 (all 6 taps)
- Typesense Docker 27.1 → 30.1
- Typesense Python client >=2.0
- Python base image 3.12 → 3.13

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 09:18:11 +00:00
e32666ae4c fix(pipeline): bump Airflow to 2.11 and dbt to 1.9 to resolve SQLAlchemy conflict
Some checks failed
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Failing after 49s
Build and Push Docker Images / Trigger Portainer Update (push) Has been skipped
Airflow 2.10 requires SQLAlchemy <2.0, but dbt-postgres 1.8+ pulls in
SQLAlchemy 2.x. Airflow 2.11 supports SQLAlchemy 2.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 09:08:21 +00:00
8f02b5125e feat(pipeline): add Meltano + dbt + Airflow ELT pipeline scaffold
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 35s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m9s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Replaces the hand-rolled integrator with a production-grade ELT pipeline
using Meltano (Singer taps), dbt Core (medallion architecture), and
Apache Airflow (orchestration). Adds Typesense for search and PostGIS
for geospatial queries.

- 6 custom Singer taps (GIAS, EES, Ofsted, Parent View, FBIT, IDACI)
- dbt project: 12 staging, 5 intermediate, 12 mart models
- 3 Airflow DAGs (daily/monthly/annual schedules)
- Typesense sync + batch geocoding scripts
- docker-compose: add Airflow, Typesense; upgrade to PostGIS
- Portainer stack definition matching live deployment topology

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 08:37:53 +00:00