22 Commits

Author SHA1 Message Date
Tudor Sitaru 6d685b7e8a refactor(admissions): rename published_admission_number to places_offered
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 18s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 46s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Failing after 13s
Build and Push Docker Images / Trigger Portainer Update (push) Has been skipped
The staging model aliased EES's total_number_places_offered column as
published_admission_number, but PAN is the school's published capacity
(not exposed by EES at school level) — what we actually have is the
count of places offered in a given admissions round. The misnomer
propagated to the mart, SQLAlchemy model, API response, TS types, and
UI copy ("places per year", "(PAN)").

Rename end-to-end and fix the UI labels:
  - "29 places for 42 first-choice applications"
      → "29 places offered for 42 first-choice applications"
  - "Reception/Year 7 places per year"
      → "Reception/Year 7 places offered"
  - drop the misleading "(PAN)" suffix in the secondary view

Also add a comment in stg_ees_admissions clarifying this is the number
of places offered, not PAN. Requires dbt to rebuild fact_admissions
(marts are materialized as tables) before the backend can start.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 09:45:43 +01:00
Tudor Sitaru 8ce34b3ecc fix(list): read ofsted grade from fact_ofsted_inspection directly, fix dim_school schema lookup
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 18s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 49s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m9s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
dim_school.sql was checking for int_ofsted_latest in target.schema (wrong schema)
due to the custom generate_schema_name macro using literal schema names. The
model lives in 'intermediate', so ofsted_grade/date/framework were always NULL
in dim_school, causing all list cards to show 'Not yet inspected'.

Fix 1: data_loader.py joins marts.fact_ofsted_inspection with DISTINCT ON to
get latest inspection per school — no pipeline re-run needed.

Fix 2: dim_school.sql uses schema='intermediate' so future dbt runs correctly
denormalise the Ofsted summary into dim_school.
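Fix 1's DISTINCT ON query keeps one row per school: the first row in each urn group after ordering by inspection date, newest first. The sketch below shows the same selection in plain Python. The field names (urn, inspection_date, grade) are hypothetical stand-ins, not the real fact_ofsted_inspection columns.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Inspection:
    # Hypothetical fields; the real mart columns may be named differently.
    urn: int
    inspection_date: date
    grade: Optional[int]


def latest_per_school(rows: Iterable[Inspection]) -> dict[int, Inspection]:
    """Keep the most recent inspection per URN, mirroring
    SELECT DISTINCT ON (urn) ... ORDER BY urn, inspection_date DESC."""
    latest: dict[int, Inspection] = {}
    for row in rows:
        current = latest.get(row.urn)
        # On equal dates the first row seen wins. SQL leaves ties unordered.
        if current is None or row.inspection_date > current.inspection_date:
            latest[row.urn] = row
    return latest
```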
2026-04-13 14:51:14 +01:00
Tudor Sitaru dc66e22d4d feat: ingest official DfE KS2 national averages from EES data catalogue
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 19s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 53s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m24s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Replaces computed means from our school dataset with the published DfE
national headline figures for the KS2 chart reference line.

- tap-uk-ees: new EESKs2NationalStream fetches the stable EES data-catalogue
  CSV (one row per year, England national total, AllSchools filter)
- dbt staging: stg_ees_ks2_national normalises columns, casts to float,
  filters to years >= 201617
- dbt mart: fact_ks2_national_averages — one row per year, official figures
- backend/models: Ks2NationalAverage SQLAlchemy model
- backend/app: /api/national-averages queries the mart for KS2 by_year;
  secondary by_year stays computed (no DfE KS4 national dataset yet)
- DAG: extract_ks2_national task added to school_data_annual_ees,
  runs in parallel with the main EES extract
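The stg_ees_ks2_national bullet describes three steps: normalise the column names, cast the metrics to float, and keep time periods >= 201617. The sketch below applies those steps in Python. It assumes EES-style six-digit time_period values (201617 = 2016/17). Metric names are caller-supplied placeholders, and treating blanks and non-numeric markers as NULL is also an assumption.

```python
from typing import Optional

MIN_TIME_PERIOD = 201617  # from the commit: keep years >= 201617


def to_float(value: Optional[str]) -> Optional[float]:
    """Cast a CSV cell to float; blanks and markers like 'x' become None (assumption)."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def stage_ks2_national(rows: list[dict[str, str]], metrics: list[str]) -> list[dict]:
    staged = []
    for row in rows:
        # Normalise header names: strip stray whitespace, lower-case.
        clean = {k.strip().lower(): v for k, v in row.items()}
        period = int(clean["time_period"])
        if period < MIN_TIME_PERIOD:
            continue
        out: dict = {"time_period": period}
        for metric in metrics:
            out[metric] = to_float(clean.get(metric, ""))
        staged.append(out)
    return staged
```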

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 14:40:33 +01:00
Tudor Sitaru 1e5c66d6ab fix(admissions): correct first_preference_offer_pct in dbt staging
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 18s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 49s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m12s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
The staging model was mapping EES column ``proportion_1stprefs_v_totaloffers``
straight onto ``first_preference_offer_pct``. That raw column is not a
percentage — it is a ratio of first-preference applications to total offers
(an oversubscription indicator, >1 means oversubscribed), so OLQH rendered
as "1%" when the true first-choice success rate is 27/42 = 64%.

The frontend display code is not at fault and is not patched here —
data-quality issues must be fixed at the source.

- stg_ees_admissions: compute ``first_preference_offer_pct`` as
  ``100 * number_1st_preference_offers / times_put_as_1st_preference`` —
  of families who listed this school first, the % that received an offer
  (0–100). Guard against divide-by-zero.
- stg_ees_admissions: expose the legitimate EES ratio as the new column
  ``oversubscription_ratio`` (1st-preference applications per place) for
  future use, clearly named.
- fact_admissions, FactAdmissions model, data_loader: propagate the new
  ``oversubscription_ratio`` column.
- SchoolAdmissions type: document both columns inline.
- buildSchoolSummary: reword the oversubscription clause so it reads
  sensibly across the whole 0–100 range (no more "just 64%").
- Hero chip subtitle: clearer phrasing "X% of first-choice applicants
  offered a place".

Requires a dbt run of stg_ees_admissions and fact_admissions on deploy
so the new column materialises.
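The sketch below restates the new first_preference_offer_pct expression in Python. The real expression is SQL in stg_ees_admissions; this is only an illustration, with the divide-by-zero guard returning None the way a SQL NULLIF guard yields NULL. Using the OLQH figures from above, 27 offers for 42 first-choice applications gives about 64%.

```python
from typing import Optional


def first_preference_offer_pct(first_pref_offers: Optional[int],
                               times_first_pref: Optional[int]) -> Optional[float]:
    """Of families who listed the school first, the % offered a place (0-100).

    A zero or missing denominator yields None instead of raising ZeroDivisionError.
    """
    if first_pref_offers is None or not times_first_pref:
        return None
    return 100 * first_pref_offers / times_first_pref


print(round(first_preference_offer_pct(27, 42)))  # 64
```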

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 11:29:40 +01:00
Tudor Sitaru f053b35c6f test(dim_school): downgrade phase not_null to warn
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 13s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 45s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m16s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
The new phase inference can legitimately leave ~1100 independent schools with
null phase (no GIAS phase, no statutory ages, name gives no hint). That's a
known data quality gap, not a pipeline failure — the UI already handles null
by showing no pill. Downgrade the test to warn so it stays visible in dbt
output without blocking the DAG.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 22:12:57 +01:00
Tudor Sitaru ca5f6a962c fix(dim_school): expand phase inference with name-based fallback
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 12s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 45s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m10s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
The case-insensitive "Not Applicable" fix caught schools where GIAS publishes
statutory ages, but some independent schools leave those blank too — they fall
through every branch and end up with null phase and no pill in the UI.

Add a third tier that infers phase from the school name
(Primary/Infant/Junior/Prep vs Secondary/High/Grammar/Senior/Upper) and also
normalise "Not Applicable" handling with trim() + "unknown"/"" exclusion, so
the final else branch can safely return null instead of the catch-all string.
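The sketch below shows the normalised "Not Applicable" check and the name-based third tier in Python. The keyword lists come from this commit. Several details are assumptions: whole-word matching, checking the primary keywords before the secondary ones, and the helper names. The age_phase argument stands for whatever the age-range tier produced.

```python
import re
from typing import Optional

PRIMARY_WORDS = {"primary", "infant", "junior", "prep"}
SECONDARY_WORDS = {"secondary", "high", "grammar", "senior", "upper"}
UNUSABLE_PHASES = {"not applicable", "unknown", ""}


def usable_gias_phase(phase: Optional[str]) -> Optional[str]:
    """GIAS phase, or None when blank, 'Not Applicable' or 'unknown' (any case/padding)."""
    if phase is None or phase.strip().lower() in UNUSABLE_PHASES:
        return None
    return phase.strip()


def phase_from_name(name: str) -> Optional[str]:
    """Third tier: guess phase from whole words in the school name."""
    words = set(re.findall(r"[a-z]+", name.lower()))
    if words & PRIMARY_WORDS:  # checked first (assumption)
        return "Primary"
    if words & SECONDARY_WORDS:
        return "Secondary"
    return None


def infer_phase(gias_phase: Optional[str], age_phase: Optional[str], name: str) -> Optional[str]:
    # GIAS value, then age-range result, then name; otherwise None (no pill in the UI).
    return usable_gias_phase(gias_phase) or age_phase or phase_from_name(name)
```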

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 21:15:54 +01:00
Tudor Sitaru 5b025b98bd fix(dim_school): use case-insensitive comparison for phase inference
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 13s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 50s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m6s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
GIAS provides 'Not Applicable' (capital A) but the check used 'Not applicable',
so the case-sensitive != comparison evaluated to true and the age-range
inference was skipped.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 15:33:04 +01:00
Tudor Sitaru 4c3c3c882d fix(dim_school): infer phase from age range for independent schools
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 12s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 50s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m9s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Independent schools have phase='Not applicable' in GIAS. Now infer
phase from the statutory age range: upper age <= 11 → Primary, lower age
>= 11 → Secondary, range spanning 11 → All-through. Falls back to the
original value if no age data.
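The age-range rule reads most naturally as comparing the upper statutory age against 11 for Primary and the lower statutory age against 11 for Secondary. A sketch under that interpretation; the function name and integer age arguments are assumptions.

```python
from typing import Optional


def phase_from_ages(low_age: Optional[int], high_age: Optional[int]) -> Optional[str]:
    """Infer phase from the GIAS statutory age range (interpretation of this commit's rules)."""
    if low_age is None or high_age is None:
        return None  # caller keeps the original GIAS value
    if high_age <= 11:
        return "Primary"
    if low_age >= 11:
        return "Secondary"
    return "All-through"  # range spans the primary/secondary boundary
```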

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-01 16:18:52 +01:00
tudor 6e5249aa1e refactor(phase): merge KS2+KS4 into fact_performance, fix all phase inconsistencies
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 50s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m12s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m24s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
Root cause: the UNION ALL query in data_loader.py produced two rows per
all-through school per year (one KS2, one KS4), with drop_duplicates()
silently discarding the KS4 row. Fixes:

- New dbt mart `fact_performance`: FULL OUTER JOIN of fact_ks2_performance
  and fact_ks4_performance on (urn, year). One row per school per year.
  All-through schools have both KS2 and KS4 columns populated.
- data_loader.py: replace 175-line UNION ALL with a simple JOIN to
  fact_performance. No more duplicate rows or drop_duplicates needed.
- sync_typesense.py: single LATERAL JOIN to fact_performance instead of
  two separate KS2/KS4 joins.
- app.py: remove drop_duplicates (no longer needed); add PHASE_GROUPS
  constant so all-through/middle schools appear in primary and secondary
  filter results (were previously invisible to both); scope result_filters
  gender/admissions_policies to secondary schools only.
- HomeView.tsx: isSecondaryView is now majority-based (not "any secondary")
  and isMixedView shows both sort option sets for mixed result sets.
- school/[slug]/page.tsx: all-through schools route to SchoolDetailView
  (renders both SATs + GCSE sections) instead of SecondarySchoolDetailView
  (KS4-only). Dedicated SEO metadata for all-through schools.
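The fact_performance mart produces one row per (urn, year), with KS2 and KS4 columns side by side and NULLs where a key stage is missing. The sketch below shows that in Python, together with a majority test like the new isSecondaryView. It assumes the KS2 and KS4 column names don't overlap apart from the join keys. It also reads "majority" as strictly more than half; both the column names and the phase strings are illustrative.

```python
from typing import Any

Row = dict[str, Any]


def full_outer_join(ks2: list[Row], ks4: list[Row]) -> list[Row]:
    """One row per (urn, year); columns missing on one side are filled with None."""
    merged: dict[tuple[Any, Any], Row] = {}
    for row in ks2 + ks4:
        merged.setdefault((row["urn"], row["year"]), {}).update(row)
    columns = sorted({col for row in ks2 + ks4 for col in row})
    return [{col: merged[key].get(col) for col in columns} for key in sorted(merged)]


def is_secondary_view(phases: list[str]) -> bool:
    """True when strictly more than half of the results are secondary schools."""
    return sum(1 for p in phases if p == "Secondary") * 2 > len(phases)
```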

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 14:07:30 +01:00
tudor 668e234eb2 feat(census): add demographic columns to EES census tap and staging models
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m39s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
tap-uk-ees: EESCensusStream now declares 27 data columns (FSM %, EAL %,
ethnicity breakdowns, pupil counts) with clean Singer field names mapped
from the verbose CSV column names (e.g. '% of pupils known to be eligible
for free school meals' → fsm_pct) via a new _column_renames mechanism on
the base stream class.
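A _column_renames mechanism can be as simple as a class-level mapping from verbose CSV headers to Singer field names, applied to every row. The sketch below uses a plain Python stand-in, not the Singer SDK API. The class and method names are hypothetical, and the real tap may hook the rename into a different point in the stream lifecycle.

```python
from typing import Any, ClassVar


class CsvStreamBase:
    """Hypothetical stand-in for the tap's base stream class."""

    # Verbose CSV header -> clean field name; subclasses override.
    _column_renames: ClassVar[dict[str, str]] = {}

    def apply_renames(self, row: dict[str, Any]) -> dict[str, Any]:
        # Columns without a mapping keep their original name.
        return {self._column_renames.get(k, k): v for k, v in row.items()}


class CensusStream(CsvStreamBase):
    _column_renames = {
        "% of pupils known to be eligible for free school meals": "fsm_pct",
    }
```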

stg_ees_census: materialised as table, applies safe_numeric to all
percentage/count columns, filters to numeric URNs.

int_pupil_chars_merged + fact_pupil_characteristics: pass all columns
through from staging (previously stubs with only 3 columns).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 14:07:48 +00:00
tudor ca351e9d73 feat: migrate backend to marts schema, update EES tap for verified datasets
Pipeline:
- EES tap: split KS4 into performance + info streams, fix admissions filename
  (SchoolLevel keyword match), fix census filename (yearly suffix), remove
  phonics (no school-level data on EES), switch filename matching from
  endswith() to substring ('in') checks
- stg_ees_ks4: rewrite to filter long-format data and extract Attainment 8,
  Progress 8, EBacc, English/Maths metrics; join KS4 info for context
- stg_ees_admissions: map real CSV columns (total_number_places_offered, etc.)
- stg_ees_census: update source reference, stub with TODO for data columns
- Remove stg_ees_phonics, fact_phonics (no school-level EES data)
- Add ees_ks4_performance + ees_ks4_info sources, remove ees_ks4 + ees_phonics
- Update int_ks4_with_lineage + fact_ks4_performance with new KS4 columns
- Annual EES DAG: remove stg_ees_phonics+ from selector

Backend:
- models.py: replace all models to point at marts.* tables with schema='marts'
  (DimSchool, DimLocation, KS2Performance, FactOfstedInspection, etc.)
- data_loader.py: rewrite load_school_data_as_dataframe() using raw SQL joining
  dim_school + dim_location + fact_ks2_performance; update get_supplementary_data()
- database.py: remove migration machinery, keep only connection setup
- app.py: remove check_and_migrate_if_needed, remove /api/admin/reimport-ks2
  endpoints (pipeline handles all imports)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:29:27 +00:00
tudor d82e36e7b2 feat(ees): rewrite EES tap and KS2 models for actual data structure
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 31s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m8s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m45s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
- Fix publication slugs (KS4, Phonics, Admissions were wrong)
- Split KS2 into two streams: ees_ks2_attainment (long format) and
  ees_ks2_info (wide format context data)
- Target specific filenames instead of keyword matching
- Handle school_urn vs urn column naming
- Pivot KS2 attainment from long to wide format in dbt staging
- Add all ~40 KS2 columns the backend needs (GPS, absence, gender,
  disadvantaged breakdowns, context demographics)
- Pass through all columns in int_ks2_with_lineage and fact_ks2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 23:08:50 +00:00
tudor 719f06e480 fix(pipeline): make total_pupils non-optional for Typesense, add lat/lng to dim_location
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m3s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m29s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
- Remove optional flag from total_pupils (Typesense requires default
  sorting field to be non-optional)
- Add latitude/longitude columns to dim_location computed from PostGIS
  geom, for direct use by backend and Typesense sync

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 22:45:02 +00:00
tudor 03256fed41 fix(dbt): add search_path to profile so PostGIS functions resolve in all schemas
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Integrator (push) Has been cancelled
Build and Push Docker Images / Build Kestra Init (push) Has been cancelled
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Has been cancelled
Build and Push Docker Images / Trigger Portainer Update (push) Has been cancelled
Build and Push Docker Images / Build Frontend (Next.js) (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 21:45:53 +00:00
tudor b7cc01f26f fix(dbt): schema-qualify PostGIS functions in dim_location
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 33s
Build and Push Docker Images / Build Integrator (push) Has been cancelled
Build and Push Docker Images / Build Kestra Init (push) Has been cancelled
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Has been cancelled
Build and Push Docker Images / Trigger Portainer Update (push) Has been cancelled
Build and Push Docker Images / Build Frontend (Next.js) (push) Has been cancelled
The PostGIS extension lives in the public schema, so models built in the
marts schema can't resolve unqualified ST_MakePoint/ST_Transform calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 21:45:03 +00:00
tudor 28ba2fd0a6 fix(dbt): cast easting/northing to double precision for ST_MakePoint
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m28s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 21:29:16 +00:00
tudor 54df58746e feat(pipeline): use GIAS easting/northing for all geocoding, drop postcode step
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m25s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
GIAS grid references are the actual school location — far more accurate
than postcode centroids. Remove geocode_postcodes.py from the daily DAG
and the postcode-not-null filter from dim_location.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 21:18:59 +00:00
tudor d3e655abdb fix(dbt): compute geom from easting/northing in dim_location
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m2s
Build and Push Docker Images / Build Kestra Init (push) Has been cancelled
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Has been cancelled
Build and Push Docker Images / Trigger Portainer Update (push) Has been cancelled
Build and Push Docker Images / Build Integrator (push) Has been cancelled
Convert GIAS British National Grid coordinates (EPSG:27700) to WGS84
(EPSG:4326) directly in the dbt model. The geocode script backfills
schools missing easting/northing via Postcodes.io.
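In dim_location, PostGIS performs the EPSG:27700 → EPSG:4326 conversion with ST_Transform. To check a single point without a database, the sketch below follows the standard Ordnance Survey method in pure Python. It first applies an inverse transverse Mercator projection on the Airy 1830 ellipsoid, then a 7-parameter Helmert datum shift. The Helmert step is only accurate to a few metres. Depending on its PROJ configuration, PostGIS may use a different (grid-based) transformation, so results can differ slightly from the database.

```python
import math

# Airy 1830 ellipsoid and National Grid projection constants (OSGB36)
AIRY_A, AIRY_B = 6377563.396, 6356256.909
F0 = 0.9996012717
LAT0, LON0 = math.radians(49.0), math.radians(-2.0)
N0, E0 = -100000.0, 400000.0
WGS_A, WGS_B = 6378137.0, 6356752.3141  # WGS84 ellipsoid


def _meridional_arc(lat: float, n: float) -> float:
    dl, sl = lat - LAT0, lat + LAT0
    return AIRY_B * F0 * (
        (1 + n + 1.25 * n**2 + 1.25 * n**3) * dl
        - (3 * n + 3 * n**2 + 2.625 * n**3) * math.sin(dl) * math.cos(sl)
        + (1.875 * n**2 + 1.875 * n**3) * math.sin(2 * dl) * math.cos(2 * sl)
        - (35 / 24) * n**3 * math.sin(3 * dl) * math.cos(3 * sl)
    )


def grid_to_osgb36(easting: float, northing: float) -> tuple[float, float]:
    """Inverse transverse Mercator: National Grid E/N -> OSGB36 lat/lon (radians)."""
    a, b = AIRY_A, AIRY_B
    e2 = 1 - (b * b) / (a * a)
    n = (a - b) / (a + b)
    lat, m = LAT0, 0.0
    while True:  # iterate until the meridional arc matches the northing to 0.01 mm
        lat = (northing - N0 - m) / (a * F0) + lat
        m = _meridional_arc(lat, n)
        if abs(northing - N0 - m) < 1e-5:
            break
    s2 = math.sin(lat) ** 2
    nu = a * F0 / math.sqrt(1 - e2 * s2)
    rho = a * F0 * (1 - e2) / (1 - e2 * s2) ** 1.5
    eta2 = nu / rho - 1
    t, sec = math.tan(lat), 1 / math.cos(lat)
    vii = t / (2 * rho * nu)
    viii = t / (24 * rho * nu**3) * (5 + 3 * t**2 + eta2 - 9 * t**2 * eta2)
    ix = t / (720 * rho * nu**5) * (61 + 90 * t**2 + 45 * t**4)
    x = sec / nu
    xi = sec / (6 * nu**3) * (nu / rho + 2 * t**2)
    xii = sec / (120 * nu**5) * (5 + 28 * t**2 + 24 * t**4)
    xiia = sec / (5040 * nu**7) * (61 + 662 * t**2 + 1320 * t**4 + 720 * t**6)
    de = easting - E0
    return (lat - vii * de**2 + viii * de**4 - ix * de**6,
            LON0 + x * de - xi * de**3 + xii * de**5 - xiia * de**7)


def osgb36_to_wgs84(lat: float, lon: float) -> tuple[float, float]:
    """Helmert datum shift OSGB36 -> WGS84 at zero height (radians in, radians out)."""
    e2 = 1 - AIRY_B**2 / AIRY_A**2
    nu = AIRY_A / math.sqrt(1 - e2 * math.sin(lat) ** 2)
    x1 = nu * math.cos(lat) * math.cos(lon)
    y1 = nu * math.cos(lat) * math.sin(lon)
    z1 = nu * (1 - e2) * math.sin(lat)
    # Published WGS84 -> OSGB36 parameters, negated for the reverse direction
    tx, ty, tz, s = 446.448, -125.157, 542.060, -20.4894e-6
    rx, ry, rz = (math.radians(v / 3600) for v in (0.1502, 0.2470, 0.8421))
    x2 = tx + (1 + s) * x1 - rz * y1 + ry * z1
    y2 = ty + rz * x1 + (1 + s) * y1 - rx * z1
    z2 = tz - ry * x1 + rx * y1 + (1 + s) * z1
    e2w = 1 - WGS_B**2 / WGS_A**2
    p = math.hypot(x2, y2)
    lat_w = math.atan2(z2, p * (1 - e2w))
    for _ in range(10):  # fixed-point iteration for geodetic latitude
        nu_w = WGS_A / math.sqrt(1 - e2w * math.sin(lat_w) ** 2)
        lat_w = math.atan2(z2 + e2w * nu_w * math.sin(lat_w), p)
    return lat_w, math.atan2(y2, x2)


def bng_to_wgs84(easting: float, northing: float) -> tuple[float, float]:
    """British National Grid easting/northing -> WGS84 (lat, lon) in degrees."""
    lat_w, lon_w = osgb36_to_wgs84(*grid_to_osgb36(easting, northing))
    return math.degrees(lat_w), math.degrees(lon_w)
```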

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 21:17:08 +00:00
tudor d25e333826 fix(dbt): remove invalid relationship test on map_school_lineage
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m25s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Lineage map includes predecessor URNs for closed schools, which are
correctly excluded from dim_school (status = 'Open').

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 20:59:29 +00:00
tudor e7b1ab9f37 fix(pipeline): expand GIAS schema, handle empty strings, scope DAG selectors
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m8s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 34s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m39s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
- Declare all 34 columns needed by dbt in GIAS tap schema (target-postgres
  only persists columns present in the Singer schema message)
- Use nullif() for empty-string-to-integer/date casts in staging models
- Scope daily DAG dbt build to GIAS models only (stg_gias_establishments+
  stg_gias_links+) to avoid errors on unloaded sources
- Scope annual EES DAG similarly; remove redundant dbt test steps
- Make dim_school gracefully handle missing int_ofsted_latest table
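The nullif() fix means cast(nullif(col, '') as integer) maps empty strings to NULL before casting instead of failing. The sketch below does the same in Python. The dd-mm-yyyy date format is an assumption about the GIAS CSV. Like nullif(col, ''), the sketch only catches the exact empty string, so a whitespace-only value still fails the cast.

```python
from datetime import date, datetime
from typing import Optional


def nullif_int(value: Optional[str]) -> Optional[int]:
    """Python analogue of cast(nullif(col, '') as integer)."""
    if value is None or value == "":
        return None
    return int(value)


def nullif_date(value: Optional[str], fmt: str = "%d-%m-%Y") -> Optional[date]:
    """Python analogue of a nullif(col, '') date cast; the format is an assumption."""
    if value is None or value == "":
        return None
    return datetime.strptime(value, fmt).date()
```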

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 20:43:24 +00:00
tudor 97d975114a feat(pipeline): implement parent-view, fbit, idaci Singer taps + align staging/mart models
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m6s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Port extraction logic from integrator scripts into Singer SDK taps:
- tap-uk-parent-view: scrapes Ofsted open data portal, parses survey responses (14 questions)
- tap-uk-fbit: queries FBIT API per-URN with rate limiting, computes per-pupil spend
- tap-uk-idaci: downloads IoD2019 XLSX, batch-resolves postcodes→LSOAs via postcodes.io

Update dbt models to match actual tap output schemas:
- stg_idaci now includes URN (tap does the postcode→LSOA→school join)
- stg_parent_view expanded from 8 to 13 question columns
- fact_deprivation simplified (no longer needs postcode→LSOA join in dbt)
- fact_parent_view expanded to include all 13 question metrics
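The IDACI tap resolves postcodes in batches and then joins postcode → LSOA → IDACI score, so stg_idaci can carry the URN directly. The sketch below shows that join in Python with the network lookup left out, since the lookup results are passed in as dicts. The batch size of 100, the helper names and the sample LSOA code are illustrative assumptions.

```python
from typing import Iterator, Optional


def chunked(items: list[str], size: int = 100) -> Iterator[list[str]]:
    """Split postcodes into fixed-size batches for bulk lookups (size is an assumption)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def normalise_postcode(postcode: str) -> str:
    return postcode.replace(" ", "").upper()


def idaci_by_urn(school_postcodes: dict[int, str],
                 postcode_to_lsoa: dict[str, str],
                 lsoa_to_idaci: dict[str, float]) -> dict[int, Optional[float]]:
    """URN -> IDACI score via postcode -> LSOA; None when either lookup misses.

    postcode_to_lsoa is keyed by normalised postcode (no spaces, upper case).
    """
    out: dict[int, Optional[float]] = {}
    for urn, postcode in school_postcodes.items():
        lsoa = postcode_to_lsoa.get(normalise_postcode(postcode))
        out[urn] = lsoa_to_idaci.get(lsoa) if lsoa else None
    return out
```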

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 10:38:07 +00:00
tudor 8f02b5125e feat(pipeline): add Meltano + dbt + Airflow ELT pipeline scaffold
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 35s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m9s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Replaces the hand-rolled integrator with a production-grade ELT pipeline
using Meltano (Singer taps), dbt Core (medallion architecture), and
Apache Airflow (orchestration). Adds Typesense for search and PostGIS
for geospatial queries.

- 6 custom Singer taps (GIAS, EES, Ofsted, Parent View, FBIT, IDACI)
- dbt project: 12 staging, 5 intermediate, 12 mart models
- 3 Airflow DAGs (daily/monthly/annual schedules)
- Typesense sync + batch geocoding scripts
- docker-compose: add Airflow, Typesense; upgrade to PostGIS
- Portainer stack definition matching live deployment topology

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 08:37:53 +00:00