GIAS provides 'Not Applicable' (capital A) but the check used 'Not applicable',
so the case-sensitive != matched true and skipped the age-range inference.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Independent schools have phase='Not applicable' in GIAS. Now infer
phase from statutory age range: <=11 → Primary, >=11 → Secondary,
spans both → All-through. Falls back to original value if no age data.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause: the UNION ALL query in data_loader.py produced two rows per
all-through school per year (one KS2, one KS4), with drop_duplicates()
silently discarding the KS4 row. Fixes:
- New dbt mart `fact_performance`: FULL OUTER JOIN of fact_ks2_performance
and fact_ks4_performance on (urn, year). One row per school per year.
All-through schools have both KS2 and KS4 columns populated.
- data_loader.py: replace 175-line UNION ALL with a simple JOIN to
fact_performance. No more duplicate rows or drop_duplicates needed.
- sync_typesense.py: single LATERAL JOIN to fact_performance instead of
two separate KS2/KS4 joins.
- app.py: remove drop_duplicates (no longer needed); add PHASE_GROUPS
constant so all-through/middle schools appear in primary and secondary
filter results (were previously invisible to both); scope result_filters
gender/admissions_policies to secondary schools only.
- HomeView.tsx: isSecondaryView is now majority-based (not "any secondary")
and isMixedView shows both sort option sets for mixed result sets.
- school/[slug]/page.tsx: all-through schools route to SchoolDetailView
(renders both SATs + GCSE sections) instead of SecondarySchoolDetailView
(KS4-only). Dedicated SEO metadata for all-through schools.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
tap-uk-ees: EESCensusStream now declares 27 data columns (FSM %, EAL %,
ethnicity breakdowns, pupil counts) with clean Singer field names mapped
from the verbose CSV column names (e.g. '% of pupils known to be eligible
for free school meals' → fsm_pct) via a new _column_renames mechanism on
the base stream class.
stg_ees_census: materialised as table, applies safe_numeric to all
percentage/count columns, filters to numeric URNs.
int_pupil_chars_merged + fact_pupil_characteristics: pass all columns
through from staging (previously stubs with only 3 columns).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pipeline:
- EES tap: split KS4 into performance + info streams, fix admissions filename
(SchoolLevel keyword match), fix census filename (yearly suffix), remove
phonics (no school-level data on EES), change endswith → in for matching
- stg_ees_ks4: rewrite to filter long-format data and extract Attainment 8,
Progress 8, EBacc, English/Maths metrics; join KS4 info for context
- stg_ees_admissions: map real CSV columns (total_number_places_offered, etc.)
- stg_ees_census: update source reference, stub with TODO for data columns
- Remove stg_ees_phonics, fact_phonics (no school-level EES data)
- Add ees_ks4_performance + ees_ks4_info sources, remove ees_ks4 + ees_phonics
- Update int_ks4_with_lineage + fact_ks4_performance with new KS4 columns
- Annual EES DAG: remove stg_ees_phonics+ from selector
Backend:
- models.py: replace all models to point at marts.* tables with schema='marts'
(DimSchool, DimLocation, KS2Performance, FactOfstedInspection, etc.)
- data_loader.py: rewrite load_school_data_as_dataframe() using raw SQL joining
dim_school + dim_location + fact_ks2_performance; update get_supplementary_data()
- database.py: remove migration machinery, keep only connection setup
- app.py: remove check_and_migrate_if_needed, remove /api/admin/reimport-ks2
endpoints (pipeline handles all imports)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix publication slugs (KS4, Phonics, Admissions were wrong)
- Split KS2 into two streams: ees_ks2_attainment (long format) and
ees_ks2_info (wide format context data)
- Target specific filenames instead of keyword matching
- Handle school_urn vs urn column naming
- Pivot KS2 attainment from long to wide format in dbt staging
- Add all ~40 KS2 columns the backend needs (GPS, absence, gender,
disadvantaged breakdowns, context demographics)
- Pass through all columns in int_ks2_with_lineage and fact_ks2
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove optional flag from total_pupils (Typesense requires default
sorting field to be non-optional)
- Add latitude/longitude columns to dim_location computed from PostGIS
geom, for direct use by backend and Typesense sync
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PostGIS extension lives in public schema; marts schema can't resolve
unqualified ST_MakePoint/ST_Transform calls.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GIAS grid references are the actual school location — far more accurate
than postcode centroids. Remove geocode_postcodes.py from the daily DAG
and the postcode-not-null filter from dim_location.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Convert GIAS British National Grid coordinates (EPSG:27700) to WGS84
(EPSG:4326) directly in the dbt model. The geocode script backfills
schools missing easting/northing via Postcodes.io.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Lineage map includes predecessor URNs for closed schools, which are
correctly excluded from dim_school (status = 'Open').
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Declare all 34 columns needed by dbt in GIAS tap schema (target-postgres
only persists columns present in the Singer schema message)
- Use nullif() for empty-string-to-integer/date casts in staging models
- Scope daily DAG dbt build to GIAS models only (stg_gias_establishments+
stg_gias_links+) to avoid errors on unloaded sources
- Scope annual EES DAG similarly; remove redundant dbt test steps
- Make dim_school gracefully handle missing int_ofsted_latest table
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Port extraction logic from integrator scripts into Singer SDK taps:
- tap-uk-parent-view: scrapes Ofsted open data portal, parses survey responses (14 questions)
- tap-uk-fbit: queries FBIT API per-URN with rate limiting, computes per-pupil spend
- tap-uk-idaci: downloads IoD2019 XLSX, batch-resolves postcodes→LSOAs via postcodes.io
Update dbt models to match actual tap output schemas:
- stg_idaci now includes URN (tap does the postcode→LSOA→school join)
- stg_parent_view expanded from 8 to 13 question columns
- fact_deprivation simplified (no longer needs postcode→LSOA join in dbt)
- fact_parent_view expanded to include all 13 question metrics
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>