- Backend: replace INNER JOIN ks2 with UNION ALL (ks2 + ks4) so primary
and secondary schools both appear in the main DataFrame
- Backend: add /api/national-averages endpoint computing means from live
data, replacing the hardcoded NATIONAL_AVG constant on the frontend
- Backend: add phase filter param to /api/schools; return phases from
/api/filters; fix hardcoded "phase": "Primary" in school detail endpoint
- Backend: add KS4 metric definitions (Attainment 8, Progress 8, EBacc,
English & Maths pass rates) to METRIC_DEFINITIONS and RANKING_COLUMNS
- Frontend: SchoolDetailView is now phase-aware — secondary schools show
a GCSE Results section (Att8, P8, E&M, EBacc) instead of SATs; phonics
tab hidden for secondary; admissions says Year 7 instead of Year 3;
history table shows KS4 columns; chart datasets switch for secondary
- Frontend: new MetricTooltip component (CSS-only ⓘ icon) backed by
METRIC_EXPLANATIONS — added to RWM, GPS, SEN, EAL, IDACI, progress
scores and all KS4 metrics throughout SchoolDetailView and SchoolCard
- Frontend: METRIC_EXPLANATIONS extended with KS4 terms (Attainment 8,
Progress 8, EBacc) and previously missing terms (SEN, EHCP, EAL, IDACI)
- Frontend: SchoolCard expands "RWM" to "Reading, Writing & Maths" and
shows Attainment 8 / English & Maths Grade 4+ for secondary schools
- Frontend: FilterBar adds Phase dropdown (Primary / Secondary / All-through)
- Frontend: HomeView hero copy updated; compact list shows phase-aware metric
- Global metadata updated to remove "primary only" framing
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Switch from dark (#1a1612) to site's warm cream background
- Clear all button now visible as a text button with muted/coral hover
- Remove scroll bar: no max-height cap needed since 5 schools max
- Compare Now button uses coral accent to match primary CTAs
- School items use bg-secondary (beige) consistent with site cards
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- school_data_monthly_parent_view: runs 1st of month, extracts Ofsted
Parent View and builds fact_parent_view
- school_data_annual_idaci: manual trigger, extracts IDACI deprivation
index and builds fact_deprivation
Both tables were missing, causing safe_query to fail and leave the
PostgreSQL transaction in an aborted state, silently killing all
subsequent supplementary data queries including fact_admissions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
formatAcademicYear now handles both 4-digit (2023→2023/24) and 6-digit
EES codes (202526→2025/26). Applied to all year displays: SATs, phonics,
admissions, finances, and the yearly results table.
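
For reference, the two cases can be sketched in Python as below. This is a transliteration for illustration only: the real formatAcademicYear lives in the frontend, and the pass-through behaviour for unrecognised input is an assumption, not something confirmed here.

```python
from typing import Union


def format_academic_year(year: Union[int, str]) -> str:
    """Format a 4-digit start year or a 6-digit EES code as 'YYYY/YY'."""
    text = str(year).strip()
    if len(text) == 6 and text.isdigit():
        # 6-digit EES code: first four digits are the start year,
        # last two digits are the end year (202526 -> 2025/26).
        return f"{text[:4]}/{text[4:]}"
    if len(text) == 4 and text.isdigit():
        # 4-digit start year: the end year is the following year.
        end = (int(text) + 1) % 100
        return f"{text}/{end:02d}"
    # Assumption: anything else is returned unchanged rather than guessed at.
    return text
```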
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The string 'NULL' is not SQL NULL, so the WHERE in the renamed CTE
passed those rows through. Filter on the raw value using nullif in the
CTE and on the computed date in the outer SELECT.
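
A small reproduction of the problem, run against SQLite from Python rather than PostgreSQL (nullif behaves the same way for this case). The table and column names are hypothetical and only serve the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_inspections (urn TEXT, inspection_date TEXT)")
conn.executemany(
    "INSERT INTO raw_inspections VALUES (?, ?)",
    [("100001", "2024-03-01"), ("100002", "NULL"), ("100003", None)],
)

# The literal string 'NULL' passes a plain IS NOT NULL check.
naive = conn.execute(
    "SELECT urn FROM raw_inspections "
    "WHERE inspection_date IS NOT NULL ORDER BY urn"
).fetchall()

# nullif() converts the literal string into a real NULL before the check.
filtered = conn.execute(
    "SELECT urn FROM raw_inspections "
    "WHERE nullif(inspection_date, 'NULL') IS NOT NULL ORDER BY urn"
).fetchall()
```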
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Schools in the MI file that have never been inspected have a null
inspection_date after parsing. Exclude them — they are not inspection
records.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use nullif+trim for date cast and safe_numeric for integer grades to
handle literal 'NULL' strings present in the new Report Card format CSV.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The preamble row in Ofsted CSVs contains 'turn off all filters' which
matched 'urn' in line.lower(), so header_idx was set to 0 instead of
the real header row. Use a regex that matches URN only as a CSV field.
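
A minimal sketch of the field-anchored match, assuming a comma-delimited file with optionally quoted headers. The pattern and the find_header_index name are illustrative and may differ from what the tap actually uses.

```python
import re

# Matches "urn" only as a whole CSV field: at line start or after a comma,
# optionally quoted, and followed by a comma or the end of the line.
URN_FIELD = re.compile(r'(?:^|,)\s*"?urn"?\s*(?:,|$)', re.IGNORECASE)


def find_header_index(lines):
    """Return the index of the first line with URN as a field, else None."""
    for idx, line in enumerate(lines):
        if URN_FIELD.search(line.strip()):
            return idx
    return None


sample = [
    "To see all data, turn off all filters",  # preamble: contains 'urn' in 'turn'
    "Web link,URN,School name",               # real header row
    "http://example.invalid,100000,Example School",
]
```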
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove build-integrator and build-kestra-init jobs from Gitea Actions
- Update trigger-deployment `needs` to depend only on the three remaining builds
- Fix school website href to prepend https:// when protocol is missing
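
The href fix amounts to the following check, shown here in Python for illustration (the actual change is in the frontend). The ensure_protocol name is hypothetical.

```python
def ensure_protocol(url: str) -> str:
    """Prepend https:// when the URL has no http(s) protocol."""
    url = url.strip()
    if not url:
        # Leave empty values alone so no bogus link is produced.
        return url
    if url.lower().startswith(("http://", "https://")):
        return url
    return "https://" + url
```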
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
tap-uk-ees: EESCensusStream now declares 27 data columns (FSM %, EAL %,
ethnicity breakdowns, pupil counts) with clean Singer field names mapped
from the verbose CSV column names (e.g. '% of pupils known to be eligible
for free school meals' → fsm_pct) via a new _column_renames mechanism on
the base stream class.
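
A rough sketch of how a rename map on the base class can work. It deliberately avoids the Singer SDK, and apart from the `_column_renames` attribute and the fsm_pct mapping quoted above, the class and method names are hypothetical.

```python
class BaseStream:
    # Maps verbose CSV header -> clean field name. Empty by default, so
    # streams that do not override it pass their columns through unchanged.
    _column_renames: dict = {}

    def rename_record(self, row: dict) -> dict:
        """Return a copy of row with any mapped column names replaced."""
        return {self._column_renames.get(k, k): v for k, v in row.items()}


class CensusStream(BaseStream):
    _column_renames = {
        "% of pupils known to be eligible for free school meals": "fsm_pct",
    }
```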
stg_ees_census: materialised as table, applies safe_numeric to all
percentage/count columns, filters to numeric URNs.
int_pupil_chars_merged + fact_pupil_characteristics: pass all columns
through from staging (previously stubs with only 3 columns).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sync_typesense.py:
- Fix query string replacement: was matching 'ST_X(l.geom) as lng' but
QUERY_BASE uses 'l.longitude as lng' — KS2/KS4 lateral joins were
silently dropped on every sync run
backend:
- Add typesense_url/typesense_api_key settings to config.py
- Add search_schools_typesense() to data_loader.py — queries Typesense
'schools' alias, returns URNs in relevance order with typo tolerance;
falls back to empty list if Typesense is unavailable
- /api/schools: replace pandas str.contains with Typesense search;
results are filtered from the DataFrame and returned in relevance order;
graceful fallback to substring match if Typesense is down
requirements.txt: add typesense==0.21.0, numpy==1.26.4
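
The fallback flow in /api/schools can be sketched as below. To stay self-contained it takes the search function as a parameter instead of calling the Typesense client, and it uses a plain list of dicts in place of the DataFrame; both are simplifications. One consequence of returning an empty list on failure is that "Typesense down" and "no hits" look the same, so this sketch falls back to substring matching in both cases.

```python
def search_urns(query, schools, typesense_search):
    """Return matching URNs, preferring the search engine's relevance order.

    typesense_search is a hypothetical callable: query -> list of URNs.
    It may raise if the service is unreachable.
    """
    try:
        ranked = typesense_search(query)
    except Exception:
        # Service unavailable: behave as if it returned nothing.
        ranked = []
    if ranked:
        known = {s["urn"] for s in schools}
        # Keep relevance order, dropping URNs not present in the loaded data.
        return [urn for urn in ranked if urn in known]
    # Fallback: case-insensitive substring match on the school name.
    needle = query.lower()
    return [s["urn"] for s in schools if needle in s["name"].lower()]
```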
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
tap-uk-ofsted schema only declares OEIF columns; rc_* (Report Card)
columns were never emitted so they don't exist in raw.ofsted_inspections.
Replace column references with NULL::text until the actual CSV column
names for the post-Nov 2025 Report Card framework are confirmed and
added to the tap schema.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- int_ks2_with_lineage: use DISTINCT ON (current_urn, year) in predecessor_ks2
to handle schools with multiple predecessors that both have KS2 data for the
same year (e.g. two schools that merged). Keeps the predecessor with most pupils.
- dbt_project.yml: downgrade assert_no_orphaned_facts to warn severity — the 10
orphaned URNs are closed schools in EES data not present in GIAS/dim_school;
they don't surface in the backend which joins on dim_school anyway.
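
For readers less familiar with DISTINCT ON, the deduplication in predecessor_ks2 is equivalent to the following Python. The total_pupils field name is an assumption; the model may order by a differently named column.

```python
def pick_largest_predecessor(rows):
    """Keep one row per (current_urn, year): the one with the most pupils."""
    best = {}
    for row in rows:
        key = (row["current_urn"], row["year"])
        if key not in best or row["total_pupils"] > best[key]["total_pupils"]:
            best[key] = row
    return list(best.values())
```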
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Filter school_urn/time_period to '^[0-9]+$' to exclude "n/a" and other
non-numeric values that caused integer cast failures in fact_admissions
- Add trim() to all school_urn/time_period casts to prevent whitespace
variants producing duplicate urn+year rows in fact_ks2_performance
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previous version scanned ees_ks2_attainment (1.2M rows) 5 times via
separate CTEs (all_pupils, gender_boys, gender_girls, disadv, not_disadv)
plus 5 LEFT JOINs. Rewritten as one GROUP BY with conditional aggregation
— single scan, no self-joins.
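
The shape of the rewrite, demonstrated on a toy table via SQLite from Python. The table, column, and breakdown names are placeholders, not the real ees_ks2_attainment schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attainment (urn TEXT, breakdown TEXT, value REAL)")
conn.executemany(
    "INSERT INTO attainment VALUES (?, ?, ?)",
    [
        ("100001", "all_pupils", 65.0),
        ("100001", "boys", 60.0),
        ("100001", "girls", 70.0),
        ("100002", "all_pupils", 80.0),
    ],
)

# One pass over the long-format table: each CASE picks out one breakdown,
# and MAX collapses the group to a single wide row per school.
rows = conn.execute(
    """
    SELECT urn,
           MAX(CASE WHEN breakdown = 'all_pupils' THEN value END) AS rwm_all,
           MAX(CASE WHEN breakdown = 'boys' THEN value END) AS rwm_boys,
           MAX(CASE WHEN breakdown = 'girls' THEN value END) AS rwm_girls
    FROM attainment
    GROUP BY urn
    ORDER BY urn
    """
).fetchall()
```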
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
KS2 attainment has 1.2M rows in long format. As a view, the pivot was
re-executed inline for every downstream model (intermediate → fact),
causing fact_ks2_performance CREATE TABLE to run for 18+ minutes.
Materializing as tables means the pivot runs once during staging, and
downstream models read from a pre-computed ~16k-row result.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace nullif(col, 'z') casts with safe_numeric macro across KS2, KS4,
and admissions staging models. The regex-based macro treats any non-numeric
string (z, c, x, q, u, etc.) as NULL without needing an explicit list.
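
The idea behind the macro, expressed in Python for illustration. The actual safe_numeric is a dbt/SQL macro and its exact regex is not reproduced here; the pattern below is an assumption.

```python
import re

# Assumed pattern: optional sign, digits, optional decimal part.
_NUMERIC = re.compile(r"^-?[0-9]+(\.[0-9]+)?$")


def safe_numeric(value):
    """Return value as a float, or None for anything that is not numeric."""
    if value is None:
        return None
    text = str(value).strip()
    return float(text) if _NUMERIC.match(text) else None
```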
Also fix FSM_eligible_percent column quoting in stg_ees_admissions — target-
postgres stores mixed-case column names quoted, so unquoted references were
being folded to fsm_eligible_percent by PostgreSQL.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The admissions school-level file contains some rows with null school_urn
(LA/category aggregates that survive the geographic_level filter). These
cause a not-null constraint violation at target-postgres. Drop any row
where the URN column is null or empty before yielding records.
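
A minimal sketch of the guard, assuming rows arrive as dicts. The function name and the default column name are illustrative; the NaN check covers values that pandas reads as missing.

```python
def iter_school_rows(rows, urn_column="school_urn"):
    """Yield only rows whose URN is present and non-empty."""
    for row in rows:
        urn = row.get(urn_column)
        # urn != urn is True only for float NaN (a missing value from pandas).
        if urn is None or urn != urn or not str(urn).strip():
            continue
        yield row
```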
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Admissions file is UTF-8 with BOM, not Latin-1. Reading as latin-1
decoded the BOM bytes as 'ï»¿' which wasn't stripped. Change admissions
encoding to utf-8-sig (strips BOM automatically). Also update the manual
BOM strip fallback to handle the latin-1 decoded form.
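
The difference between the two decodings, and a fallback that handles both BOM forms, can be shown with the standard library alone. strip_bom is a hypothetical helper name.

```python
# Encoding with utf-8-sig prepends the BOM bytes EF BB BF.
raw = "time_period,school_urn\n".encode("utf-8-sig")

as_latin1 = raw.decode("latin-1")      # BOM survives as three characters
as_utf8_sig = raw.decode("utf-8-sig")  # BOM is stripped by the codec


def strip_bom(name: str) -> str:
    """Remove a leading BOM, in either its UTF-8 or latin-1 decoded form."""
    for bom in ("\ufeff", "\u00ef\u00bb\u00bf"):
        if name.startswith(bom):
            return name[len(bom):]
    return name
```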
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Some DfE supporting-files CSVs have a UTF-8 BOM on the first column,
causing it to be named '\ufefftime_period' instead of 'time_period'.
This trips Singer schema validation ('time_period' is a required property).
Strip the BOM from all column names after read_csv.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DfE supporting-files CSVs (spc_school_level_underlying_data, AppsandOffers
SchoolLevel) are Latin-1 encoded. Add _encoding class attribute to base
stream class and override to 'latin-1' for census and admissions streams.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pipeline:
- EES tap: split KS4 into performance + info streams, fix admissions filename
(SchoolLevel keyword match), fix census filename (yearly suffix), remove
phonics (no school-level data on EES), change endswith → in for matching
- stg_ees_ks4: rewrite to filter long-format data and extract Attainment 8,
Progress 8, EBacc, English/Maths metrics; join KS4 info for context
- stg_ees_admissions: map real CSV columns (total_number_places_offered, etc.)
- stg_ees_census: update source reference, stub with TODO for data columns
- Remove stg_ees_phonics, fact_phonics (no school-level EES data)
- Add ees_ks4_performance + ees_ks4_info sources, remove ees_ks4 + ees_phonics
- Update int_ks4_with_lineage + fact_ks4_performance with new KS4 columns
- Annual EES DAG: remove stg_ees_phonics+ from selector
Backend:
- models.py: repoint all models at marts.* tables with schema='marts'
(DimSchool, DimLocation, KS2Performance, FactOfstedInspection, etc.)
- data_loader.py: rewrite load_school_data_as_dataframe() using raw SQL joining
dim_school + dim_location + fact_ks2_performance; update get_supplementary_data()
- database.py: remove migration machinery, keep only connection setup
- app.py: remove check_and_migrate_if_needed, remove /api/admin/reimport-ks2
endpoints (pipeline handles all imports)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix publication slugs (KS4, Phonics, Admissions were wrong)
- Split KS2 into two streams: ees_ks2_attainment (long format) and
ees_ks2_info (wide format context data)
- Target specific filenames instead of keyword matching
- Handle school_urn vs urn column naming
- Pivot KS2 attainment from long to wide format in dbt staging
- Add all ~40 KS2 columns the backend needs (GPS, absence, gender,
disadvantaged breakdowns, context demographics)
- Pass through all columns in int_ks2_with_lineage and fact_ks2
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove optional flag from total_pupils (Typesense requires default
sorting field to be non-optional)
- Add latitude/longitude columns to dim_location computed from PostGIS
geom, for direct use by backend and Typesense sync
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Typesense requires numeric default_sorting_field — use total_pupils
- Dynamically include KS2/KS4 joins only if those tables exist
- Extract lat/lng from PostGIS geom and populate Typesense geopoint field
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The postgis/postgis image auto-enables PostGIS on fresh database creation.
No need to do it from airflow-init.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PostGIS extension lives in public schema; marts schema can't resolve
unqualified ST_MakePoint/ST_Transform calls.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When tasks are removed from a DAG, old serialized metadata in the DB
causes 'Task not found' errors. Delete all DAGs before reserializing
on each deploy to ensure a clean state.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GIAS grid references are the actual school location — far more accurate
than postcode centroids. Remove geocode_postcodes.py from the daily DAG
and the postcode-not-null filter from dim_location.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Convert GIAS British National Grid coordinates (EPSG:27700) to WGS84
(EPSG:4326) directly in the dbt model. The geocode script backfills
schools missing easting/northing via Postcodes.io.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dbt default prepends the profile schema as prefix (public_staging,
public_marts). Override to use custom schema names directly (staging,
marts) so scripts can reference marts.dim_location correctly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Lineage map includes predecessor URNs for closed schools, which are
correctly excluded from dim_school, which keeps only status = 'Open'.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GIAS CSV dates are DD-MM-YYYY format — use to_date() instead of cast().
Exclude int_ks2_with_lineage+ and int_ks4_with_lineage+ from daily DAG
selector since they depend on EES data not yet loaded.
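
The date handling corresponds to the following in Python, shown only to make the format explicit. Treating empty strings as missing is an assumption about how the staging model handles blanks.

```python
from datetime import date, datetime
from typing import Optional


def parse_gias_date(value: Optional[str]) -> Optional[date]:
    """Parse a DD-MM-YYYY string; empty or missing values become None."""
    text = (value or "").strip()
    if not text:
        return None
    return datetime.strptime(text, "%d-%m-%Y").date()
```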
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Declare all 34 columns needed by dbt in GIAS tap schema (target-postgres
only persists columns present in the Singer schema message)
- Use nullif() for empty-string-to-integer/date casts in staging models
- Scope daily DAG dbt build to GIAS models only (stg_gias_establishments+
stg_gias_links+) to avoid errors on unloaded sources
- Scope annual EES DAG similarly; remove redundant dbt test steps
- Make dim_school gracefully handle missing int_ofsted_latest table
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GIAS tap emits uppercase URN column — add quote: true so dbt source tests
reference "URN" instead of urn. Remove source-level tests from tables not yet
loaded (ofsted, ees, parent_view, fbit, idaci) to prevent relation-not-found
errors during dbt build.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace Airflow 2.x env vars (CORE__SECRET_KEY, CORE__INTERNAL_API_URL) with
correct Airflow 3.x equivalents (API_AUTH__JWT_SECRET, API_AUTH__JWT_ISSUER,
CORE__EXECUTION_API_SERVER_URL) on all three Airflow services.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
With separate containers, task workers in the scheduler need the
api-server's address for the Execution API. It defaults to localhost:8080,
which fails across containers. Set INTERNAL_API_URL to the api-server's
Docker service name.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Running both in one container caused JWT secret key race conditions.
Separate containers with the same AIRFLOW__CORE__SECRET_KEY env var
ensure both processes use identical JWT signing keys. Shared
airflow_logs volume allows the api-server to read task logs written
by the scheduler.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The init container and airflow container have separate filesystems, so
airflow.cfg generated by db migrate is not available to the scheduler/
api-server. Without a config file, both processes race to generate
their own with different random JWT secret keys.
Fix by:
1. Running `airflow config list` first to generate airflow.cfg once
2. Setting a fixed SECRET_KEY via env var (>= 64 bytes for SHA512)
3. Adding sleep 3 so scheduler writes config before api-server starts
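
One way to produce a key of sufficient length for step 2 is sketched below. How the value is then passed to the containers is outside this snippet, and the helper name is hypothetical.

```python
import secrets


def generate_secret_key(num_bytes: int = 64) -> str:
    """Return a random hex string built from num_bytes random bytes."""
    # 64 random bytes become 128 hex characters, above the 64-byte minimum.
    return secrets.token_hex(num_bytes)
```

Generate the value once and set it on every service, since a key generated per container would reintroduce the mismatch.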
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deleting airflow.cfg at container start caused the scheduler and
api-server to each generate their own random JWT secret key, leading
to 'Signature verification failed' when task workers communicated
with the api-server. Let both processes share the config file
generated by db migrate (env vars still override where needed).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When scheduler and api-server run in the same container, both generate
independent JWT signing keys on startup. The scheduler's task workers
then fail with 'Invalid auth token: Signature verification failed'
when communicating with the api-server. Fix by setting a shared
INTERNAL_API_SECRET_KEY via env var.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>