Frontend:
- Dynamic-import Chart.js components on detail/compare views so Chart.js
no longer ships in initial JS.
- Drop force-dynamic on home, compare, rankings so internal data fetches
reuse Next.js's per-call revalidate cache.
- Switch /school/[slug] to ISR with a 7-day revalidate window (school
data updates annually).
- Preconnect to analytics + postcodes.io; remove redundant defer on the
Umami Script tag (afterInteractive already covers it).
- Bump images.minimumCacheTTL to 1 year.
- Extract HowItWorks and Editorial sections as server components passed
to HomeView via slot props so their JSX stays out of the client bundle.
Backend:
- Add GZipMiddleware (min 512 bytes).
- Add CacheAndETagMiddleware: per-path Cache-Control with long s-maxage
+ stale-while-revalidate, ETag generation, and 304 on If-None-Match.
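Illustrative sketch of the ETag / If-None-Match decision described above. This is not the actual CacheAndETagMiddleware: the function names, the hash choice, and the Cache-Control values are assumptions, and only exact (strong) tag matches are handled.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Return a quoted ETag derived from the response body (hash choice is an assumption)."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def conditional_response(
    body: bytes, if_none_match: Optional[str]
) -> Tuple[int, bytes, Dict[str, str]]:
    """Return (status, body, headers); 304 with an empty body when the client's tag matches."""
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        # Placeholder values; the real per-path policy is not shown in this commit message.
        "Cache-Control": "public, s-maxage=86400, stale-while-revalidate=604800",
    }
    # If-None-Match may carry a comma-separated list of tags, or "*".
    candidates = [tag.strip() for tag in (if_none_match or "").split(",")]
    if etag in candidates or "*" in candidates:
        return 304, b"", headers
    return 200, body, headers
```

A first request returns 200 plus the ETag header; replaying that tag in If-None-Match yields 304 with no body.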
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The /api/rankings endpoint returned each row keyed by the metric's
column name (e.g. rwm_high_pct) but never under a generic `value`
field. The frontend RankingItem type and RankingsView both read
ranking.value, so every row rendered "—" for every metric — the
default rwm_expected_pct included.
Add `df["value"] = df[metric]` before JSON serialisation so the
frontend gets the value it has always expected. The raw metric
column is still in the row for any caller that wants it explicitly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Upgrades the existing "Pupils" stat to include a compact split bar and
percentage hint for mixed schools (single-sex schools already carry a
"Boys'/Girls' school" badge, so the split would be redundant).
Wires fact_pupil_characteristics into the API: new SQLAlchemy model and
a real census block in /api/schools/{urn} replacing the prior null stub.
On the primary detail page the inline "Pupils: 241" text is replaced by
a richer block (display number + bar + "52% girls · 48% boys"). On the
secondary detail page the existing "Total pupils" hero stat card grows
the bar and hint beneath the number. Both fall back to the previous
text-only rendering when census gender data is missing.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
P1 (backend/data_loader.py): Add load_latest_school_data() which pre-computes
the one-row-per-school latest-year snapshot (groupby, prev-year trend merge)
once at startup instead of on every /api/schools request. The get_schools
route now starts from the cached snapshot rather than rebuilding it.
S3 (backend/app.py): Wrap synchronous geocode_single_postcode() call in
asyncio.to_thread() so postcode lookups no longer block the uvicorn event
loop. Admin reload endpoint also uses to_thread for both cache primes.
P2 (nextjs-app/components/HomeView.tsx): Add mapParamsRef guard so switching
back to map view does not re-fetch 500 schools when search params haven't
changed. Reset ref on new searches so fresh data is always fetched.
P3 (nextjs-app/lib/chartSetup.ts): Extract Chart.js registration into a
shared side-effect module. ComparisonChart and PerformanceChart now import
it instead of each calling ChartJS.register() independently.
P4 (backend/database.py): Remove unnecessary db.commit() from the read-only
get_db_session() context manager — saves a DB round-trip on every request.
P5 (backend/database.py): Add pool_recycle=1800 to SQLAlchemy engine to
prevent stale TCP connections from accumulating in long-running processes.
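Minimal sketch of the S3 change: running a blocking lookup through asyncio.to_thread() so the event loop stays free. The body of geocode_single_postcode() here is a stand-in (the real one calls postcodes.io), and lookup()/main() are hypothetical names.

```python
import asyncio
import time


def geocode_single_postcode(postcode: str) -> dict:
    """Stand-in for the blocking lookup; sleeps to simulate network I/O."""
    time.sleep(0.05)
    return {"postcode": postcode, "lat": 0.0, "lng": 0.0}  # placeholder coordinates


async def lookup(postcode: str) -> dict:
    # The blocking call runs in a worker thread; the coroutine just awaits the result.
    return await asyncio.to_thread(geocode_single_postcode, postcode)


async def main() -> list:
    # Both lookups overlap instead of running back to back on the event loop.
    return await asyncio.gather(lookup("SW19 1AA"), lookup("SW19 2BB"))


if __name__ == "__main__":
    print(asyncio.run(main()))
```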
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The staging model aliased EES's total_number_places_offered column as
published_admission_number, but PAN is the school's published capacity
(not exposed by EES at school level) — what we actually have is the
count of places offered in a given admissions round. The misnomer
propagated to the mart, SQLAlchemy model, API response, TS types, and
UI copy ("places per year", "(PAN)").
Rename end-to-end and fix the UI labels:
- "29 places for 42 first-choice applications"
→ "29 places offered for 42 first-choice applications"
- "Reception/Year 7 places per year"
→ "Reception/Year 7 places offered"
- drop the misleading "(PAN)" suffix in the secondary view
Also add a comment in stg_ees_admissions clarifying this is the number
of places offered, not PAN. Requires dbt to rebuild fact_admissions
(marts are materialized as tables) before the backend can start.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Redesign the School Details page for better parent comprehension:
- New SatsChart component: horizontal cascade bars with ruler scale and
national average marker (teal/coral palette matching site theme)
- Admissions section: visual progress bar showing 1st-preference demand
vs available places, colour-coded by oversubscription status
- Historical data: collapse raw year-by-year table behind a disclosure
element while keeping the performance line chart always visible
- EAL metric: add national average comparison via DeltaChip (backend now
includes eal_pct in national averages endpoint)
- New formatWithSuppression utility for null/suppressed data handling
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dim_school.sql was checking for int_ofsted_latest in target.schema (wrong schema)
due to the custom generate_schema_name macro using literal schema names. The
model lives in 'intermediate', so ofsted_grade/date/framework were always NULL
in dim_school, causing all list cards to show 'Not yet inspected'.
Fix 1: data_loader.py joins marts.fact_ofsted_inspection with DISTINCT ON to
get latest inspection per school — no pipeline re-run needed.
Fix 2: dim_school.sql uses schema='intermediate' so future dbt runs correctly
denormalise the Ofsted summary into dim_school.
The school_info object was missing total_pupils entirely, so the frontend
always fell back to the KS4 exam cohort from yearly_data. Now selects
s.total_pupils (GIAS NumberOfPupils — full school roll) as gias_total_pupils
in the main query and exposes it as total_pupils on school_info.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces computed means from our school dataset with the published DfE
national headline figures for the KS2 chart reference line.
- tap-uk-ees: new EESKs2NationalStream fetches the stable EES data-catalogue
CSV (one row per year, England national total, AllSchools filter)
- dbt staging: stg_ees_ks2_national normalises columns, casts to float,
filters to years >= 201617
- dbt mart: fact_ks2_national_averages — one row per year, official figures
- backend/models: Ks2NationalAverage SQLAlchemy model
- backend/app: /api/national-averages queries the mart for KS2 by_year;
secondary by_year stays computed (no DfE KS4 national dataset yet)
- DAG: extract_ks2_national task added to school_data_annual_ees,
runs in parallel with the main EES extract
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously the dashed reference line was a flat horizontal at the latest
year's national average across all historical data, implying the national
figure was constant. Now the backend returns per-year averages in `by_year`
and the chart maps each data year to its own national average, so the
reference line correctly reflects how the national picture changed over time
(including the COVID dip and subsequent recovery).
- backend: /api/national-averages now includes `by_year` list alongside
existing `year`/`primary`/`secondary` latest-year snapshot
- types: NationalAverages extended with `by_year: NationalAveragesYear[]`
- PerformanceChart: accepts `nationalByYear` prop; builds per-year series
aligned to school data years, falling back to scalar prop if absent
- SchoolDetailView + SecondarySchoolDetailView: pass `nationalAvg.by_year`
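Rough Python rendering of the per-year alignment that PerformanceChart performs (the component itself is TypeScript). The entry shape {"year", "value"} and the function name are assumptions for illustration only.

```python
from typing import List, Optional


def national_series(
    school_years: List[int],
    by_year: Optional[List[dict]],
    fallback: Optional[float] = None,
) -> List[Optional[float]]:
    """Map each school data year to its own national average.

    Falls back to a flat line at `fallback` when no per-year data is supplied.
    """
    lookup = {entry["year"]: entry["value"] for entry in (by_year or [])}
    if not lookup:
        return [fallback for _ in school_years]
    # Years with no national figure become None, which a chart can render as a gap.
    return [lookup.get(year) for year in school_years]
```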
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The staging model was mapping EES column ``proportion_1stprefs_v_totaloffers``
straight onto ``first_preference_offer_pct``. That raw column is not a
percentage — it is a ratio of first-preference applications to total offers
(an oversubscription indicator, >1 means oversubscribed), so OLQH rendered
as "1%" when the true first-choice success rate is 27/42 = 64%.
The frontend display code is not at fault and is not patched here —
data-quality issues must be fixed at the source.
- stg_ees_admissions: compute ``first_preference_offer_pct`` as
``100 * number_1st_preference_offers / times_put_as_1st_preference`` —
of families who listed this school first, the % that received an offer
(0–100). Guard against divide-by-zero.
- stg_ees_admissions: expose the legitimate EES ratio as the new column
``oversubscription_ratio`` (1st-preference applications per place) for
future use, clearly named.
- fact_admissions, FactAdmissions model, data_loader: propagate the new
``oversubscription_ratio`` column.
- SchoolAdmissions type: document both columns inline.
- buildSchoolSummary: reword the oversubscription clause so it reads
sensibly across the whole 0–100 range (no more "just 64%").
- Hero chip subtitle: clearer phrasing "X% of first-choice applicants
offered a place".
Requires a dbt run of stg_ees_admissions and fact_admissions on deploy
so the new column materialises.
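The corrected calculation, sketched in Python for illustration (the real change is SQL in stg_ees_admissions). The function signature is an assumption; the formula and the divide-by-zero guard follow the description above.

```python
from typing import Optional


def first_preference_offer_pct(
    offers: Optional[int], applications: Optional[int]
) -> Optional[float]:
    """Percentage (0-100) of first-preference applicants who received an offer.

    Returns None when either input is missing or there were no applications,
    which avoids a divide-by-zero.
    """
    if offers is None or not applications:
        return None
    return 100.0 * offers / applications


# The OLQH figures quoted above: 27 offers for 42 first-preference applications.
olqh_pct = first_preference_offer_pct(27, 42)
```

Rounded to the nearest whole number, olqh_pct is 64, matching the 27/42 figure in the message.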
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Expand the abbreviation in metric names (backend schemas), the home page
sort dropdown, README/QA docs, and pipeline comments. Short_name fields
and the compact row/map-card labels remain abbreviated for space.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause: the UNION ALL query in data_loader.py produced two rows per
all-through school per year (one KS2, one KS4), with drop_duplicates()
silently discarding the KS4 row. Fixes:
- New dbt mart `fact_performance`: FULL OUTER JOIN of fact_ks2_performance
and fact_ks4_performance on (urn, year). One row per school per year.
All-through schools have both KS2 and KS4 columns populated.
- data_loader.py: replace 175-line UNION ALL with a simple JOIN to
fact_performance. No more duplicate rows or drop_duplicates needed.
- sync_typesense.py: single LATERAL JOIN to fact_performance instead of
two separate KS2/KS4 joins.
- app.py: remove drop_duplicates (no longer needed); add PHASE_GROUPS
constant so all-through/middle schools appear in primary and secondary
filter results (were previously invisible to both); scope result_filters
gender/admissions_policies to secondary schools only.
- HomeView.tsx: isSecondaryView is now majority-based (not "any secondary")
and isMixedView shows both sort option sets for mixed result sets.
- school/[slug]/page.tsx: all-through schools route to SchoolDetailView
(renders both SATs + GCSE sections) instead of SecondarySchoolDetailView
(KS4-only). Dedicated SEO metadata for all-through schools.
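Plain-Python illustration of what the FULL OUTER JOIN on (urn, year) achieves: one merged row per school per year, with missing-side columns left as None. The real join lives in the fact_performance dbt model; the attainment_8 column name and the sample values are assumptions.

```python
from typing import Dict, List, Tuple


def full_outer_join(ks2_rows: List[dict], ks4_rows: List[dict]) -> List[dict]:
    """Merge KS2 and KS4 rows on (urn, year), keeping rows present on either side."""
    merged: Dict[Tuple[int, int], dict] = {}
    for row in list(ks2_rows) + list(ks4_rows):
        merged.setdefault((row["urn"], row["year"]), {}).update(row)
    # Every output row gets every column; absent values become None (SQL NULL).
    columns = sorted({col for row in merged.values() for col in row})
    return [{col: merged[key].get(col) for col in columns} for key in sorted(merged)]


ks2 = [
    {"urn": 1, "year": 2024, "rwm_expected_pct": 60.0},  # all-through school
    {"urn": 2, "year": 2024, "rwm_expected_pct": 71.0},  # primary only
]
ks4 = [
    {"urn": 1, "year": 2024, "attainment_8": 48.5},  # same all-through school
    {"urn": 3, "year": 2024, "attainment_8": 50.0},  # secondary only
]
rows = full_outer_join(ks2, ks4)
```

School 1 ends up as a single row carrying both KS2 and KS4 columns, so no drop_duplicates step is needed.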
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove 10-mile radius option; cap backend radius max at 5 miles
- Raise backend page_size max to 500 so map can fetch all schools in one call
- HomeView: when map view is active, fetch all schools within radius
(page_size=500) instead of showing only the paginated first page;
falls back to initial SSR schools while loading
- SchoolMap/LeafletMapInner: accept referencePoint prop and render a
distinctive coral circle pin at the search postcode location
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Load-more requests read URL params (postcode, radius, etc.) but page_size
is never in the URL — it's hardcoded in page.tsx. Without it the backend
received page_size=None, hit a TypeError on (page-1)*None, returned 500,
and the silent catch left the user stuck on page 1.
In a dense area (e.g. Wimbledon SW19) 50 schools fit within ~1.8 miles,
so page 1 never shows anything beyond that regardless of selected radius.
Fix:
- Backend: give page_size a safe default of 25 instead of None
- Frontend: explicitly pass initialSchools.page_size in load-more params
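Sketch of the backend side of the fix, assuming a helper along these lines (page_bounds and DEFAULT_PAGE_SIZE are hypothetical names): with a default in place, a missing page_size can no longer reach the (page - 1) * page_size arithmetic as None.

```python
from typing import Optional, Tuple

DEFAULT_PAGE_SIZE = 25  # the safe default named in the fix above


def page_bounds(page: int = 1, page_size: Optional[int] = None) -> Tuple[int, int]:
    """Return the [start, end) slice for a page, tolerating a missing page_size."""
    if page_size is None:
        page_size = DEFAULT_PAGE_SIZE
    start = (max(page, 1) - 1) * page_size
    return start, start + page_size
```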
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Backend builds sitemap.xml from school data at startup (in-memory)
- POST /api/admin/regenerate-sitemap refreshes it after data updates
- New Airflow DAG (sitemap_generate) runs Sundays 05:00 and calls the endpoint
- Next.js proxies /sitemap.xml to the backend; removes the slow dynamic sitemap.ts
- docker-compose passes BACKEND_URL + ADMIN_API_KEY to Airflow env
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1. Simpler home page: only search box on landing, no filter dropdowns
2. Advanced filters: hidden behind toggle on results page, auto-open if active
3. Per-school phase rendering: each row renders based on its own data
4. Taller 4-line rows with context line (type, age range, denomination, gender)
5. Result-scoped filters: dropdown values reflect current search results
6. Fix blank filter values: exclude empty strings and "Not applicable"
7. Rankings: Primary/Secondary phase tabs with phase-specific metrics
8. Compare: Primary/Secondary tabs with school counts and phase metrics
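Items 5 and 6 could look roughly like this: build dropdown options from the current result set only, dropping blanks and "Not applicable". The function name and the case-insensitive comparison are assumptions.

```python
from typing import Iterable, List, Optional

EXCLUDED_VALUES = {"", "not applicable"}


def filter_options(values: Iterable[Optional[str]]) -> List[str]:
    """Return sorted, de-duplicated dropdown options from the current results."""
    return sorted(
        {
            value.strip()
            for value in values
            if value is not None and value.strip().lower() not in EXCLUDED_VALUES
        }
    )
```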
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Backend: replace INNER JOIN ks2 with UNION ALL (ks2 + ks4) so primary
and secondary schools both appear in the main DataFrame
- Backend: add /api/national-averages endpoint computing means from live
data, replacing the hardcoded NATIONAL_AVG constant on the frontend
- Backend: add phase filter param to /api/schools; return phases from
/api/filters; fix hardcoded "phase": "Primary" in school detail endpoint
- Backend: add KS4 metric definitions (Attainment 8, Progress 8, EBacc,
English & Maths pass rates) to METRIC_DEFINITIONS and RANKING_COLUMNS
- Frontend: SchoolDetailView is now phase-aware — secondary schools show
a GCSE Results section (Att8, P8, E&M, EBacc) instead of SATs; phonics
tab hidden for secondary; admissions says Year 7 instead of Year 3;
history table shows KS4 columns; chart datasets switch for secondary
- Frontend: new MetricTooltip component (CSS-only ⓘ icon) backed by
METRIC_EXPLANATIONS — added to RWM, GPS, SEN, EAL, IDACI, progress
scores and all KS4 metrics throughout SchoolDetailView and SchoolCard
- Frontend: METRIC_EXPLANATIONS extended with KS4 terms (Attainment 8,
Progress 8, EBacc) and previously missing terms (SEN, EHCP, EAL, IDACI)
- Frontend: SchoolCard expands "RWM" to "Reading, Writing & Maths" and
shows Attainment 8 / English & Maths Grade 4+ for secondary schools
- Frontend: FilterBar adds Phase dropdown (Primary / Secondary / All-through)
- Frontend: HomeView hero copy updated; compact list shows phase-aware metric
- Global metadata updated to remove "primary only" framing
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sync_typesense.py:
- Fix query string replacement: was matching 'ST_X(l.geom) as lng' but
QUERY_BASE uses 'l.longitude as lng' — KS2/KS4 lateral joins were
silently dropped on every sync run
backend:
- Add typesense_url/typesense_api_key settings to config.py
- Add search_schools_typesense() to data_loader.py — queries Typesense
'schools' alias, returns URNs in relevance order with typo tolerance;
falls back to empty list if Typesense is unavailable
- /api/schools: replace pandas str.contains with Typesense search;
results are filtered from the DataFrame and returned in relevance order;
graceful fallback to substring match if Typesense is down
requirements.txt: add typesense==0.21.0, numpy==1.26.4
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pipeline:
- EES tap: split KS4 into performance + info streams, fix admissions filename
(SchoolLevel keyword match), fix census filename (yearly suffix), remove
phonics (no school-level data on EES), change endswith → in for matching
- stg_ees_ks4: rewrite to filter long-format data and extract Attainment 8,
Progress 8, EBacc, English/Maths metrics; join KS4 info for context
- stg_ees_admissions: map real CSV columns (total_number_places_offered, etc.)
- stg_ees_census: update source reference, stub with TODO for data columns
- Remove stg_ees_phonics, fact_phonics (no school-level EES data)
- Add ees_ks4_performance + ees_ks4_info sources, remove ees_ks4 + ees_phonics
- Update int_ks4_with_lineage + fact_ks4_performance with new KS4 columns
- Annual EES DAG: remove stg_ees_phonics+ from selector
Backend:
- models.py: repoint all models at marts.* tables with schema='marts'
(DimSchool, DimLocation, KS2Performance, FactOfstedInspection, etc.)
- data_loader.py: rewrite load_school_data_as_dataframe() using raw SQL joining
dim_school + dim_location + fact_ks2_performance; update get_supplementary_data()
- database.py: remove migration machinery, keep only connection setup
- app.py: remove check_and_migrate_if_needed, remove /api/admin/reimport-ks2
endpoints (pipeline handles all imports)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The v4 migration already ran before _apply_schema_alterations() was added,
so the new ofsted_inspections columns were never created. Bump to v5 so the
next backend restart re-runs the migration and applies the ALTER TABLE statements.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
create_all() only creates missing tables; it won't modify tables that already
exist from an older schema version. Add _apply_schema_alterations() which runs
idempotent ADD COLUMN IF NOT EXISTS statements after every migration so
supplementary tables (like ofsted_inspections) gain new columns without
dropping their existing data.
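Self-contained sketch of the idempotent-alteration idea. SQLite does not support ADD COLUMN IF NOT EXISTS, so this swaps in an explicit column check to get the same effect; add_column_if_missing is a hypothetical helper, not the project's _apply_schema_alterations().

```python
import sqlite3


def add_column_if_missing(
    conn: sqlite3.Connection, table: str, column: str, ddl_type: str
) -> bool:
    """Add `column` to `table` unless it already exists; return True if it was added.

    Identifiers are interpolated directly, so they must come from trusted code,
    never from user input.
    """
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ofsted_inspections (urn INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO ofsted_inspections (urn) VALUES (100001)")
first_run = add_column_if_missing(conn, "ofsted_inspections", "framework", "TEXT")
second_run = add_column_if_missing(conn, "ofsted_inspections", "framework", "TEXT")
```

Running the alteration twice is harmless, and the existing row survives both runs.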
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ofsted replaced single overall grades with Report Cards from Nov 2025.
Both systems are retained during the transition period.
- DB: new framework + 9 RC columns on ofsted_inspections (schema v4)
- Integrator: auto-detect OEIF vs Report Card from CSV column headers;
parse 5-level RC grades and safeguarding met/not-met
- API: expose all new fields in the ofsted response dict
- Frontend: branch on framework='ReportCard' to show safeguarding badge
+ 8-category grid; fall back to legacy OEIF layout otherwise;
always show inspection date in both layouts
- CSS: rcGrade1–5 and safeguardingMet/NotMet classes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two issues caused the backend to drop and reimport school data on restart:
1. schema_version table was in the drop list inside run_full_migration(),
so after any migration the breadcrumb was destroyed and the next
restart would see no version → re-trigger migration
2. Schema version was set after migration, so a crash mid-migration
left no version → infinite re-migration loop
Fix: remove schema_version from the drop list, and set the version
before running migration so crashes don't cause loops.
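The ordering in the fix can be sketched as below. The dict-backed store and the function signature are assumptions standing in for the schema_version table and the real startup check.

```python
from typing import Callable, Dict


def check_and_migrate(
    store: Dict[str, int], code_version: int, run_migration: Callable[[], object]
) -> bool:
    """Run the migration only when the stored version differs from the code version.

    The version is recorded BEFORE migrating, so a crash mid-migration does not
    leave the store empty and re-trigger the migration on every restart.
    """
    if store.get("schema_version") == code_version:
        return False
    store["schema_version"] = code_version
    run_migration()
    return True
```

The trade-off in this sketch is that a migration which crashes is not retried automatically on the next start.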
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The EES statistics API only exposes ~13 publications; admissions data is not
among them. Switch to the EES content API (content.explore-education-statistics.
service.gov.uk) which covers all publications.
- ees.py: add get_content_release_id() and download_release_zip_csv() that
fetch the release ZIP and extract a named CSV member from it
- admissions.py: use corrected slug (primary-and-secondary-school-applications-
and-offers), correct column names from actual CSV (school_urn,
total_number_places_offered, times_put_as_1st_preference, etc.), derive
first_preference_offers_pct from offer/application ratio, filter to primary
schools only, keep most recent year per URN
Also includes SchoolDetailView UX redesign: parent-first section ordering,
plain-English labels, national average benchmarks, progress score colour
coding, expanded header, quick summary strip, and CSS consolidation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Before dropping tables, save all existing lat/lon coordinates keyed by URN.
After reimport, merge cached coordinates with any newly geocoded ones so
schools that already have coordinates skip the postcodes.io API call.
This makes repeated reimports fast and avoids re-geocoding ~15k schools.
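A minimal sketch of the cache-then-merge step, assuming coordinates are held as (lat, lon) tuples keyed by URN and that the geocoder is passed in as a callable (both are illustrative choices, not the actual implementation).

```python
from typing import Callable, Dict, Optional, Tuple

Coord = Tuple[float, float]


def merge_coordinates(
    cached: Dict[int, Coord],
    schools: Dict[int, str],
    geocode: Callable[[str], Optional[Coord]],
) -> Dict[int, Coord]:
    """Reuse cached coordinates by URN; geocode only schools that have none."""
    coords: Dict[int, Coord] = {}
    for urn, postcode in schools.items():
        if urn in cached:
            coords[urn] = cached[urn]  # no API call needed
            continue
        result = geocode(postcode)
        if result is not None:
            coords[urn] = result
    return coords
```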
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The geocoding pass over ~15k schools takes longer than any reasonable
HTTP timeout. New approach:
- POST /api/admin/reimport-ks2 starts migration in background thread,
returns {"status":"started"} immediately
- GET /api/admin/reimport-ks2/status returns {running, done}
- ks2.py polls status every 30s (max 2h) before returning
- Kestra flow timeout bumped to PT2H
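Sketch of the polling loop on the ks2.py side, with the HTTP call abstracted behind a get_status callable. The function name and the injectable sleep/clock parameters are assumptions made to keep the example testable; the 30s interval and 2h cap come from the description above.

```python
import time
from typing import Callable


def wait_for_completion(
    get_status: Callable[[], dict],
    interval_s: float = 30.0,
    max_wait_s: float = 7200.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the status reports done, or raise TimeoutError after max_wait_s."""
    deadline = clock() + max_wait_s
    while True:
        status = get_status()  # expected shape: {"running": bool, "done": bool}
        if status.get("done"):
            return status
        if clock() + interval_s > deadline:
            raise TimeoutError("reimport did not finish within the allowed time")
        sleep(interval_s)
```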
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add geocode query param to /api/admin/reimport-ks2 (defaults true).
ks2.py passes ?geocode=true so postcodes are resolved to lat/lng in
the same migration pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- backend: POST /api/admin/reimport-ks2 runs full CSV migration in a thread
- backend/docker-compose: ADMIN_API_KEY env var (default: changeme) so the
key is stable across restarts and the integrator can call the endpoint
- integrator: sources/ks2.py triggers the backend endpoint (900s timeout)
- integrator: flows/ks2.yml Kestra flow (manual trigger, no schedule)
To re-ingest after a DB wipe: trigger the ks2-reimport flow from the
Kestra UI at http://localhost:8080.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a full data integration pipeline for enriching school profiles with
supplementary data from Ofsted, GIAS, EES, IDACI, and FBIT.
Backend:
- Bump SCHEMA_VERSION to 3; add 8 new DB tables (ofsted_inspections,
ofsted_parent_view, school_census, admissions, sen_detail, phonics,
school_deprivation, school_finance) plus GIAS columns on schools
- Expose all supplementary data via GET /api/schools/{urn}
- Enrich school list responses with ofsted_grade + ofsted_date
Integrator (new service):
- FastAPI HTTP microservice; Kestra calls POST /run/{source}
- 9 source modules: ofsted, gias, parent_view, census, admissions,
sen_detail, phonics, idaci, finance
- 9 Kestra flow YAMLs with scheduled triggers and 3× retry
Frontend:
- SchoolRow: colour-coded Ofsted badge (Outstanding/Good/RI/Inadequate)
- SchoolDetailView: 7 new sections — Ofsted sub-judgements, Parent View
survey bars, Admissions, Pupils & Inclusion / SEN, Phonics, Deprivation
Context, Finances
- types.ts: 8 new interfaces + extended School/SchoolDetailsResponse
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The frontend expects location_info with coordinates array, but backend was
returning search_location with lat/lng keys. This fix enables the map toggle
to appear for location-based searches.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
On startup, the app now checks if the database schema version matches
the code. If there's a mismatch or no version exists, it automatically
runs a full data migration before starting.
- Add backend/version.py with SCHEMA_VERSION constant
- Add backend/migration.py with extracted migration logic
- Add SchemaVersion model to track DB version
- Add version check functions to database.py
- Update app.py lifespan to use check_and_migrate_if_needed()
- Simplify migrate_csv_to_db.py to use shared logic
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Display test absence percentages (reading, maths, GPS, writing, science)
in a new section in the school modal. Requires database re-import.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When searching by location, users can now toggle between list view
(school cards grid) and a split map view showing:
- Interactive map on left with all school markers
- Scrollable school list on right
- Blue marker for search location, default markers for schools
- Clicking a marker highlights and scrolls to the corresponding card
Mobile responsive with stacked layout on smaller screens.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace footer note with a contact form that emails contact@schoolcompare.co.uk
via FormSubmit.co. Keep only the data source attribution. Update CSP to allow
form submissions to FormSubmit.co and add responsive styling for the form.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add runtime normalization of cryptic school type codes to user-friendly names
(e.g., AC/ACC/ACCS -> "Academy", CY/CYS -> "Community")
- Update SCHOOL_TYPE_MAP in schemas.py with consolidated mappings
- Add normalize_school_type() and get_school_type_codes_for_filter() helpers
- Persist selected schools in localStorage across page refreshes
- Move "Add to Compare" button from modal footer to header
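Possible shape of the two helpers, using only the example codes given above. The real SCHOOL_TYPE_MAP is larger, and the signatures and fallback behaviour shown here are assumptions.

```python
from typing import List, Optional

# Only the mappings quoted in this message; the real map covers more codes.
SCHOOL_TYPE_MAP = {
    "AC": "Academy",
    "ACC": "Academy",
    "ACCS": "Academy",
    "CY": "Community",
    "CYS": "Community",
}


def normalize_school_type(code: Optional[str]) -> Optional[str]:
    """Map a raw type code to its display name; unknown codes pass through unchanged."""
    if code is None:
        return None
    return SCHOOL_TYPE_MAP.get(code.strip().upper(), code)


def get_school_type_codes_for_filter(display_name: str) -> List[str]:
    """Return every raw code that normalises to the given display name."""
    return sorted(c for c, name in SCHOOL_TYPE_MAP.items() if name == display_name)
```

The reverse lookup lets a filter on "Academy" match rows stored under any of the raw codes.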
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add GA4 measurement ID to config (default: G-J0PCVT14NY)
- Add /api/config endpoint to expose GA ID to frontend
- Update cookie consent with Analytics category (opt-in)
- Load GA4 only after user consents to analytics cookies
- Update CSP to allow Google Analytics domains
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use direct bracket indexing instead of .get() for pandas Series
row access in calc_distance function to ensure scalar values
are returned for pd.isna() checks.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Display RWM Higher % alongside RWM Expected % on school cards
- Add trend indicators (up/down/stable arrows) showing year-over-year change
- Backend calculates previous year's RWM for trend comparison
- Trend appears on cards and in school detail modal
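One way the trend could be derived from the two years' RWM figures. The function name and the 0.5-point tolerance for "stable" are assumptions; the commit does not state the threshold used.

```python
from typing import Optional


def rwm_trend(
    current: Optional[float], previous: Optional[float], tolerance: float = 0.5
) -> Optional[str]:
    """Classify the year-over-year change as "up", "down", or "stable".

    Returns None when either year is missing, so no arrow is shown.
    """
    if current is None or previous is None:
        return None
    delta = current - previous
    if delta > tolerance:
        return "up"
    if delta < -tolerance:
        return "down"
    return "stable"
```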
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>