docker/setup-buildx-action creates a BuildKit builder that ignores
the host daemon's registry-mirrors setting. Configure buildkitd inline
to route docker.io pulls through the local pull-through cache at
172.17.0.1:6000 (Docker bridge gateway → host port 6000).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CSV is read with dtype=str so all values arrive as strings. Declaring
LA (code) and EstablishmentNumber as IntegerType caused schema
validation failures in target-postgres. Use StringType for all columns
except URN (which is explicitly cast to int for the primary key).
Type casting happens in dbt staging models.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Meltano 4.x requires an environment to be specified. Set production as
the default. Also remove the deprecated 'version: 2' field.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The meltanolabs target-postgres variant expects 'database' as the
config key, not 'dbname' (which was the pipelinewise variant's key).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The `catalog` capability forced Meltano to run --discover and generate
a catalog file (tap.properties.json) before each extraction. This fails
because our Singer SDK taps emit schemas inline and don't need external
catalog files. Removing the capability makes Meltano invoke taps
directly without catalog generation.
Also switch from deprecated `meltano elt` to `meltano run` for
Meltano 4.x compatibility.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Meltano elt requires catalog files (tap.properties.json) to exist.
These are generated by `meltano install` which discovers tap schemas
and installs the target-postgres loader. Without this step, `meltano
elt` fails with "catalog file is missing".
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
With LocalExecutor, tasks run in the scheduler process and logs are
written locally. Running api-server and scheduler in separate containers
meant the api-server couldn't read task logs (empty hostname in log
fetch URL). Combining them into one container eliminates the issue —
logs are always on the local filesystem.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
airflow db migrate generates airflow.cfg with default values that
shadow our env vars (DAGS_FOLDER, WORKER_LOG_SERVER_HOST, etc).
Delete the generated config file before starting each service so
Airflow falls through to env var configuration exclusively.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The scheduler's log server binds to [::]:8793 but doesn't advertise a
hostname, so the api-server gets 'http://:8793/...' (no host) when
fetching task logs. Fix by setting the scheduler's hostname and
configuring WORKER_LOG_SERVER_HOST so the api-server can reach it.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The api-server couldn't fetch task logs because LocalExecutor runs tasks
in the scheduler process, writing logs to its local filesystem. The
api-server tried to fetch via HTTP but the scheduler's log server had
no hostname set. Fix by sharing a named volume for logs between both
containers so the api-server reads logs directly from the filesystem.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Port critical patterns from the working integrator into Singer taps:
- GIAS: add 404 fallback to yesterday's date, increase timeout to 300s,
use latin-1 encoding, use dated URL for links (static URL returns 500)
- FBIT: add GIAS date fallback, increase timeout, fix encoding to latin-1
- IDACI: use dated GIAS URL with fallback instead of undated static URL,
fix encoding to latin-1, increase timeout to 300s
- Ofsted: try utf-8-sig then fall back to latin-1 encoding
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add AIRFLOW__CORE__DAGS_FOLDER env var in Dockerfile so it's always set
- Run `airflow dags reserialize` after `db migrate` in init container so
DAGs appear immediately without waiting for scheduler scan interval
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- MELTANO_BIN/DBT_BIN pointed to .venv/bin/ but Dockerfile installs globally
- Add try/except for BashOperator import to handle both Airflow 3 provider
path and legacy path, preventing silent DAG import failures
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Port extraction logic from integrator scripts into Singer SDK taps:
- tap-uk-parent-view: scrapes Ofsted open data portal, parses survey responses (14 questions)
- tap-uk-fbit: queries FBIT API per-URN with rate limiting, computes per-pupil spend
- tap-uk-idaci: downloads IoD2019 XLSX, batch-resolves postcodes→LSOAs via postcodes.io
Update dbt models to match actual tap output schemas:
- stg_idaci now includes URN (tap does the postcode→LSOA→school join)
- stg_parent_view expanded from 8 to 13 question columns
- fact_deprivation simplified (no longer needs postcode→LSOA join in dbt)
- fact_parent_view expanded to include all 13 question metrics
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The named volume was shadowing the DAGs built into the pipeline image
with an empty directory. DAGs now served directly from the image and
update on each CI build.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Typesense image has neither curl nor wget. Use bash /dev/tcp for a
simple port connectivity check instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Airflow 3 replaced `airflow webserver` with `airflow api-server` and
removed the `airflow users` CLI. Auth is now via SimpleAuthManager
configured through AIRFLOW__CORE__SIMPLE_AUTH_MANAGER_USERS env var.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The MI CSV contains both OEIF and RC column sets simultaneously — OEIF columns
are populated for older inspections, RC columns for post-Nov-2025 inspections.
File-level detection wrongly classified all schools based on column presence alone.
Replace _detect_framework(df) with _framework_for_row(row):
- ReportCard: any rc_* column has a value
- OEIF: overall_effectiveness or quality_of_education has a value
- None: neither has data (no graded inspection on record)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The old OEIF CSV contains columns whose names include substrings like
'inclusion' and 'achievement', causing _detect_framework() to wrongly return
'ReportCard' for pre-Nov-2025 inspections.
Fix: check for OEIF-specific phrases first ('overall effectiveness', 'quality
of education', 'behaviour and attitudes'). Only if none are found, look for
multi-word RC-specific phrases. Default to OEIF as a safe fallback.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The v4 migration already ran before _apply_schema_alterations() was added,
so the new ofsted_inspections columns were never created. Bump to v5 so the
next backend restart re-runs the migration and applies the ALTER TABLE statements.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
create_all() only creates missing tables; it won't modify tables that already
exist from an older schema version. Add _apply_schema_alterations() which runs
idempotent ADD COLUMN IF NOT EXISTS statements after every migration so
supplementary tables (like ofsted_inspections) gain new columns without
dropping their existing data.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ofsted replaced single overall grades with Report Cards from Nov 2025.
Both systems are retained during the transition period.
- DB: new framework + 9 RC columns on ofsted_inspections (schema v4)
- Integrator: auto-detect OEIF vs Report Card from CSV column headers;
parse 5-level RC grades and safeguarding met/not-met
- API: expose all new fields in the ofsted response dict
- Frontend: branch on framework='ReportCard' to show safeguarding badge
+ 8-category grid; fall back to legacy OEIF layout otherwise;
always show inspection date in both layouts
- CSS: rcGrade1–5 and safeguardingMet/NotMet classes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove standalone back button div (looked out of place)
- Back button now lives in the sticky section nav bar, styled as a
bordered pill with coral accent — consistent with page design
- Fix sticky nav top offset from 0 to 3rem so it sticks below the
site-wide header instead of sliding behind it
- Increase scroll-margin-top on cards to 6rem to account for both
site header and section nav height
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two issues caused the backend to drop and reimport school data on restart:
1. schema_version table was in the drop list inside run_full_migration(),
so after any migration the breadcrumb was destroyed and the next
restart would see no version → re-trigger migration
2. Schema version was set after migration, so a crash mid-migration
left no version → infinite re-migration loop
Fix: remove schema_version from the drop list, and set the version
before running migration so crashes don't cause loops.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
UX audit round 2:
- Remove Summary Strip (duplicated Ofsted grade + parent happy/safe/recommend)
- Fold "% would recommend" into Ofsted section header
- Merge SATs Results + Subject Breakdown into one section
- Merge Results Over Time chart + Year-by-Year table into one section
- Add sticky section nav with dynamic pills based on available data
- Unify colour system: replace ad-hoc pill colours with semantic status classes
- Guard Pupils & Inclusion so it only renders with actual data
- Add year to Admissions section title
- Fix progress score 0.0 colour (was neutral gap at ±0.1, now at 0)
- Remove unused .metricTrend CSS class
Page reduced from 16 to 13 sections.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The EES statistics API only exposes ~13 publications; admissions data is not
among them. Switch to the EES content API (content.explore-education-statistics.
service.gov.uk) which covers all publications.
- ees.py: add get_content_release_id() and download_release_zip_csv() that
fetch the release ZIP and extract a named CSV member from it
- admissions.py: use corrected slug (primary-and-secondary-school-applications-
and-offers), correct column names from actual CSV (school_urn,
total_number_places_offered, times_put_as_1st_preference, etc.), derive
first_preference_offers_pct from offer/application ratio, filter to primary
schools only, keep most recent year per URN
Also includes SchoolDetailView UX redesign: parent-first section ordering,
plain-English labels, national average benchmarks, progress score colour
coding, expanded header, quick summary strip, and CSS consolidation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Before dropping tables, save all existing lat/lon coordinates keyed by URN.
After reimport, merge cached coordinates with any newly geocoded ones so
schools that already have coordinates skip the postcodes.io API call.
This makes repeated reimports fast and avoids re-geocoding ~15k schools.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ofsted renamed all columns in the OEIF framework:
- grades are now 'Latest OEIF overall effectiveness' etc.
- dates are 'Inspection start date of latest OEIF graded inspection'
Replace flat COLUMN_MAP with a priority list per field so both current
OEIF and legacy column names work without duplicate-column conflicts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Kestra's HTTP client socket read timeout is shorter than any reasonable
wait for a full geocoded migration. POST /api/admin/reimport-ks2 returns
immediately with {status:started}; the backend runs the job in a thread.
Check GET /api/admin/reimport-ks2/status or watch the UI for schools.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ofsted CSV has a variable number of preamble rows (title, filter warning,
etc.) before the real column headers. Scan up to 10 rows to find the one
containing a URN column rather than assuming a fixed offset.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The geocoding pass over ~15k schools takes longer than any reasonable
HTTP timeout. New approach:
- POST /api/admin/reimport-ks2 starts migration in background thread,
returns {"status":"started"} immediately
- GET /api/admin/reimport-ks2/status returns {running, done}
- ks2.py polls status every 30s (max 2h) before returning
- Kestra flow timeout bumped to PT2H
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add geocode query param to /api/admin/reimport-ks2 (defaults true).
ks2.py passes ?geocode=true so postcodes are resolved to lat/lng in
the same migration pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Kestra requires retry.type to be set (e.g. constant, exponential).
Also rename delay -> interval which is the correct field for constant retry.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Waits up to 120s for /api/v1/flows/search to respond before attempting
imports, giving a clearer error if the URL is wrong or kestra isn't up.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>