Commit Graph

74 Commits

Author SHA1 Message Date
Tudor Sitaru 7e6ded29e2 feat(pipeline): add legacy KS4 backfill (2015/16–2018/19)
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 12s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 52s
Build and Push Docker Images / Trigger Portainer Update (push) Has been cancelled
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Has been cancelled
Mirrors the existing legacy KS2 pattern to fill the years before EES hosted
KS4 data. Four areas changed:

- tap-uk-ees: LegacyKS4Stream downloads each year's DfE Compare School
  Performance ZIP, extracts england_ks4final.csv, maps 416 legacy columns
  to Singer fields, strips % suffixes. Registered in discover_streams().
  TapUKEES.config_jsonschema gains legacy_ks4_urls setting.

- stg_legacy_ks4.sql: safe_numeric casts + NULL placeholders for columns
  not present in legacy format (ebacc_avg_score, gcse_grade_91_pct,
  prior_attainment_avg, sen_pct).

- int_ks4_with_lineage.sql: adds all_ks4 CTE unioning stg_ees_ks4 and
  stg_legacy_ks4, matching the int_ks2_with_lineage pattern.

- _stg_sources.yml + meltano.yml: source declaration and setting definition
  for legacy_ks4. URLs configured per-year once provided.
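A minimal sketch of the "download ZIP, extract england_ks4final.csv" step, done in memory with the standard library. The function name is hypothetical, the download itself is left out, and the sketch assumes UTF-8 input; older DfE files may use a different encoding. The column names in the demo are illustrative only.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_bytes: bytes, member: str = "england_ks4final.csv") -> list[dict[str, str]]:
    """Return the rows of one CSV file stored inside an in-memory ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Match on the basename, case-insensitively, so a folder prefix inside
        # the ZIP (e.g. "2018-2019/england_ks4final.csv") does not matter.
        candidates = [
            name for name in archive.namelist()
            if name.rsplit("/", 1)[-1].lower() == member.lower()
        ]
        if not candidates:
            raise FileNotFoundError(f"{member} not found in archive")
        with archive.open(candidates[0]) as raw:
            # utf-8-sig strips a byte-order mark if one is present.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


if __name__ == "__main__":
    # Build a tiny archive in memory instead of downloading one.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("2018-2019/england_ks4final.csv", "URN,SCHNAME\n100000,Example School\n")
    print(read_csv_from_zip(buffer.getvalue()))
```

Matching by basename keeps the stream tolerant of per-year differences in how each ZIP is laid out.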

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 10:37:24 +01:00
Tudor Sitaru 3401654ab9 fix(pipeline): restore multi-year KS4 data
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 17s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 46s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m21s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
Two bugs prevented historical secondary school data from loading:

1. stg_ees_ks4.sql filtered breakdown_topic = 'Total' only, but EES
   releases prior to 2023/24 use breakdown_topic = 'All pupils' (matching
   the KS2 convention), so every older year was silently filtered to zero rows.
   Fix: accept both values with an IN clause.

2. get_all_releases() in tap-uk-ees fetched only the first page of the
   EES releases API. Now follows all pages via the paging.totalPages field
   so no historical release is missed when more than 20 exist.
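The paging loop can be sketched as below. Assumptions: the page envelope looks like {"results": [...], "paging": {"totalPages": N}} (only paging.totalPages is named above; "results" is a guess), and fetch_page is a hypothetical stand-in for the real HTTP call. The bare-list branch reflects the earlier "plain list response" fix.

```python
from typing import Any, Callable


def get_all_releases(fetch_page: Callable[[int], Any]) -> list[dict]:
    """Collect releases from every page of a paged releases endpoint.

    fetch_page(n) is a stand-in for the HTTP call for page n (1-based) and
    returns the decoded JSON body.
    """
    first = fetch_page(1)
    # Some responses are a bare list with no paging envelope: nothing more to fetch.
    if isinstance(first, list):
        return list(first)
    releases = list(first.get("results", []))
    total_pages = first.get("paging", {}).get("totalPages", 1)
    # Pages 2..totalPages inclusive; page 1 is already in hand.
    for page in range(2, total_pages + 1):
        releases.extend(fetch_page(page).get("results", []))
    return releases
```

Injecting fetch_page keeps the paging logic testable without network access.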

After re-running the annual EES pipeline, secondary school comparison
charts should show data across all available years (2018/19 onwards).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 09:18:55 +01:00
Tudor Sitaru 6d685b7e8a refactor(admissions): rename published_admission_number to places_offered
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 18s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 46s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Failing after 13s
Build and Push Docker Images / Trigger Portainer Update (push) Has been skipped
The staging model aliased EES's total_number_places_offered column as
published_admission_number, but PAN is the school's published capacity
(not exposed by EES at school level) — what we actually have is the
count of places offered in a given admissions round. The misnomer
propagated to the mart, SQLAlchemy model, API response, TS types, and
UI copy ("places per year", "(PAN)").

Rename end-to-end and fix the UI labels:
  - "29 places for 42 first-choice applications"
      → "29 places offered for 42 first-choice applications"
  - "Reception/Year 7 places per year"
      → "Reception/Year 7 places offered"
  - drop the misleading "(PAN)" suffix in the secondary view

Also add a comment in stg_ees_admissions clarifying this is the number
of places offered, not PAN. Requires dbt to rebuild fact_admissions
(marts are materialized as tables) before the backend can start.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 09:45:43 +01:00
Tudor Sitaru 8ce34b3ecc fix(list): read ofsted grade from fact_ofsted_inspection directly, fix dim_school schema lookup
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 18s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 49s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m9s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
dim_school.sql was checking for int_ofsted_latest in target.schema (wrong schema)
due to the custom generate_schema_name macro using literal schema names. The
model lives in 'intermediate', so ofsted_grade/date/framework were always NULL
in dim_school, causing all list cards to show 'Not yet inspected'.

Fix 1: data_loader.py joins marts.fact_ofsted_inspection with DISTINCT ON to
get latest inspection per school — no pipeline re-run needed.

Fix 2: dim_school.sql uses schema='intermediate' so future dbt runs correctly
denormalise the Ofsted summary into dim_school.
2026-04-13 14:51:14 +01:00
Tudor Sitaru 06bf53ac26 fix(dag): remove invalid --select flag from meltano run
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 13s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 54s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m12s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
meltano run does not support --select; the full tap-uk-ees run already
includes EESKs2NationalStream so no separate task is needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 14:48:31 +01:00
Tudor Sitaru dc66e22d4d feat: ingest official DfE KS2 national averages from EES data catalogue
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 19s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 53s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m24s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Replaces computed means from our school dataset with the published DfE
national headline figures for the KS2 chart reference line.

- tap-uk-ees: new EESKs2NationalStream fetches the stable EES data-catalogue
  CSV (one row per year, England national total, AllSchools filter)
- dbt staging: stg_ees_ks2_national normalises columns, casts to float,
  filters to years >= 201617
- dbt mart: fact_ks2_national_averages — one row per year, official figures
- backend/models: Ks2NationalAverage SQLAlchemy model
- backend/app: /api/national-averages queries the mart for KS2 by_year;
  secondary by_year stays computed (no DfE KS4 national dataset yet)
- DAG: extract_ks2_national task added to school_data_annual_ees,
  runs in parallel with the main EES extract

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 14:40:33 +01:00
Tudor Sitaru 1e5c66d6ab fix(admissions): correct first_preference_offer_pct in dbt staging
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 18s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 49s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m12s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
The staging model was mapping EES column ``proportion_1stprefs_v_totaloffers``
straight onto ``first_preference_offer_pct``. That raw column is not a
percentage — it is a ratio of first-preference applications to total offers
(an oversubscription indicator, >1 means oversubscribed), so OLQH rendered
as "1%" when the true first-choice success rate is 27/42 = 64%.

The frontend display code is not at fault and is not patched here —
data-quality issues must be fixed at the source.

- stg_ees_admissions: compute ``first_preference_offer_pct`` as
  ``100 * number_1st_preference_offers / times_put_as_1st_preference`` —
  of families who listed this school first, the % that received an offer
  (0–100). Guard against divide-by-zero.
- stg_ees_admissions: expose the legitimate EES ratio as the new column
  ``oversubscription_ratio`` (1st-preference applications per place) for
  future use, clearly named.
- fact_admissions, FactAdmissions model, data_loader: propagate the new
  ``oversubscription_ratio`` column.
- SchoolAdmissions type: document both columns inline.
- buildSchoolSummary: reword the oversubscription clause so it reads
  sensibly across the whole 0–100 range (no more "just 64%").
- Hero chip subtitle: clearer phrasing "X% of first-choice applicants
  offered a place".
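The corrected percentage, written as plain Python for clarity (the real change is in SQL, where the usual divide-by-zero guard is nullif(denominator, 0)). Parameter names follow the EES columns quoted above; the function itself is illustrative.

```python
from typing import Optional


def first_preference_offer_pct(
    first_preference_offers: Optional[int],
    times_put_as_first_preference: Optional[int],
) -> Optional[float]:
    """% of families who listed the school first and were offered a place (0-100)."""
    if first_preference_offers is None or not times_put_as_first_preference:
        # Missing data or a zero denominator: no meaningful percentage.
        return None
    return 100.0 * first_preference_offers / times_put_as_first_preference


print(round(first_preference_offer_pct(27, 42), 1))  # 64.3
```

With the OLQH figures (27 offers from 42 first-choice applications) this gives about 64%, not the "1%" the raw ratio produced.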

Requires a dbt run of stg_ees_admissions and fact_admissions on deploy
so the new column materialises.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 11:29:40 +01:00
Tudor Sitaru f053b35c6f test(dim_school): downgrade phase not_null to warn
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 13s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 45s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m16s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
The new phase inference can legitimately leave ~1100 independent schools with
null phase (no GIAS phase, no statutory ages, name gives no hint). That's a
known data quality gap, not a pipeline failure — the UI already handles null
by showing no pill. Downgrade the test to warn so it stays visible in dbt
output without blocking the DAG.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 22:12:57 +01:00
Tudor Sitaru ca5f6a962c fix(dim_school): expand phase inference with name-based fallback
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 12s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 45s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m10s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
The case-insensitive "Not Applicable" fix caught schools where GIAS publishes
statutory ages, but some independent schools leave those blank too — they fall
through every branch and end up with null phase and no pill in the UI.

Add a third tier that infers phase from the school name
(Primary/Infant/Junior/Prep vs Secondary/High/Grammar/Senior/Upper) and also
normalise "Not Applicable" handling with trim() + "unknown"/"" exclusion, so
the final else branch can safely return null instead of the catch-all string.
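A Python sketch of the three-tier inference (GIAS phase, then statutory ages, then school name). The real logic lives in dim_school.sql; the keyword list is taken from the message above, while the word-boundary matching, the age thresholds and the "both or neither keyword matched → null" rule are assumptions.

```python
import re
from typing import Optional

# GIAS phase values that carry no information (compared after trim + lower-casing).
_UNUSABLE = {"not applicable", "unknown", ""}
# Whole-word matches, so e.g. "Highfield" does not count as "High".
_PRIMARY_WORDS = re.compile(r"\b(primary|infant|infants|junior|prep)\b", re.IGNORECASE)
_SECONDARY_WORDS = re.compile(r"\b(secondary|high|grammar|senior|upper)\b", re.IGNORECASE)


def infer_phase(
    gias_phase: Optional[str],
    low_age: Optional[int],
    high_age: Optional[int],
    name: Optional[str],
) -> Optional[str]:
    # Tier 1: trust GIAS when it publishes a real phase.
    phase = (gias_phase or "").strip()
    if phase.lower() not in _UNUSABLE:
        return phase
    # Tier 2: statutory age range.
    if low_age is not None and high_age is not None:
        if high_age <= 11:
            return "Primary"
        if low_age >= 11:
            return "Secondary"
        return "All-through"
    # Tier 3: keywords in the school name; ambiguous names stay null.
    is_primary = bool(_PRIMARY_WORDS.search(name or ""))
    is_secondary = bool(_SECONDARY_WORDS.search(name or ""))
    if is_primary and not is_secondary:
        return "Primary"
    if is_secondary and not is_primary:
        return "Secondary"
    return None
```

Returning None rather than a catch-all string lets the UI fall back to showing no pill.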

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 21:15:54 +01:00
Tudor Sitaru a562f408d2 refactor: expand RWM to "Reading, Writing & Maths" in user-facing text
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 24s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 52s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m51s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Expand the abbreviation in metric names (backend schemas), the home page
sort dropdown, README/QA docs, and pipeline comments. Short_name fields
and the compact row/map-card labels remain abbreviated for space.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 15:53:52 +01:00
Tudor Sitaru 5b025b98bd fix(dim_school): use case-insensitive comparison for phase inference
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 13s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 50s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m6s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
GIAS provides 'Not Applicable' (capital A) but the check used 'Not applicable',
so the case-sensitive != was true for every row: the raw phase was returned and
the age-range inference never ran.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 15:33:04 +01:00
Tudor Sitaru 4c3c3c882d fix(dim_school): infer phase from age range for independent schools
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 12s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 50s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m9s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Independent schools have phase='Not applicable' in GIAS. Now infer
phase from the statutory age range: upper age <=11 → Primary, lower
age >=11 → Secondary, range spans both → All-through. Falls back to
the original value if no age data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-01 16:18:52 +01:00
Tudor Sitaru 2b757e556d fix(legacy-ks2): strip % suffix from percentage values
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m11s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m37s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Old DfE CSVs encode percentages as "57%" not "57". The safe_numeric
macro rejects non-numeric strings, so strip the suffix before emitting.
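A sketch of the suffix strip plus a numeric check. safe_numeric is a dbt macro in this project whose exact rules are not shown here, so the Python version below is an assumed analogue: numeric strings become floats, anything else (e.g. suppression markers) becomes None.

```python
import re
from typing import Optional

_NUMERIC = re.compile(r"-?\d+(\.\d+)?")


def strip_percent(value: Optional[str]) -> Optional[str]:
    """Turn "57%" (or " 57 % ") into "57"; leave other strings unchanged."""
    if value is None:
        return None
    v = value.strip()
    return v[:-1].rstrip() if v.endswith("%") else v


def safe_numeric(value: Optional[str]) -> Optional[float]:
    """Assumed analogue of the safe_numeric macro: float if numeric, else None."""
    if value is None:
        return None
    v = value.strip()
    return float(v) if _NUMERIC.fullmatch(v) else None
```

Without the strip step, every legacy percentage would be rejected by the numeric check and land as NULL.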

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-01 13:07:51 +01:00
Tudor Sitaru fbd1de9220 fix(dag): add stg_legacy_ks2 to annual EES dbt build selector
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 33s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m11s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m29s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-01 11:27:29 +01:00
Tudor Sitaru fba8e74b72 refactor(legacy-ks2): use explicit year→URL mapping instead of base URL pattern
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m9s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 32s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
The file hosting uses non-deterministic URLs, so replace legacy_ks2_base_url
+ legacy_ks2_years with a single legacy_ks2_urls object mapping year codes
to download URLs. Configure the 4 pre-COVID years in meltano.yml.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:44:11 +01:00
Tudor Sitaru 6d4962639c feat(legacy-ks2): add stream for pre-COVID KS2 data (2015-2019)
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 46s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m17s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 2m26s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
- Add LegacyKS2Stream to tap-uk-ees: downloads old DfE england_ks2final.csv
  files from a configurable base URL, maps 318-column wide format to the
  same schema as stg_ees_ks2 output
- Add stg_legacy_ks2.sql staging model with safe_numeric casts
- Add legacy_ks2 source to _stg_sources.yml
- Update int_ks2_with_lineage.sql to union EES + legacy data
- Configurable via legacy_ks2_base_url and legacy_ks2_years tap settings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 14:36:41 +01:00
Tudor Sitaru fc011c6547 fix(tap-uk-ees): case-insensitive URN column matching for older census files
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m10s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m48s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
Older census CSVs use 'URN' (uppercase) while the stream expects 'urn'.
Normalise the column name before filtering and emitting records.
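A minimal sketch of the normalisation, assuming records arrive as dicts keyed by CSV header. Function names are hypothetical; only the URN key is rewritten, everything else passes through untouched.

```python
from typing import Iterable


def normalise_urn_key(record: dict) -> dict:
    """Rename any case/whitespace variant of 'urn' (e.g. 'URN') to 'urn'."""
    return {("urn" if key.strip().lower() == "urn" else key): value for key, value in record.items()}


def filter_by_urn(rows: Iterable[dict], wanted: set) -> list[dict]:
    """Normalise first, then filter, so old and new files behave the same."""
    return [row for row in map(normalise_urn_key, rows) if row.get("urn") in wanted]
```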

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 22:36:16 +01:00
Tudor Sitaru 752abd69a5 fix(tap-uk-ees): inject time_period from release slug when absent in CSV
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m8s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m37s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
Older census (and other) files don't include a time_period column.
Derive it from the release slug (e.g. '2022-23' → '202223') and inject
it into records so the required Singer schema field is always present.
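A sketch of the slug → time_period derivation. Using search (rather than an exact match) so a longer slug that contains "YYYY-YY" still works is an assumption; existing time_period values are never overwritten.

```python
import re
from typing import Optional

# "2022-23" → groups ("2022", "23")
_SLUG_YEAR = re.compile(r"(\d{4})-(\d{2})")


def time_period_from_slug(slug: str) -> Optional[str]:
    match = _SLUG_YEAR.search(slug)
    return match.group(1) + match.group(2) if match else None


def inject_time_period(record: dict, slug: str) -> dict:
    """Add time_period from the release slug only when the CSV lacks it."""
    if record.get("time_period"):
        return record
    time_period = time_period_from_slug(slug)
    return {**record, "time_period": time_period} if time_period else record
```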

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 21:59:24 +01:00
Tudor Sitaru 570c2b689e fix(tap-uk-ees): handle plain list response from releases endpoint
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 33s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m6s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m45s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 21:47:14 +01:00
Tudor Sitaru 9a1572ea20 feat(tap-uk-ees): fetch all historical releases, not just latest
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m9s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m42s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
Add get_all_release_ids() to paginate /publications/{slug}/releases and
iterate over every release in get_records(). Add latest_only config flag
(default false) to restore single-release behaviour for daily runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 21:37:26 +01:00
tudor 6e5249aa1e refactor(phase): merge KS2+KS4 into fact_performance, fix all phase inconsistencies
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 50s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m12s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m24s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
Root cause: the UNION ALL query in data_loader.py produced two rows per
all-through school per year (one KS2, one KS4), with drop_duplicates()
silently discarding the KS4 row. Fixes:

- New dbt mart `fact_performance`: FULL OUTER JOIN of fact_ks2_performance
  and fact_ks4_performance on (urn, year). One row per school per year.
  All-through schools have both KS2 and KS4 columns populated.
- data_loader.py: replace 175-line UNION ALL with a simple JOIN to
  fact_performance. No more duplicate rows or drop_duplicates needed.
- sync_typesense.py: single LATERAL JOIN to fact_performance instead of
  two separate KS2/KS4 joins.
- app.py: remove drop_duplicates (no longer needed); add PHASE_GROUPS
  constant so all-through/middle schools appear in primary and secondary
  filter results (were previously invisible to both); scope result_filters
  gender/admissions_policies to secondary schools only.
- HomeView.tsx: isSecondaryView is now majority-based (not "any secondary")
  and isMixedView shows both sort option sets for mixed result sets.
- school/[slug]/page.tsx: all-through schools route to SchoolDetailView
  (renders both SATs + GCSE sections) instead of SecondarySchoolDetailView
  (KS4-only). Dedicated SEO metadata for all-through schools.
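The fact_performance shape (one row per (urn, year), KS2 and KS4 columns side by side) can be illustrated with a pure-Python full outer join. Column names in the example are made up, and the sketch assumes KS2 and KS4 non-key columns have distinct names (a right-side value would otherwise overwrite the left).

```python
from typing import Iterable


def full_outer_join(left: Iterable[dict], right: Iterable[dict], key=("urn", "year")) -> list[dict]:
    """One output row per key; columns missing on one side come back as None."""
    columns: list[str] = []
    merged: dict[tuple, dict] = {}
    for rows in (left, right):
        for row in rows:
            k = tuple(row[c] for c in key)
            target = merged.setdefault(k, dict(zip(key, k)))
            for col, val in row.items():
                if col in key:
                    continue
                target[col] = val
                if col not in columns:
                    columns.append(col)
    # Fill absent columns with None, as SQL would with NULL.
    return [{**{c: None for c in columns}, **merged[k]} for k in sorted(merged)]
```

An all-through school appears once with both sides populated, which is what removes the need for drop_duplicates().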

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 14:07:30 +01:00
tudor b7bff7bf6b feat(seo): static sitemap generation job via Airflow
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 45s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m29s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
- Backend builds sitemap.xml from school data at startup (in-memory)
- POST /api/admin/regenerate-sitemap refreshes it after data updates
- New Airflow DAG (sitemap_generate) runs Sundays 05:00 and calls the endpoint
- Next.js proxies /sitemap.xml to the backend; removes the slow dynamic sitemap.ts
- docker-compose passes BACKEND_URL + ADMIN_API_KEY to Airflow env
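A sketch of building sitemap.xml in memory with the standard library. The base URL is a placeholder and the /school/{slug} path is an assumption based on the school/[slug] frontend route; the namespace is the standard sitemaps.org 0.9 one.

```python
from typing import Iterable
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url: str, slugs: Iterable[str]) -> str:
    """Return sitemap XML with one <url><loc> entry per school slug."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for slug in slugs:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes any special characters in the text.
        ET.SubElement(url, "loc").text = f"{base_url.rstrip('/')}/school/{slug}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Holding the string in memory and rebuilding it on demand is what lets the admin endpoint refresh the sitemap without a restart.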

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 15:15:41 +01:00
tudor f3a8ebdb4b fix(dbt): deduplicate int_ks4_with_lineage predecessor rows
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m10s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m32s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
When multiple predecessor URNs exist for the same current school and
year, use DISTINCT ON to keep the one with the most pupils — matching
the same logic already in int_ks2_with_lineage.
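The dedup rule is the equivalent of DISTINCT ON (current_urn, year) ... ORDER BY current_urn, year, pupils DESC. A Python sketch follows; the pupils column name is hypothetical, missing counts are treated as 0, and on a tie the first row seen wins.

```python
from typing import Iterable


def keep_largest_predecessor(rows: Iterable[dict]) -> list[dict]:
    """Keep one predecessor row per (current_urn, year): the one with most pupils."""
    best: dict[tuple, dict] = {}
    for row in rows:
        key = (row["current_urn"], row["year"])
        current = best.get(key)
        if current is None or (row.get("pupils") or 0) > (current.get("pupils") or 0):
            best[key] = row
    return [best[k] for k in sorted(best)]
```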

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 18:58:50 +00:00
tudor f0c76a1724 fix(dbt): fix stg_ees_ks4 breakdown filter: 'Total' not 'All pupils'
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 31s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m26s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
The EES KS4 performance CSV uses breakdown_topic='Total' for the
all-pupils aggregate, not 'All pupils' as the model assumed. This
caused 0 rows to pass the filter despite 40k rows in raw.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 18:35:00 +00:00
tudor 3e787b395f chore(pipeline): add EES KS4 tap diagnostic script
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 2m28s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m11s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m28s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 18:26:15 +00:00
tudor 250d1f7c77 fix(tap-uk-idaci): add openpyxl dependency for Excel file parsing
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 49s
Build and Push Docker Images / Build Frontend (Next.js) (push) Failing after 1m2s
Build and Push Docker Images / Trigger Portainer Update (push) Has been cancelled
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 15:00:00 +00:00
tudor 1629a8f994 feat(pipeline): add DAGs for Parent View and IDACI deprivation
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m4s
Build and Push Docker Images / Trigger Portainer Update (push) Has been cancelled
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Has been cancelled
- school_data_monthly_parent_view: runs 1st of month, extracts Ofsted
  Parent View and builds fact_parent_view
- school_data_annual_idaci: manual trigger, extracts IDACI deprivation
  index and builds fact_deprivation

Both tables were missing, causing safe_query to fail and leave the
PostgreSQL transaction in an aborted state, silently killing all
subsequent supplementary data queries including fact_admissions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 22:08:12 +00:00
tudor 7724fe3503 fix(stg_ofsted_inspections): correctly filter NULL string inspection dates
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m25s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
The string 'NULL' is not SQL NULL, so the WHERE in the renamed CTE
passed those rows through. Filter on the raw value using nullif in the
CTE and on the computed date in the outer SELECT.
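The string-vs-SQL NULL distinction and the DD/MM/YYYY parsing, sketched in Python (the actual fix is SQL in stg_ofsted_inspections). Treating an empty string like 'NULL' mirrors nullif(trim(x), '') and is an assumption; the column name inspection_date follows the messages above.

```python
from datetime import date, datetime
from typing import Optional


def nullif_trim(value: Optional[str]) -> Optional[str]:
    """Map None, '' and the literal string 'NULL' to a real None."""
    if value is None:
        return None
    v = value.strip()
    return None if v in ("", "NULL") else v


def parse_inspection_date(value: Optional[str]) -> Optional[date]:
    v = nullif_trim(value)
    # Ofsted CSV dates are DD/MM/YYYY.
    return None if v is None else datetime.strptime(v, "%d/%m/%Y").date()


def inspection_rows(rows: list[dict]) -> list[dict]:
    """Drop never-inspected schools: keep only rows with a parsed date."""
    out = []
    for row in rows:
        parsed = parse_inspection_date(row.get("inspection_date"))
        if parsed is not None:
            out.append({**row, "inspection_date": parsed})
    return out
```

Filtering on the parsed value, not the raw string, is what catches the 'NULL' rows.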

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 18:21:30 +00:00
tudor 1d56eebe87 fix(stg_ofsted_inspections): filter out rows with no inspection date
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m24s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Schools in the MI file that have never been inspected have a null
inspection_date after parsing. Exclude them — they are not inspection
records.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 17:55:11 +00:00
tudor 10720400fd fix(stg_ofsted_inspections): parse DD/MM/YYYY date format from Ofsted CSV
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m3s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m28s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 17:34:34 +00:00
tudor 05cb22f1a5 fix(stg_ofsted_inspections): handle NULL strings from Ofsted CSV
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m9s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m26s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Use nullif+trim for date cast and safe_numeric for integer grades to
handle literal 'NULL' strings present in the new Report Card format CSV.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 17:23:46 +00:00
tudor 26aa3c2d70 fix(tap-uk-ofsted): fix header row detection matching 'urn' inside 'turn'
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 33s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m40s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
The preamble row in Ofsted CSVs contains 'turn off all filters' which
matched 'urn' in line.lower(), so header_idx was set to 0 instead of
the real header row. Use a regex that matches URN only as a CSV field.
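One way to match URN only as a whole CSV field. The exact regex used in tap-uk-ofsted is not shown, so this pattern is an assumption: URN, optionally quoted, must sit at the start of the line or after a comma, and be followed by a comma or the end of the line.

```python
import re

# Whole-field match: (start | comma) [spaces] ["] URN ["] [spaces] (comma | end)
_URN_FIELD = re.compile(r'(^|,)\s*"?urn"?\s*(,|$)', re.IGNORECASE)


def find_header_index(lines: list[str]) -> int:
    """Index of the first line that has URN as a CSV field (preamble lines skipped)."""
    for i, line in enumerate(lines):
        if _URN_FIELD.search(line.rstrip("\r\n")):
            return i
    raise ValueError("no header row containing a URN field")
```

A preamble such as "turn off all filters" no longer matches, because the "urn" inside "turn" is preceded by a letter rather than a comma or line start.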

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 17:05:03 +00:00
tudor e56a63c59c debug(tap-uk-ofsted): log CSV column names to diagnose 0-record extraction
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 31s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m4s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m40s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 15:47:32 +00:00
tudor 668e234eb2 feat(census): add demographic columns to EES census tap and staging models
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m39s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
tap-uk-ees: EESCensusStream now declares 27 data columns (FSM %, EAL %,
ethnicity breakdowns, pupil counts) with clean Singer field names mapped
from the verbose CSV column names (e.g. '% of pupils known to be eligible
for free school meals' → fsm_pct) via a new _column_renames mechanism on
the base stream class.
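A plain-Python sketch of a _column_renames hook. The class names are hypothetical and only the fsm_pct mapping comes from the message above. In a Singer SDK tap the renaming could plausibly live in the stream's post_process method; that placement is an assumption.

```python
class EESStreamBase:
    """Hypothetical base: subclasses map verbose CSV headers to clean field names."""

    _column_renames: dict[str, str] = {}

    def rename_record(self, record: dict) -> dict:
        # Unmapped columns keep their original names.
        return {self._column_renames.get(k, k): v for k, v in record.items()}


class CensusStream(EESStreamBase):
    _column_renames = {
        "% of pupils known to be eligible for free school meals": "fsm_pct",
    }
```

Keeping the mapping as class data means each stream declares its renames without overriding any methods.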

stg_ees_census: materialised as table, applies safe_numeric to all
percentage/count columns, filters to numeric URNs.

int_pupil_chars_merged + fact_pupil_characteristics: pass all columns
through from staging (previously stubs with only 3 columns).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 14:07:48 +00:00
tudor 4b02ab3d8a feat: wire Typesense search into backend, fix sync performance data bug
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 1m1s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m25s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
sync_typesense.py:
- Fix query string replacement: was matching 'ST_X(l.geom) as lng' but
  QUERY_BASE uses 'l.longitude as lng' — KS2/KS4 lateral joins were
  silently dropped on every sync run

backend:
- Add typesense_url/typesense_api_key settings to config.py
- Add search_schools_typesense() to data_loader.py — queries Typesense
  'schools' alias, returns URNs in relevance order with typo tolerance;
  falls back to empty list if Typesense is unavailable
- /api/schools: replace pandas str.contains with Typesense search;
  results are filtered from the DataFrame and returned in relevance order;
  graceful fallback to substring match if Typesense is down
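A sketch of "relevance order with substring fallback". typesense_search is a hypothetical callable returning URNs in relevance order; it is assumed to raise ConnectionError when Typesense is down (the real client raises its own exception types). Unlike the description above, this variant uses None rather than an empty list to signal "unavailable", so a genuine zero-hit search is not mistaken for an outage.

```python
from typing import Callable, Optional


def search_schools(query: str, schools: list[dict], typesense_search: Callable[[str], list]) -> list[dict]:
    try:
        ranked_urns: Optional[list] = typesense_search(query)
    except ConnectionError:
        ranked_urns = None  # Typesense unavailable
    if ranked_urns is None:
        # Fallback: case-insensitive substring match, original order.
        q = query.casefold()
        return [s for s in schools if q in s["name"].casefold()]
    # Keep Typesense's relevance order; skip URNs not in the local dataset.
    by_urn = {s["urn"]: s for s in schools}
    return [by_urn[u] for u in ranked_urns if u in by_urn]
```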

requirements.txt: add typesense==0.21.0, numpy==1.26.4

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 13:23:32 +00:00
tudor 5d8b319451 fix(dbt): stub rc_* columns as NULL in stg_ofsted_inspections
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 33s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m10s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m23s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
tap-uk-ofsted schema only declares OEIF columns; rc_* (Report Card)
columns were never emitted so they don't exist in raw.ofsted_inspections.
Replace column references with NULL::text until the actual CSV column
names for the post-Nov 2025 Report Card framework are confirmed and
added to the tap schema.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 12:50:58 +00:00
tudor 77f75fb6e5 fix(dbt): deduplicate predecessor KS2 rows and downgrade orphan test to warn
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m11s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m31s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
- int_ks2_with_lineage: use DISTINCT ON (current_urn, year) in predecessor_ks2
  to handle schools with multiple predecessors that each have KS2 data for the
  same year (e.g. two schools that merged). Keeps the predecessor with the most pupils.
- dbt_project.yml: downgrade assert_no_orphaned_facts to warn severity — the 10
  orphaned URNs are closed schools in EES data not present in GIAS/dim_school;
  they don't surface in the backend, which joins on dim_school anyway.
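The DISTINCT ON rule can be sketched in Python as below. The `total_pupils` and `predecessor_urn` column names are assumptions, and the SQL in the docstring is illustrative rather than the model's actual text. NULL pupil counts are treated as 0 here. In PostgreSQL, `ORDER BY ... DESC` sorts NULLs first unless `NULLS LAST` is given, so the SQL needs that clause to behave the same way.

```python
from typing import Dict, List, Tuple

def dedupe_predecessors(rows: List[Dict]) -> List[Dict]:
    """Keep one row per (current_urn, year): the predecessor with the most pupils.

    Python analogue of (illustrative SQL):
        SELECT DISTINCT ON (current_urn, year) *
        FROM predecessor_ks2
        ORDER BY current_urn, year, total_pupils DESC NULLS LAST
    """
    best: Dict[Tuple, Dict] = {}
    for row in rows:
        key = (row["current_urn"], row["year"])
        pupils = row["total_pupils"] or 0  # treat NULL as 0 so real counts win
        current = best.get(key)
        if current is None or pupils > (current["total_pupils"] or 0):
            best[key] = row
    # DISTINCT ON emits rows in ORDER BY order
    return [best[k] for k in sorted(best)]
```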

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 12:16:36 +00:00
tudor b41e6c250e fix(dbt): filter non-numeric URNs and trim whitespace in EES staging models
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m9s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m30s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
- Filter school_urn/time_period to '^[0-9]+$' to exclude "n/a" and other
  non-numeric values that caused integer cast failures in fact_admissions
- Add trim() to all school_urn/time_period casts to prevent whitespace
  variants producing duplicate urn+year rows in fact_ks2_performance
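A sketch of the key cleaning in Python, assuming rows arrive as dicts with `school_urn` and `time_period` strings. The commit does not say whether the regex filter runs before or after trim(). This sketch trims first, so " 100123" is kept and collapses onto "100123" instead of being rejected. Python's strip() also removes tabs and newlines, whereas PostgreSQL's trim() removes only spaces by default.

```python
import re
from typing import Dict, List, Optional

def clean_key(value) -> Optional[int]:
    """Trim, then accept only all-digit strings; anything else becomes None."""
    if value is None:
        return None
    s = str(value).strip()
    return int(s) if re.fullmatch(r"[0-9]+", s) else None

def clean_rows(rows: List[Dict]) -> List[Dict]:
    """Drop rows whose URN or time_period is non-numeric; normalise the rest."""
    out = []
    for row in rows:
        urn = clean_key(row.get("school_urn"))
        year = clean_key(row.get("time_period"))
        if urn is None or year is None:
            continue  # e.g. "n/a", which would break the integer cast
        out.append({**row, "school_urn": urn, "time_period": year})
    return out
```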

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 12:00:30 +00:00
tudor 6e720feca4 perf(dbt): collapse stg_ees_ks2 to single-pass pivot
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 33s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m31s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
The previous version scanned ees_ks2_attainment (1.2M rows) five times via
separate CTEs (all_pupils, gender_boys, gender_girls, disadv, not_disadv)
plus 5 LEFT JOINs. It is rewritten as one GROUP BY with conditional
aggregation: a single scan and no self-joins.
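The conditional-aggregation idea can be mirrored in Python as a single loop. The breakdown labels, column suffixes and the `rwm_expected_pct` metric are hypothetical stand-ins for the real EES fields. Each (urn, year) group gets one output row, and a column that is never filled corresponds to NULL in the SQL version.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# breakdown label -> wide-column suffix; labels and metric name are hypothetical
SUFFIXES = {
    "Total": "",
    "Boys": "_boys",
    "Girls": "_girls",
    "Disadvantaged": "_disadv",
    "Not disadvantaged": "_not_disadv",
}

def pivot_single_pass(rows: List[Dict], metric: str = "rwm_expected_pct") -> Dict[Tuple, Dict]:
    """One pass over long-format rows, mimicking GROUP BY school_urn, time_period
    with MAX(CASE WHEN breakdown = '<label>' THEN metric END) per suffix."""
    wide: Dict[Tuple, Dict] = defaultdict(dict)
    for r in rows:
        suffix = SUFFIXES.get(r["breakdown"])
        if suffix is None:
            continue  # breakdowns outside the pivot contribute nothing
        out = wide[(r["school_urn"], r["time_period"])]
        col = metric + suffix
        val = r.get(metric)
        # MAX() ignores NULLs, so only non-null values can win
        if val is not None and (out.get(col) is None or val > out[col]):
            out[col] = val
    return dict(wide)
```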

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 11:42:40 +00:00
tudor ae9fd26eba perf(dbt): materialize stg_ees_ks2 and stg_ees_ks4 as tables
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m10s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m30s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
KS2 attainment has 1.2M rows in long format. As a view, the pivot was
re-executed inline for every downstream model (intermediate → fact),
causing fact_ks2_performance CREATE TABLE to run for 18+ minutes.

Materializing as tables means the pivot runs once during staging, and
downstream models read from a pre-computed ~16k-row result.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 11:20:20 +00:00
tudor 33b395d2bd fix(dbt): apply safe_numeric macro to fix EES suppression code 'c' errors
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 33s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m14s
Build and Push Docker Images / Build Integrator (push) Successful in 58s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m25s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
Replace nullif(col, 'z') casts with safe_numeric macro across KS2, KS4,
and admissions staging models. The regex-based macro treats any non-numeric
string (z, c, x, q, u, etc.) as NULL without needing an explicit list.
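A Python analogue of the safe_numeric behaviour. The commit does not show the macro's regex, so the pattern below (optional sign, digits, optional decimal part, no exponent) is an assumption. In SQL the same idea is roughly `CASE WHEN col ~ '<pattern>' THEN col::numeric END`.

```python
import re
from typing import Optional

# Assumed pattern: optional sign, digits with optional decimals, or ".5" style
_NUMERIC = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")

def safe_numeric(value: Optional[str]) -> Optional[float]:
    """Return value as a number, or None for any non-numeric string ('c', 'z', 'x', ...)."""
    if value is None:
        return None
    s = value.strip()
    return float(s) if _NUMERIC.fullmatch(s) else None
```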

Also fix FSM_eligible_percent column quoting in stg_ees_admissions: target-postgres
stores mixed-case column names as quoted identifiers, so unquoted references were
folded to fsm_eligible_percent by PostgreSQL and did not match the column.
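The folding rule is standard PostgreSQL behaviour: unquoted identifiers are lower-cased before lookup, while double-quoted identifiers keep their exact case. The hypothetical helper below approximates that rule. It ignores escaped quotes inside identifiers.

```python
def pg_resolve_identifier(ident: str) -> str:
    """Approximate PostgreSQL identifier resolution (ignores escaped quotes).

    Double-quoted identifiers keep their exact case; unquoted ones are
    folded to lower case before lookup.
    """
    if len(ident) >= 2 and ident[0] == ident[-1] == '"':
        return ident[1:-1]
    return ident.lower()
```

With a table whose column was created as "FSM_eligible_percent", only the quoted reference resolves to it.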

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 10:41:27 +00:00
tudor 8e8d1bd8c5 fix(ees-tap): filter out rows with null URN before emitting
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m10s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m47s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
The admissions school-level file contains some rows with null school_urn
(LA/category aggregates that survive the geographic_level filter). These
cause a not-null constraint violation at target-postgres. Drop any row
where the URN column is null or empty before yielding records.
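A sketch of the filter as a generator, assuming records are dicts keyed by a URN column (the `school_urn` default and the function name are hypothetical). The NaN check covers values read through pandas, which represents empty cells as NaN rather than None.

```python
from typing import Dict, Iterable, Iterator

def rows_with_urn(rows: Iterable[Dict], urn_col: str = "school_urn") -> Iterator[Dict]:
    """Yield only rows whose URN is present and non-blank."""
    for row in rows:
        urn = row.get(urn_col)
        # None, NaN (NaN != NaN) or a blank string all count as missing
        if urn is None or urn != urn or str(urn).strip() == "":
            continue  # LA/category aggregate rows have no URN
        yield row
```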

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 10:13:17 +00:00
tudor c7357336e3 fix(ees-tap): fix BOM handling for admissions CSV
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 33s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m6s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m40s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Admissions file is UTF-8 with BOM, not Latin-1. Reading it as latin-1
decoded the three BOM bytes as 'ï»¿', which wasn't stripped. Change the
admissions encoding to utf-8-sig (strips the BOM automatically). Also update
the manual BOM strip fallback to handle the latin-1 decoded form.
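The BOM is the three bytes EF BB BF. Decoded as utf-8 it becomes U+FEFF, decoded as latin-1 it becomes 'ï»¿', and utf-8-sig drops it entirely. The hypothetical helper below shows a fallback strip that covers both decoded forms.

```python
BOM_BYTES = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF

def decode_header(raw: bytes, encoding: str) -> str:
    """Decode a header line, removing a leading BOM in whichever form it decoded to."""
    text = raw.decode(encoding)
    # utf-8-sig already drops the BOM; these cover plain utf-8 and latin-1
    for bom in ("\ufeff", BOM_BYTES.decode("latin-1")):
        if text.startswith(bom):
            return text[len(bom):]
    return text
```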

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 10:03:17 +00:00
tudor b8ecc5c58b fix(ees-tap): strip UTF-8 BOM from CSV column names
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m12s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m42s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
Some DfE supporting-files CSVs have a UTF-8 BOM on the first column,
causing it to be named '\ufefftime_period' instead of 'time_period'.
This trips Singer schema validation ('time_period' is a required property).
Strip the BOM from all column names after read_csv.
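The tap strips the BOM after read_csv. With pandas that could look like `df.columns = df.columns.str.lstrip("\ufeff")`. The stdlib sketch below does the same rename with csv.DictReader, so it runs without pandas. The function name is hypothetical.

```python
import csv
import io
from typing import Dict, List

def read_rows(text: str) -> List[Dict[str, str]]:
    """Parse CSV text, stripping a leading BOM from the column names."""
    reader = csv.DictReader(io.StringIO(text))
    # Reading fieldnames consumes the header; rename before any rows are built
    reader.fieldnames = [name.lstrip("\ufeff") for name in reader.fieldnames or []]
    return list(reader)
```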

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:54:15 +00:00
tudor f4f0257447 fix(ees-tap): add latin-1 encoding for census/admissions, default utf-8 for others
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 52s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m8s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m40s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
DfE supporting-files CSVs (spc_school_level_underlying_data, AppsandOffers
SchoolLevel) are Latin-1 encoded. Add _encoding class attribute to base
stream class and override to 'latin-1' for census and admissions streams.
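A sketch of the class-attribute pattern. The class names and the `decode` method are hypothetical stand-ins for the tap's stream classes. Subclasses override only `_encoding`, and the shared decoding logic stays in the base class.

```python
class EESStream:
    """Hypothetical stand-in for the tap's base stream class."""

    _encoding = "utf-8"  # default for most EES files

    def decode(self, payload: bytes) -> str:
        return payload.decode(self._encoding)


class CensusStream(EESStream):
    _encoding = "latin-1"  # spc_school_level_underlying_data


class AdmissionsStream(EESStream):
    _encoding = "latin-1"  # later switched to utf-8-sig (see c7357336e3)
```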

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:41:40 +00:00
tudor ca351e9d73 feat: migrate backend to marts schema, update EES tap for verified datasets
Pipeline:
- EES tap: split KS4 into performance + info streams, fix admissions filename
  (SchoolLevel keyword match), fix census filename (yearly suffix), remove
  phonics (no school-level data on EES), change endswith → in for matching
- stg_ees_ks4: rewrite to filter long-format data and extract Attainment 8,
  Progress 8, EBacc, English/Maths metrics; join KS4 info for context
- stg_ees_admissions: map real CSV columns (total_number_places_offered, etc.)
- stg_ees_census: update source reference, stub with TODO for data columns
- Remove stg_ees_phonics, fact_phonics (no school-level EES data)
- Add ees_ks4_performance + ees_ks4_info sources, remove ees_ks4 + ees_phonics
- Update int_ks4_with_lineage + fact_ks4_performance with new KS4 columns
- Annual EES DAG: remove stg_ees_phonics+ from selector

Backend:
- models.py: replace all models to point at marts.* tables with schema='marts'
  (DimSchool, DimLocation, KS2Performance, FactOfstedInspection, etc.)
- data_loader.py: rewrite load_school_data_as_dataframe() using raw SQL joining
  dim_school + dim_location + fact_ks2_performance; update get_supplementary_data()
- database.py: remove migration machinery, keep only connection setup
- app.py: remove check_and_migrate_if_needed, remove /api/admin/reimport-ks2
  endpoints (pipeline handles all imports)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:29:27 +00:00
tudor d82e36e7b2 feat(ees): rewrite EES tap and KS2 models for actual data structure
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 31s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m8s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m45s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
- Fix publication slugs (KS4, Phonics, Admissions were wrong)
- Split KS2 into two streams: ees_ks2_attainment (long format) and
  ees_ks2_info (wide format context data)
- Target specific filenames instead of keyword matching
- Handle school_urn vs urn column naming
- Pivot KS2 attainment from long to wide format in dbt staging
- Add all ~40 KS2 columns the backend needs (GPS, absence, gender,
  disadvantaged breakdowns, context demographics)
- Pass through all columns in int_ks2_with_lineage and fact_ks2
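One way to handle the school_urn vs urn naming is sketched below. The hypothetical helper picks whichever URN column a file actually has. The candidate order and the case-insensitive match are assumptions.

```python
from typing import List, Optional

URN_CANDIDATES = ("school_urn", "urn")  # checked in this order (assumed)

def find_urn_column(columns: List[str]) -> Optional[str]:
    """Return the URN column name this file uses, or None if there is none."""
    lowered = {c.lower(): c for c in columns}
    for name in URN_CANDIDATES:
        if name in lowered:
            return lowered[name]  # original spelling, for DataFrame/dict lookups
    return None
```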

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 23:08:50 +00:00
tudor 719f06e480 fix(pipeline): make total_pupils non-optional for Typesense, add lat/lng to dim_location
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m3s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m29s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
- Remove optional flag from total_pupils (Typesense requires the default
  sorting field to be non-optional)
- Add latitude/longitude columns to dim_location computed from PostGIS
  geom, for direct use by backend and Typesense sync
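A Typesense collection schema declares a name, a list of typed fields and a default_sorting_field. Only `total_pupils` as the sorting field comes from these commits. The other fields are hypothetical. The checker encodes the constraints as the commits describe them (numeric and non-optional), so confirm them against the Typesense version in use.

```python
SCHOOLS_SCHEMA = {
    "name": "schools",
    "fields": [
        {"name": "urn", "type": "string"},
        {"name": "school_name", "type": "string"},
        {"name": "total_pupils", "type": "int32"},  # no "optional": True
        {"name": "location", "type": "geopoint", "optional": True},
    ],
    "default_sorting_field": "total_pupils",
}

NUMERIC_TYPES = {"int32", "int64", "float"}

def check_default_sorting_field(schema: dict) -> None:
    """Raise ValueError if the sorting field is undeclared, non-numeric or optional."""
    name = schema.get("default_sorting_field")
    if name is None:
        return  # no default sorting field declared
    field = next((f for f in schema["fields"] if f["name"] == name), None)
    if field is None:
        raise ValueError(f"{name} is not a declared field")
    if field["type"] not in NUMERIC_TYPES:
        raise ValueError(f"{name} must be numeric, got {field['type']}")
    if field.get("optional"):
        raise ValueError(f"{name} must not be optional")
```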

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 22:45:02 +00:00
tudor 5e44d88d23 fix(sync): use numeric default_sorting_field, dynamic KS2/KS4 joins, populate geopoints
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m28s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
- Typesense requires a numeric default_sorting_field, so use total_pupils
- Dynamically include KS2/KS4 joins only if those tables exist
- Extract lat/lng from PostGIS geom and populate Typesense geopoint field
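The sketch below covers both the dynamic joins and the geopoint population. All table, alias and column names are hypothetical. ST_Y/ST_X return latitude/longitude only if geom is in EPSG:4326, which is an assumption here. A real query would also need to pick a single year per school from each fact table. Typesense geopoint values are `[lat, lng]` pairs.

```python
from typing import Dict, List, Optional, Set, Tuple

# table -> (alias, columns to select); every name here is hypothetical
OPTIONAL_JOINS: Dict[str, Tuple[str, List[str]]] = {
    "fact_ks2_performance": ("ks2", ["rwm_expected_pct"]),
    "fact_ks4_performance": ("ks4", ["attainment_8_score"]),
}

def build_sync_query(existing_tables: Set[str]) -> str:
    """Build the sync SELECT, adding each KS2/KS4 join only if the table exists."""
    columns = [
        "s.urn",
        "s.school_name",
        "ST_Y(l.geom) AS lat",  # latitude, assuming geom is EPSG:4326
        "ST_X(l.geom) AS lng",  # longitude
    ]
    joins = ["JOIN marts.dim_location l ON l.urn = s.urn"]
    for table, (alias, cols) in OPTIONAL_JOINS.items():
        if table in existing_tables:
            columns += [f"{alias}.{c}" for c in cols]
            joins.append(f"LEFT JOIN marts.{table} {alias} ON {alias}.urn = s.urn")
    return "SELECT " + ", ".join(columns) + " FROM marts.dim_school s " + " ".join(joins)

def to_geopoint(lat, lng) -> Optional[List[float]]:
    """Typesense geopoint fields take [lat, lng]; skip schools without a location."""
    if lat is None or lng is None:
        return None
    return [float(lat), float(lng)]
```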

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 22:16:21 +00:00
tudor 72cbbf7778 fix(dbt): simplify search_path to just public for PostGIS
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m30s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 21:47:01 +00:00