Commit Graph

3 Commits

Author SHA1 Message Date
Tudor Sitaru 1e5c66d6ab fix(admissions): correct first_preference_offer_pct in dbt staging
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 18s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 49s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m12s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
The staging model was mapping EES column ``proportion_1stprefs_v_totaloffers``
straight onto ``first_preference_offer_pct``. That raw column is not a
percentage — it is a ratio of first-preference applications to total offers
(an oversubscription indicator, >1 means oversubscribed), so OLQH rendered
as "1%" when the true first-choice success rate is 27/42 = 64%.

The frontend display code is not at fault and is not patched here —
data-quality issues must be fixed at the source.

- stg_ees_admissions: compute ``first_preference_offer_pct`` as
  ``100 * number_1st_preference_offers / times_put_as_1st_preference`` —
  of families who listed this school first, the % that received an offer
  (0–100). Guard against divide-by-zero.
- stg_ees_admissions: expose the legitimate EES ratio as the new column
  ``oversubscription_ratio`` (1st-preference applications per place) for
  future use, clearly named.
- fact_admissions, FactAdmissions model, data_loader: propagate the new
  ``oversubscription_ratio`` column.
- SchoolAdmissions type: document both columns inline.
- buildSchoolSummary: reword the oversubscription clause so it reads
  sensibly across the whole 0–100 range (no more "just 64%").
- Hero chip subtitle: clearer phrasing "X% of first-choice applicants
  offered a place".

Requires a dbt run of stg_ees_admissions and fact_admissions on deploy
so the new column materialises.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 11:29:40 +01:00
tudor ca351e9d73 feat: migrate backend to marts schema, update EES tap for verified datasets
Pipeline:
- EES tap: split KS4 into performance + info streams, fix admissions filename
  (SchoolLevel keyword match), fix census filename (yearly suffix), remove
  phonics (no school-level data on EES), change endswith → in for matching
- stg_ees_ks4: rewrite to filter long-format data and extract Attainment 8,
  Progress 8, EBacc, English/Maths metrics; join KS4 info for context
- stg_ees_admissions: map real CSV columns (total_number_places_offered, etc.)
- stg_ees_census: update source reference, stub with TODO for data columns
- Remove stg_ees_phonics, fact_phonics (no school-level EES data)
- Add ees_ks4_performance + ees_ks4_info sources, remove ees_ks4 + ees_phonics
- Update int_ks4_with_lineage + fact_ks4_performance with new KS4 columns
- Annual EES DAG: remove stg_ees_phonics+ from selector

Backend:
- models.py: replace all models to point at marts.* tables with schema='marts'
  (DimSchool, DimLocation, KS2Performance, FactOfstedInspection, etc.)
- data_loader.py: rewrite load_school_data_as_dataframe() using raw SQL joining
  dim_school + dim_location + fact_ks2_performance; update get_supplementary_data()
- database.py: remove migration machinery, keep only connection setup
- app.py: remove check_and_migrate_if_needed, remove /api/admin/reimport-ks2
  endpoints (pipeline handles all imports)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:29:27 +00:00
tudor 8f02b5125e feat(pipeline): add Meltano + dbt + Airflow ELT pipeline scaffold
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 35s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m9s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Replaces the hand-rolled integrator with a production-grade ELT pipeline
using Meltano (Singer taps), dbt Core (medallion architecture), and
Apache Airflow (orchestration). Adds Typesense for search and PostGIS
for geospatial queries.

- 6 custom Singer taps (GIAS, EES, Ofsted, Parent View, FBIT, IDACI)
- dbt project: 12 staging, 5 intermediate, 12 mart models
- 3 Airflow DAGs (daily/monthly/annual schedules)
- Typesense sync + batch geocoding scripts
- docker-compose: add Airflow, Typesense; upgrade to PostGIS
- Portainer stack definition matching live deployment topology

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 08:37:53 +00:00