Commit Graph

10 Commits

Author SHA1 Message Date
Tudor Sitaru ae33bfe04b refactor(pipeline): unify KS2 and KS4 legacy sources to same annual ZIPs
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 13s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 47s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m18s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
LegacyKS2Stream now auto-detects ZIP vs bare CSV: if the download is a ZIP,
it extracts england_ks2final.csv; if it's a plain CSV file, it reads it directly.
This keeps backwards compatibility while allowing both streams to share the
same DfE annual archive URLs.
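A minimal Python sketch of the ZIP-or-CSV detection described above, assuming the download arrives as raw bytes. The helper name read_legacy_rows and the basename match on england_ks2final.csv are illustrative, not the actual LegacyKS2Stream code; decoding as UTF-8 (with optional BOM) is also an assumption about the DfE files.

```python
import csv
import io
import zipfile

# Member name taken from the commit message; matched on basename in case the
# archive nests it under a year folder (an assumption about the ZIP layout).
KS2_MEMBER = "england_ks2final.csv"


def read_legacy_rows(payload: bytes, member: str = KS2_MEMBER) -> list[dict[str, str]]:
    """Return CSV rows from either a DfE annual ZIP or a bare CSV download."""
    buffer = io.BytesIO(payload)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            matches = [
                name for name in archive.namelist()
                if name.rsplit("/", 1)[-1].lower() == member
            ]
            if not matches:
                raise FileNotFoundError(f"{member} not found in archive")
            text = archive.read(matches[0]).decode("utf-8-sig")
    else:
        # Not a ZIP: treat the payload as the CSV itself.
        text = payload.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text)))
```

Sniffing the bytes rather than the URL's file extension keeps the stream working whichever kind of file a configured URL happens to serve.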

legacy_ks2_urls updated to point at the same 4 ZIPs as legacy_ks4_urls so
only one set of archives needs to be maintained going forward.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 10:41:01 +01:00
Tudor Sitaru 785cb72063 config(pipeline): add legacy_ks4_urls for 2015/16–2018/19
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 20s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Has been cancelled
Build and Push Docker Images / Trigger Portainer Update (push) Has been cancelled
Build and Push Docker Images / Build Frontend (Next.js) (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 10:39:31 +01:00
Tudor Sitaru 7e6ded29e2 feat(pipeline): add legacy KS4 backfill (2015/16–2018/19)
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 12s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 52s
Build and Push Docker Images / Trigger Portainer Update (push) Has been cancelled
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Has been cancelled
Mirrors the existing legacy KS2 pattern to fill the gap before EES hosted
KS4 data. Four areas changed:

- tap-uk-ees: LegacyKS4Stream downloads each year's DfE Compare School
  Performance ZIP, extracts england_ks4final.csv, maps 416 legacy columns
  to Singer fields, strips % suffixes. Registered in discover_streams().
  TapUKEES.config_jsonschema gains legacy_ks4_urls setting.

- stg_legacy_ks4.sql: safe_numeric casts + NULL placeholders for columns
  not present in legacy format (ebacc_avg_score, gcse_grade_91_pct,
  prior_attainment_avg, sen_pct).

- int_ks4_with_lineage.sql: adds all_ks4 CTE unioning stg_ees_ks4 and
  stg_legacy_ks4, matching the int_ks2_with_lineage pattern.

- _stg_sources.yml + meltano.yml: source declaration and setting definition
  for legacy_ks4. URLs are to be configured per year once provided.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 10:37:24 +01:00
Tudor Sitaru fba8e74b72 refactor(legacy-ks2): use explicit year→URL mapping instead of base URL pattern
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m9s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 32s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
The file hosting uses non-deterministic URLs, so replace legacy_ks2_base_url
+ legacy_ks2_years with a single legacy_ks2_urls object mapping year codes
to download URLs. Configure the 4 pre-COVID years in meltano.yml.
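A sketch of how an explicit year→URL mapping might be validated before downloading. The six-digit year-code format (201819 for 2018/19) and the function name are assumptions; the real keys live in meltano.yml.

```python
import re
from urllib.parse import urlparse

# Assumed year-code format: start year plus two-digit end year, e.g. "201819".
YEAR_CODE = re.compile(r"^(\d{4})(\d{2})$")


def iter_legacy_downloads(urls: dict[str, str]) -> list[tuple[str, str]]:
    """Validate a year->URL mapping and return (year, url) pairs, oldest first."""
    pairs = []
    for year, url in urls.items():
        match = YEAR_CODE.match(year)
        # The end year must follow the start year (2018 -> 19, 1999 -> 00).
        if not match or (int(match[1]) + 1) % 100 != int(match[2]):
            raise ValueError(f"bad year code: {year!r}")
        if urlparse(url).scheme not in ("http", "https"):
            raise ValueError(f"bad URL for {year}: {url!r}")
        pairs.append((year, url))
    # Six-digit codes sort chronologically as strings.
    return sorted(pairs)
```

Because the hosting URLs can't be derived from a base URL, the mapping is the only source of truth, so rejecting malformed entries up front may save a silently skipped year.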

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:44:11 +01:00
tudor 84261f6125 fix(meltano): set default_environment, remove deprecated version field
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Kestra Init (push) Has been cancelled
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Has been cancelled
Build and Push Docker Images / Trigger Portainer Update (push) Has been cancelled
Build and Push Docker Images / Build Integrator (push) Has been cancelled
Meltano 4.x requires an environment to be specified. Set production as
the default. Also remove the deprecated 'version: 2' field.
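As a hypothetical pre-deploy check, the two requirements above can be expressed against a parsed meltano.yml (a plain dict, e.g. from yaml.safe_load). The function name is illustrative and not part of Meltano.

```python
def check_meltano_project(project: dict) -> list[str]:
    """Return a list of problems found in a parsed meltano.yml."""
    problems = []
    if "version" in project:
        problems.append("remove deprecated top-level 'version' field")
    default = project.get("default_environment")
    names = {env.get("name") for env in project.get("environments", [])}
    if not default:
        problems.append("set default_environment")
    elif default not in names:
        # The default must name an environment defined in the same file.
        problems.append(f"default_environment {default!r} is not defined under environments")
    return problems
```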

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 14:01:31 +00:00
tudor 9eae6bffae fix(meltano): use 'database' not 'dbname' for meltanolabs target-postgres
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 35s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m6s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m35s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
The meltanolabs target-postgres variant expects 'database' as the
config key, not 'dbname' (which was the pipelinewise variant's key).
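A minimal sketch of the key rename, for migrating an existing config dict from the pipelinewise variant. The function is hypothetical and only handles the dbname to database change.

```python
def migrate_target_postgres_config(config: dict) -> dict:
    """Rename the pipelinewise 'dbname' key to the meltanolabs 'database' key."""
    migrated = dict(config)  # copy so the caller's dict is not mutated
    if "dbname" in migrated:
        if "database" in migrated and migrated["database"] != migrated["dbname"]:
            raise ValueError("both 'dbname' and 'database' are set with different values")
        migrated["database"] = migrated.pop("dbname")
    return migrated
```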

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 13:53:49 +00:00
tudor c576bba06a fix(meltano): remove catalog capability and switch elt to run
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m9s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m26s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
The `catalog` capability forced Meltano to run --discover and generate
a catalog file (tap.properties.json) before each extraction. This fails
because our Singer SDK taps emit schemas inline and don't need external
catalog files. Removing the capability makes Meltano invoke taps
directly without catalog generation.

Also switch from deprecated `meltano elt` to `meltano run` for
Meltano 4.x compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 13:45:23 +00:00
tudor deb4024731 chore(pipeline): bump all dependencies to latest stable versions
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m4s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m45s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
- Airflow 2.11 → 3.1 (BashOperator moved to providers-standard)
- Meltano 3.5 → 4.1 (meltano.yml version 2, meltanolabs target-postgres)
- dbt-postgres 1.9 → 1.10
- singer-sdk 0.39 → 0.53 (all 6 taps)
- Typesense Docker 27.1 → 30.1
- Typesense Python client >=2.0
- Python base image 3.12 → 3.13

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 09:18:11 +00:00
tudor e32666ae4c fix(pipeline): bump Airflow to 2.11 and dbt to 1.9 to resolve SQLAlchemy conflict
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Failing after 49s
Build and Push Docker Images / Trigger Portainer Update (push) Has been skipped
Airflow 2.10 requires SQLAlchemy <2.0, but dbt-postgres 1.8+ pulls in
SQLAlchemy 2.x. Airflow 2.11 supports SQLAlchemy 2.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 09:08:21 +00:00
tudor 8f02b5125e feat(pipeline): add Meltano + dbt + Airflow ELT pipeline scaffold
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 35s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m9s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Replaces the hand-rolled integrator with a production-grade ELT pipeline
using Meltano (Singer taps), dbt Core (medallion architecture), and
Apache Airflow (orchestration). Adds Typesense for search and PostGIS
for geospatial queries.

- 6 custom Singer taps (GIAS, EES, Ofsted, Parent View, FBIT, IDACI)
- dbt project: 12 staging, 5 intermediate, 12 mart models
- 3 Airflow DAGs (daily/monthly/annual schedules)
- Typesense sync + batch geocoding scripts
- docker-compose: add Airflow, Typesense; upgrade to PostGIS
- Portainer stack definition matching live deployment topology

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 08:37:53 +00:00