Commit Graph

60 Commits

Author SHA1 Message Date
1c77a6c593 fix(pipeline): run meltano install in Dockerfile to generate catalogs
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m13s
Build and Push Docker Images / Build Integrator (push) Successful in 58s
Build and Push Docker Images / Build Kestra Init (push) Successful in 33s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m31s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Meltano elt requires catalog files (tap.properties.json) to exist.
These are generated by `meltano install` which discovers tap schemas
and installs the target-postgres loader. Without this step, `meltano
elt` fails with "catalog file is missing".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 12:28:59 +00:00
cd75fc4c24 fix(taps): align with integrator resilience patterns
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m7s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Port critical patterns from the working integrator into Singer taps:
- GIAS: add 404 fallback to yesterday's date, increase timeout to 300s,
  use latin-1 encoding, use dated URL for links (static URL returns 500)
- FBIT: add GIAS date fallback, increase timeout, fix encoding to latin-1
- IDACI: use dated GIAS URL with fallback instead of undated static URL,
  fix encoding to latin-1, increase timeout to 300s
- Ofsted: try utf-8-sig then fall back to latin-1 encoding

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 11:13:38 +00:00
b6a487776b fix(airflow): set DAGS_FOLDER in image env and reserialize on init
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 32s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
- Add AIRFLOW__CORE__DAGS_FOLDER env var in Dockerfile so it's always set
- Run `airflow dags reserialize` after `db migrate` in init container so
  DAGs appear immediately without waiting for scheduler scan interval

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 11:05:41 +00:00
e815f597ab fix(dags): use global bin paths and add BashOperator import fallback
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m9s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 49s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
- MELTANO_BIN/DBT_BIN pointed to .venv/bin/ but Dockerfile installs globally
- Add try/except for BashOperator import to handle both Airflow 3 provider
  path and legacy path, preventing silent DAG import failures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 10:47:18 +00:00
97d975114a feat(pipeline): implement parent-view, fbit, idaci Singer taps + align staging/mart models
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m6s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Port extraction logic from integrator scripts into Singer SDK taps:
- tap-uk-parent-view: scrapes Ofsted open data portal, parses survey responses (14 questions)
- tap-uk-fbit: queries FBIT API per-URN with rate limiting, computes per-pupil spend
- tap-uk-idaci: downloads IoD2019 XLSX, batch-resolves postcodes→LSOAs via postcodes.io

Update dbt models to match actual tap output schemas:
- stg_idaci now includes URN (tap does the postcode→LSOA→school join)
- stg_parent_view expanded from 8 to 13 question columns
- fact_deprivation simplified (no longer needs postcode→LSOA join in dbt)
- fact_parent_view expanded to include all 13 question metrics

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 10:38:07 +00:00
914de17d15 fix(pipeline): install curl in pipeline image for healthchecks
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m46s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 09:52:06 +00:00
a7904b627d fix(pipeline): migrate to Airflow 3 API server and SimpleAuthManager
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m12s
Build and Push Docker Images / Build Integrator (push) Successful in 58s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 31s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Airflow 3 replaced `airflow webserver` with `airflow api-server` and
removed the `airflow users` CLI. Auth is now via SimpleAuthManager
configured through AIRFLOW__CORE__SIMPLE_AUTH_MANAGER_USERS env var.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 09:32:08 +00:00
deb4024731 chore(pipeline): bump all dependencies to latest stable versions
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m4s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m45s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
- Airflow 2.11 → 3.1 (BashOperator moved to providers-standard)
- Meltano 3.5 → 4.1 (meltano.yml version 2, meltanolabs target-postgres)
- dbt-postgres 1.9 → 1.10
- singer-sdk 0.39 → 0.53 (all 6 taps)
- Typesense Docker 27.1 → 30.1
- Typesense Python client >=2.0
- Python base image 3.12 → 3.13

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 09:18:11 +00:00
e32666ae4c fix(pipeline): bump Airflow to 2.11 and dbt to 1.9 to resolve SQLAlchemy conflict
Some checks failed
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Failing after 49s
Build and Push Docker Images / Trigger Portainer Update (push) Has been skipped
Airflow 2.10 requires SQLAlchemy <2.0, but dbt-postgres 1.8+ pulls in
SQLAlchemy 2.x. Airflow 2.11 supports SQLAlchemy 2.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 09:08:21 +00:00
8f02b5125e feat(pipeline): add Meltano + dbt + Airflow ELT pipeline scaffold
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 35s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m9s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Replaces the hand-rolled integrator with a production-grade ELT pipeline
using Meltano (Singer taps), dbt Core (medallion architecture), and
Apache Airflow (orchestration). Adds Typesense for search and PostGIS
for geospatial queries.

- 6 custom Singer taps (GIAS, EES, Ofsted, Parent View, FBIT, IDACI)
- dbt project: 12 staging, 5 intermediate, 12 mart models
- 3 Airflow DAGs (daily/monthly/annual schedules)
- Typesense sync + batch geocoding scripts
- docker-compose: add Airflow, Typesense; upgrade to PostGIS
- Portainer stack definition matching live deployment topology

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 08:37:53 +00:00