Tudor Sitaru
785cb72063
config(pipeline): add legacy_ks4_urls for 2015/16–2018/19
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 20s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Has been cancelled
Build and Push Docker Images / Trigger Portainer Update (push) Has been cancelled
Build and Push Docker Images / Build Frontend (Next.js) (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 10:39:31 +01:00
Tudor Sitaru
7e6ded29e2
feat(pipeline): add legacy KS4 backfill (2015/16–2018/19)
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 12s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 52s
Build and Push Docker Images / Trigger Portainer Update (push) Has been cancelled
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Has been cancelled
Mirrors the existing legacy KS2 pattern to fill the gap for years before
EES began hosting KS4 data. Four files changed:
- tap-uk-ees: LegacyKS4Stream downloads each year's DfE Compare School
Performance ZIP, extracts england_ks4final.csv, maps 416 legacy columns
to Singer fields, strips % suffixes. Registered in discover_streams().
TapUKEES.config_jsonschema gains legacy_ks4_urls setting.
- stg_legacy_ks4.sql: safe_numeric casts + NULL placeholders for columns
not present in legacy format (ebacc_avg_score, gcse_grade_91_pct,
prior_attainment_avg, sen_pct).
- int_ks4_with_lineage.sql: adds all_ks4 CTE unioning stg_ees_ks4 and
stg_legacy_ks4, matching the int_ks2_with_lineage pattern.
- _stg_sources.yml + meltano.yml: source declaration and setting definition
for legacy_ks4. Per-year URLs to be configured once provided.
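Below is a minimal Python sketch of the per-year extraction that LegacyKS4Stream performs. It assumes the CSV may be nested under a folder inside the ZIP and that the file is UTF-8. COLUMN_MAP, the legacy column names in it, and read_legacy_ks4 are hypothetical stand-ins for the real 416-column mapping. As in the commit, numeric casting is left to stg_legacy_ks4.sql.

```python
import csv
import io
import zipfile

# Hypothetical excerpt of the legacy -> Singer field mapping (the real stream maps 416 columns).
COLUMN_MAP = {
    "URN": "urn",
    "P8MEA": "progress8_avg",
    "PTL2BASICS_94": "basics_94_pct",
}


def strip_percent(value):
    """Drop a trailing '%' (e.g. '45%' -> '45'); other values pass through trimmed."""
    value = value.strip()
    return value[:-1].rstrip() if value.endswith("%") else value


def read_legacy_ks4(zip_bytes, member="england_ks4final.csv"):
    """Yield Singer-shaped dicts from one year's Compare School Performance ZIP."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # Assumption: the CSV may sit inside a year folder, so match on the base name.
        matches = [n for n in zf.namelist() if n.rsplit("/", 1)[-1] == member]
        if not matches:
            raise KeyError(f"{member} not found in archive")
        with zf.open(matches[0]) as raw:
            # Encoding is an assumption; utf-8-sig also tolerates a leading BOM.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            for row in csv.DictReader(text):
                # Missing columns become "", left for stg_legacy_ks4.sql to cast or NULL out.
                yield {field: strip_percent(row.get(col) or "")
                       for col, field in COLUMN_MAP.items()}
```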
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 10:37:24 +01:00
Tudor Sitaru
fba8e74b72
refactor(legacy-ks2): use explicit year→URL mapping instead of base URL pattern
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m9s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 32s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
The file hosting uses non-deterministic URLs, so replace legacy_ks2_base_url
+ legacy_ks2_years with a single legacy_ks2_urls object mapping year codes
to download URLs. Configure the 4 pre-COVID years in meltano.yml.
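The sketch below shows how a tap might consume an explicit legacy_ks2_urls mapping. Every year's URL is read verbatim from config instead of being derived from a base URL. The year-code shape ("201819"), the HTTPS-only check and resolve_legacy_urls are all assumptions made for illustration.

```python
import re

# Assumed year-code shape: start year + last two digits of end year, e.g. "201819".
YEAR_CODE = re.compile(r"^(\d{4})(\d{2})$")


def resolve_legacy_urls(legacy_urls):
    """Validate a {year_code: url} mapping and return (year_code, url) pairs in year order.

    Each URL is taken verbatim from config; nothing is derived from a base URL.
    """
    resolved = []
    for code, url in legacy_urls.items():
        match = YEAR_CODE.match(code)
        if not match or (int(match.group(1)) + 1) % 100 != int(match.group(2)):
            raise ValueError(f"invalid year code: {code!r}")
        if not url.startswith("https://"):  # assumption: only HTTPS sources are accepted
            raise ValueError(f"invalid URL for {code}: {url!r}")
        resolved.append((code, url))
    return sorted(resolved)
```

With this shape, adding a year presumably means adding one entry to the mapping in meltano.yml, with no code change.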
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:44:11 +01:00
tudor
84261f6125
fix(meltano): set default_environment, remove deprecated version field
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Kestra Init (push) Has been cancelled
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Has been cancelled
Build and Push Docker Images / Trigger Portainer Update (push) Has been cancelled
Build and Push Docker Images / Build Integrator (push) Has been cancelled
Meltano 4.x requires an environment to be specified. Set production as
the default. Also remove the deprecated 'version: 2' field.
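Once default_environment is set, a plain `meltano run` no longer needs an explicit environment. The function below models the selection order as understood from Meltano's environments documentation: the --environment option first, then the MELTANO_ENVIRONMENT variable, then default_environment. active_environment is a hypothetical illustration and is not Meltano code.

```python
import os


def active_environment(cli_env=None, project=None, environ=None):
    """Model of Meltano's environment selection (an illustration, not Meltano code).

    Order: the --environment option, then MELTANO_ENVIRONMENT, then default_environment.
    """
    environ = os.environ if environ is None else environ
    project = project or {}
    return (
        cli_env
        or environ.get("MELTANO_ENVIRONMENT")
        or project.get("default_environment")
    )
```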
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 14:01:31 +00:00
tudor
9eae6bffae
fix(meltano): use 'database' not 'dbname' for meltanolabs target-postgres
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 35s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m6s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m35s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
The meltanolabs target-postgres variant expects 'database' as the
config key, not 'dbname' (which was the pipelinewise variant's key).
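The fix amounts to a single key translation. Below is a hypothetical helper for migrating an old pipelinewise-style config dict. Only the dbname → database rename comes from this commit. Other keys pass through unchecked, and a config carrying both names is rejected.

```python
# Only dbname -> database is confirmed by this commit; other keys pass through unchanged
# and are not checked against the meltanolabs variant's settings.
RENAMED_KEYS = {"dbname": "database"}


def migrate_target_postgres_config(old):
    """Return a copy of a pipelinewise-style config using meltanolabs key names."""
    new = {}
    for key, value in old.items():
        new_key = RENAMED_KEYS.get(key, key)
        if new_key in new:
            raise ValueError(f"conflicting values for {new_key!r}")
        new[new_key] = value
    return new
```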
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 13:53:49 +00:00
tudor
c576bba06a
fix(meltano): remove catalog capability and switch elt to run
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 34s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m9s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m26s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
The `catalog` capability forced Meltano to run --discover and generate
a catalog file (tap.properties.json) before each extraction. This fails
because our Singer SDK taps emit schemas inline and don't need external
catalog files. Removing the capability makes Meltano invoke taps
directly without catalog generation.
Also switch from deprecated `meltano elt` to `meltano run` for
Meltano 4.x compatibility.
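"Emitting schemas inline" means that a Singer tap writes each stream's SCHEMA message to stdout ahead of that stream's RECORD messages. The target therefore learns the schema from the message stream itself. The minimal sketch below reproduces that message order; the stream name and fields are hypothetical.

```python
import json
import sys


def emit(message, out=sys.stdout):
    """Write one Singer message as a line of JSON."""
    out.write(json.dumps(message) + "\n")


def write_stream(stream, schema, key_properties, records, out=sys.stdout):
    """Emit the stream's SCHEMA message first, then one RECORD message per row."""
    emit({"type": "SCHEMA", "stream": stream, "schema": schema,
          "key_properties": key_properties}, out)
    for record in records:
        emit({"type": "RECORD", "stream": stream, "record": record}, out)
```

With the catalog capability removed, a scheduled task would invoke something like `meltano run tap-uk-ees target-postgres`. The loader name there is an assumption.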
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 13:45:23 +00:00
tudor
deb4024731
chore(pipeline): bump all dependencies to latest stable versions
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m4s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m45s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
- Airflow 2.11 → 3.1 (BashOperator moved to providers-standard)
- Meltano 3.5 → 4.1 (meltano.yml version 2, meltanolabs target-postgres)
- dbt-postgres 1.9 → 1.10
- singer-sdk 0.39 → 0.53 (all 6 taps)
- Typesense Docker 27.1 → 30.1
- Typesense Python client >=2.0
- Python base image 3.12 → 3.13
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 09:18:11 +00:00
tudor
e32666ae4c
fix(pipeline): bump Airflow to 2.11 and dbt to 1.9 to resolve SQLAlchemy conflict
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Failing after 49s
Build and Push Docker Images / Trigger Portainer Update (push) Has been skipped
Airflow 2.10 requires SQLAlchemy <2.0, but dbt-postgres 1.8+ pulls in
SQLAlchemy 2.x. Airflow 2.11 supports SQLAlchemy 2.
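The conflict is an empty intersection of version ranges. The sketch below encodes the two SQLAlchemy constraints exactly as this commit states them, not as read from package metadata. Ranges are half-open (lo, hi), with None meaning unbounded. parse and overlaps are hypothetical helpers that ignore pre-release tags.

```python
def parse(version):
    """'2.0' -> (2, 0, 0); pads to three parts, ignores pre-release tags."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def overlaps(a, b):
    """True if half-open ranges a and b ((lo, hi), None = unbounded) share a version."""
    lows = [r[0] for r in (a, b) if r[0] is not None]
    highs = [r[1] for r in (a, b) if r[1] is not None]
    if not lows or not highs:
        return True
    return max(lows) < min(highs)
```

Encoded this way, "<2.0" from Airflow 2.10 and "2.x" from dbt-postgres 1.8+ leave no version in common.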
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 09:08:21 +00:00
tudor
8f02b5125e
feat(pipeline): add Meltano + dbt + Airflow ELT pipeline scaffold
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 35s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m9s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Replaces the hand-rolled integrator with a production-grade ELT pipeline
using Meltano (Singer taps), dbt Core (medallion architecture), and
Apache Airflow (orchestration). Adds Typesense for search and PostGIS
for geospatial queries.
- 6 custom Singer taps (GIAS, EES, Ofsted, Parent View, FBIT, IDACI)
- dbt project: 12 staging, 5 intermediate, 12 mart models
- 3 Airflow DAGs (daily/monthly/annual schedules)
- Typesense sync + batch geocoding scripts
- docker-compose: add Airflow, Typesense; upgrade to PostGIS
- Portainer stack definition matching live deployment topology
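The batch geocoding script presumably works through records in fixed-size chunks. Below is a generic batching helper, offered as a sketch only: the scripts' internals are not shown here, so the batch size and the consumer are assumptions. On Python 3.12+, itertools.batched does the same job but yields tuples.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items; the final batch may be shorter."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch
```

A caller would then loop as `for chunk in batched(postcodes, 100): geocode(chunk)`. Here `postcodes`, `geocode` and the size 100 are hypothetical.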
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 08:37:53 +00:00