feat(pipeline): add Meltano + dbt + Airflow ELT pipeline scaffold

Replaces the hand-rolled integrator with a production-grade ELT pipeline
using Meltano (Singer taps), dbt Core (medallion architecture), and
Apache Airflow (orchestration). Adds Typesense for search and PostGIS
for geospatial queries.

- 6 custom Singer taps (GIAS, EES, Ofsted, Parent View, FBIT, IDACI)
- dbt project: 12 staging, 5 intermediate, 12 mart models
- 3 Airflow DAGs (daily/monthly/annual schedules)
- Typesense sync + batch geocoding scripts
- docker-compose: add Airflow, Typesense; upgrade to PostGIS
- Portainer stack definition matching live deployment topology
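
For orientation, a Singer tap is a program that writes newline-delimited JSON messages (SCHEMA, RECORD, STATE) to stdout, which the loader then consumes. The taps themselves are not shown in this diff, so the following is only a minimal sketch of that message flow for a CSV source. The function name, the stream name `establishments`, the sample columns, and the shape of the STATE bookmark are all illustrative assumptions, not code from this commit.

```python
import csv
import io
import json
import sys


def emit_singer_messages(csv_text, stream, key, out):
    """Write one SCHEMA, one RECORD per CSV row, and a final STATE message."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []

    # Treat every column as a nullable string; a real tap would type its columns.
    schema = {
        "type": "object",
        "properties": {name: {"type": ["string", "null"]} for name in fields},
    }
    out.write(json.dumps({
        "type": "SCHEMA",
        "stream": stream,
        "schema": schema,
        "key_properties": [key],
    }) + "\n")

    count = 0
    for row in reader:
        out.write(json.dumps({"type": "RECORD", "stream": stream, "record": row}) + "\n")
        count += 1

    # The STATE payload is free-form; this bookmark layout is an assumption.
    out.write(json.dumps({
        "type": "STATE",
        "value": {"bookmarks": {stream: {"rows_emitted": count}}},
    }) + "\n")
    return count


if __name__ == "__main__":
    # Hypothetical two-column sample standing in for a bulk CSV download.
    sample = "URN,EstablishmentName\n100000,Example Primary School\n"
    emit_singer_messages(sample, "establishments", "URN", sys.stdout)
```

Running the script prints three JSON lines. A tap that declares the `state` capability would also read the previous STATE back in on its next run, which this sketch leaves out.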

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 08:37:53 +00:00
parent 8aca0a7a53
commit 8f02b5125e
65 changed files with 2822 additions and 72 deletions

pipeline/meltano.yml Normal file

@@ -0,0 +1,114 @@
version: 1
project_id: school-compare-pipeline
plugins:
  extractors:
    - name: tap-uk-gias
      namespace: uk_gias
      pip_url: ./plugins/extractors/tap-uk-gias
      executable: tap-uk-gias
      capabilities:
        - catalog
        - state
      settings:
        - name: download_url
          kind: string
          description: GIAS bulk CSV download URL
          value: https://ea-edubase-api-prod.azurewebsites.net/edubase/downloads/public/edubasealldata.csv
    - name: tap-uk-ees
      namespace: uk_ees
      pip_url: ./plugins/extractors/tap-uk-ees
      executable: tap-uk-ees
      capabilities:
        - catalog
        - state
      settings:
        - name: base_url
          kind: string
          value: https://content.explore-education-statistics.service.gov.uk/api/v1
        - name: datasets
          kind: array
          description: List of EES dataset configs to extract
    - name: tap-uk-ofsted
      namespace: uk_ofsted
      pip_url: ./plugins/extractors/tap-uk-ofsted
      executable: tap-uk-ofsted
      capabilities:
        - catalog
        - state
      settings:
        - name: mi_url
          kind: string
          description: Ofsted Management Information download URL
    - name: tap-uk-parent-view
      namespace: uk_parent_view
      pip_url: ./plugins/extractors/tap-uk-parent-view
      executable: tap-uk-parent-view
      capabilities:
        - catalog
    - name: tap-uk-fbit
      namespace: uk_fbit
      pip_url: ./plugins/extractors/tap-uk-fbit
      executable: tap-uk-fbit
      capabilities:
        - catalog
        - state
      settings:
        - name: base_url
          kind: string
          value: https://financial-benchmarking-and-insights-tool.education.gov.uk/api
    - name: tap-uk-idaci
      namespace: uk_idaci
      pip_url: ./plugins/extractors/tap-uk-idaci
      executable: tap-uk-idaci
      capabilities:
        - catalog
  loaders:
    - name: target-postgres
      variant: transferwise
      pip_url: pipelinewise-target-postgres
      config:
        host: $PG_HOST
        port: $PG_PORT
        user: $PG_USER
        password: $PG_PASSWORD
        dbname: $PG_DATABASE
        default_target_schema: raw
  utilities:
    - name: dbt-postgres
      variant: dbt-labs
      pip_url: dbt-postgres~=1.8
      config:
        project_dir: $MELTANO_PROJECT_ROOT/transform
        profiles_dir: $MELTANO_PROJECT_ROOT/transform
environments:
  - name: dev
    config:
      plugins:
        loaders:
          - name: target-postgres
            config:
              host: localhost
              port: 5432
              user: postgres
              password: postgres
              dbname: school_compare
  - name: production
    config:
      plugins:
        loaders:
          - name: target-postgres
            config:
              host: ${PG_HOST}
              port: ${PG_PORT}
              user: ${PG_USER}
              password: ${PG_PASSWORD}
              dbname: ${PG_DATABASE}
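
The loader config above references environment variables in two spellings, `$PG_HOST` at plugin level and `${PG_HOST}` in the production environment. The sketch below shows how both forms resolve to the same value. It uses Python's `string.Template` as a stand-in and is not Meltano's own expansion code; the function name `resolve_config` and the sample values are hypothetical.

```python
import string


def resolve_config(config, env):
    """Expand $VAR and ${VAR} references in the string values of a flat dict.

    Raises KeyError if a referenced variable is missing from env, so an
    unset credential fails loudly instead of becoming an empty string.
    """
    resolved = {}
    for key, value in config.items():
        if isinstance(value, str):
            resolved[key] = string.Template(value).substitute(env)
        else:
            # Non-string values (for example an integer port) pass through unchanged.
            resolved[key] = value
    return resolved


if __name__ == "__main__":
    # Hypothetical environment; real values would come from os.environ.
    env = {"PG_HOST": "db.internal", "PG_PORT": "5432"}
    plugin_level = {"host": "$PG_HOST", "port": "$PG_PORT", "default_target_schema": "raw"}
    production = {"host": "${PG_HOST}", "port": "${PG_PORT}"}
    print(resolve_config(plugin_level, env))
    print(resolve_config(production, env))
```

Because both spellings expand identically, the production block repeats what the plugin-level defaults already supply; only the dev environment actually overrides values.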