school_compare/pipeline/meltano.yml
Tudor Sitaru 7e6ded29e2
feat(pipeline): add legacy KS4 backfill (2015/16–2018/19)
Mirrors the existing legacy KS2 pattern to fill the gap before EES hosted
KS4 data. Four files changed:

- tap-uk-ees: LegacyKS4Stream downloads each year's DfE Compare School
  Performance ZIP, extracts england_ks4final.csv, maps 416 legacy columns
  to Singer fields, strips % suffixes. Registered in discover_streams().
  TapUKEES.config_jsonschema gains legacy_ks4_urls setting.

- stg_legacy_ks4.sql: safe_numeric casts + NULL placeholders for columns
  not present in legacy format (ebacc_avg_score, gcse_grade_91_pct,
  prior_attainment_avg, sen_pct).

- int_ks4_with_lineage.sql: adds all_ks4 CTE unioning stg_ees_ks4 and
  stg_legacy_ks4, matching the int_ks2_with_lineage pattern.

- _stg_sources.yml + meltano.yml: source declaration and setting definition
  for legacy_ks4. URLs configured per-year once provided.
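
The extraction step in the first bullet could look roughly like the sketch below. It is not the LegacyKS4Stream code: read_legacy_ks4 and strip_pct are hypothetical names, the utf-8-sig encoding is an assumption about the DfE files, and the 416-column mapping to Singer fields is left out.

```python
import csv
import io
import zipfile

CSV_NAME = "england_ks4final.csv"  # file name taken from the commit message


def strip_pct(value: str) -> str:
    """Drop a trailing % sign (and surrounding whitespace) from a cell."""
    value = value.strip()
    return value[:-1].strip() if value.endswith("%") else value


def read_legacy_ks4(zip_bytes: bytes) -> list[dict[str, str]]:
    """Return the rows of england_ks4final.csv with % suffixes removed."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # The CSV may sit in a subfolder, so match on the base name only.
        matches = [
            name for name in archive.namelist()
            if name.rsplit("/", 1)[-1].lower() == CSV_NAME
        ]
        if not matches:
            raise FileNotFoundError(f"{CSV_NAME} not found in archive")
        with archive.open(matches[0]) as raw:
            # Assumption: UTF-8 with an optional BOM; adjust if the files differ.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return [
                {
                    key: strip_pct(val or "")
                    for key, val in row.items()
                    if key is not None  # skip overflow cells on ragged rows
                }
                for row in csv.DictReader(text)
            ]
```

Values that are not percentages, such as suppression markers, pass through unchanged, so numeric casting can stay in the staging model.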

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 10:37:24 +01:00

project_id: school-compare-pipeline
default_environment: production

plugins:
  extractors:
    - name: tap-uk-gias
      namespace: uk_gias
      pip_url: ./plugins/extractors/tap-uk-gias
      executable: tap-uk-gias
      settings:
        - name: download_url
          kind: string
          description: GIAS bulk CSV download URL

    - name: tap-uk-ees
      namespace: uk_ees
      pip_url: ./plugins/extractors/tap-uk-ees
      executable: tap-uk-ees
      settings:
        - name: base_url
          kind: string
          value: https://content.explore-education-statistics.service.gov.uk/api/v1
        - name: datasets
          kind: array
          description: List of EES dataset configs to extract
        - name: legacy_ks2_urls
          kind: object
          description: "Year code → URL mapping for legacy KS2 CSVs"
        - name: legacy_ks4_urls
          kind: object
          description: "Year code → URL mapping for legacy KS4 ZIPs (england_ks4final.csv inside)"
      config:
        legacy_ks2_urls:
          "201516": "http://10.0.1.224:8081/filebrowser/api/public/dl/R9jjXFWa?inline=true"
          "201617": "http://10.0.1.224:8081/filebrowser/api/public/dl/tIwJPVQS?inline=true"
          "201718": "http://10.0.1.224:8081/filebrowser/api/public/dl/GO7SKE0p?inline=true"
          "201819": "http://10.0.1.224:8081/filebrowser/api/public/dl/jchDEHsv?inline=true"

    - name: tap-uk-ofsted
      namespace: uk_ofsted
      pip_url: ./plugins/extractors/tap-uk-ofsted
      executable: tap-uk-ofsted
      settings:
        - name: mi_url
          kind: string
          description: Ofsted Management Information download URL

    - name: tap-uk-parent-view
      namespace: uk_parent_view
      pip_url: ./plugins/extractors/tap-uk-parent-view
      executable: tap-uk-parent-view

    - name: tap-uk-fbit
      namespace: uk_fbit
      pip_url: ./plugins/extractors/tap-uk-fbit
      executable: tap-uk-fbit
      settings:
        - name: base_url
          kind: string
          value: https://financial-benchmarking-and-insights-tool.education.gov.uk/api

    - name: tap-uk-idaci
      namespace: uk_idaci
      pip_url: ./plugins/extractors/tap-uk-idaci
      executable: tap-uk-idaci

  loaders:
    - name: target-postgres
      variant: meltanolabs
      pip_url: meltanolabs-target-postgres
      config:
        host: $PG_HOST
        port: $PG_PORT
        user: $PG_USER
        password: $PG_PASSWORD
        database: $PG_DATABASE
        default_target_schema: raw

  utilities:
    - name: dbt-postgres
      variant: dbt-labs
      pip_url: dbt-postgres~=1.10
      config:
        project_dir: $MELTANO_PROJECT_ROOT/transform
        profiles_dir: $MELTANO_PROJECT_ROOT/transform

environments:
  - name: dev
    config:
      plugins:
        loaders:
          - name: target-postgres
            config:
              host: localhost
              port: 5432
              user: postgres
              password: postgres
              database: school_compare

  - name: production
    config:
      plugins:
        loaders:
          - name: target-postgres
            config:
              host: ${PG_HOST}
              port: ${PG_PORT}
              user: ${PG_USER}
              password: ${PG_PASSWORD}
              database: ${PG_DATABASE}
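
The legacy_ks2_urls and legacy_ks4_urls settings are keyed by six-digit year codes such as "201516", which correspond to the academic years named in the commit title (2015/16 to 2018/19). How the tap validates these keys is not shown in this file; the sketch below is one hypothetical way to check a mapping before use. Both function names are illustrative, not part of tap-uk-ees.

```python
import re

_YEAR_CODE = re.compile(r"^(\d{4})(\d{2})$")


def year_code_to_label(code: str) -> str:
    """Turn a key such as "201516" into the label "2015/16"."""
    match = _YEAR_CODE.match(code)
    if match is None:
        raise ValueError(f"expected six digits, got {code!r}")
    start, end = match.groups()
    # The two-digit suffix must be the year following the four-digit start.
    if (int(start) + 1) % 100 != int(end):
        raise ValueError(f"{code!r} does not span consecutive years")
    return f"{start}/{end}"


def check_url_mapping(mapping: dict[str, str]) -> dict[str, str]:
    """Return {academic year label: url}, failing fast on malformed entries."""
    checked = {}
    for code, url in mapping.items():
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"{code}: not an HTTP(S) URL: {url!r}")
        checked[year_code_to_label(code)] = url
    return checked
```

A check like this would catch a mistyped key (for example "201518") at startup rather than partway through a backfill.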