LegacyKS2Stream now auto-detects ZIP vs bare CSV: if the download is a ZIP,
it extracts england_ks2final.csv; if it is a plain CSV file, it reads it directly.
This keeps backwards compatibility while allowing both streams to share the
same DfE annual archive URLs.
legacy_ks2_urls now points at the same 4 ZIPs as legacy_ks4_urls, so only
one set of archives needs to be maintained going forward.
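The auto-detection described above could look roughly like the following.
This is a minimal sketch, not the code in LegacyKS2Stream: the function name
read_ks2_rows, the in-memory bytes payload, and the utf-8-sig encoding are
all assumptions made for illustration.

```python
import csv
import io
import zipfile


def read_ks2_rows(payload: bytes, member: str = "england_ks2final.csv") -> list:
    """Return CSV rows from either a ZIP archive or a bare CSV payload.

    Hypothetical helper: the real stream may download and parse differently.
    """
    buffer = io.BytesIO(payload)
    if zipfile.is_zipfile(buffer):
        # ZIP download: pull the named CSV member out of the archive.
        with zipfile.ZipFile(buffer) as archive:
            raw = archive.read(member)
    else:
        # Plain CSV download: use the bytes as they are.
        raw = payload
    # Assumed encoding; utf-8-sig also drops a leading BOM if one is present.
    text = raw.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text)))
```

Checking the payload itself rather than the URL suffix means the same URL
setting can point at either kind of file.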
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mirrors the existing legacy KS2 pattern to fill the gap before EES hosted
KS4 data. Four files changed:
- tap-uk-ees: LegacyKS4Stream downloads each year's DfE Compare School
Performance ZIP, extracts england_ks4final.csv, maps 416 legacy columns
to Singer fields, strips % suffixes. Registered in discover_streams().
TapUKEES.config_jsonschema gains legacy_ks4_urls setting.
- stg_legacy_ks4.sql: safe_numeric casts + NULL placeholders for columns
not present in legacy format (ebacc_avg_score, gcse_grade_91_pct,
prior_attainment_avg, sen_pct).
- int_ks4_with_lineage.sql: adds all_ks4 CTE unioning stg_ees_ks4 and
stg_legacy_ks4, matching the int_ks2_with_lineage pattern.
- _stg_sources.yml + meltano.yml: source declaration and setting definition
for legacy_ks4. Per-year URLs will be configured once they are provided.
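The column mapping and % stripping in the tap can be sketched as below. This
is an illustration under stated assumptions, not the LegacyKS4Stream code:
COLUMN_MAP is a hypothetical two-entry stand-in for the 416-column mapping,
and clean_value and map_row are made-up helper names.

```python
from typing import Dict, Optional

# Hypothetical subset of the legacy-to-Singer column mapping.
COLUMN_MAP: Dict[str, str] = {
    "URN": "urn",
    "PTL2BASICS_94": "basics_94_pct",
}


def clean_value(value: Optional[str]) -> Optional[str]:
    """Trim whitespace, drop a trailing % sign, and turn blanks into None."""
    if value is None:
        return None
    cleaned = value.strip()
    if cleaned.endswith("%"):
        cleaned = cleaned[:-1].strip()
    return cleaned or None


def map_row(row: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Rename mapped legacy columns and clean their values; skip the rest."""
    return {
        COLUMN_MAP[name]: clean_value(value)
        for name, value in row.items()
        if name in COLUMN_MAP
    }
```

Values stay as strings here, which leaves the numeric casting to the
safe_numeric casts in stg_legacy_ks4.sql.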
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The file host's download URLs cannot be predicted from the year, so replace
legacy_ks2_base_url + legacy_ks2_years with a single legacy_ks2_urls object
mapping year codes to download URLs. Configure the 4 pre-COVID years in
meltano.yml.
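A minimal sketch of how a tap might consume the legacy_ks2_urls mapping. The
function name iter_year_downloads, the year-code format, and the URLs in the
usage example are placeholders, not the real configuration.

```python
from typing import Dict, Iterator, Tuple


def iter_year_downloads(config: Dict[str, object]) -> Iterator[Tuple[str, str]]:
    """Yield (year_code, url) pairs from legacy_ks2_urls in year-code order.

    A missing or empty setting yields nothing, so the stream emits no records.
    """
    urls = config.get("legacy_ks2_urls") or {}
    for year_code in sorted(urls):
        yield year_code, urls[year_code]


# Usage with placeholder values (year codes and URLs are hypothetical).
example_config = {
    "legacy_ks2_urls": {
        "2018-2019": "https://example.invalid/b.zip",
        "2015-2016": "https://example.invalid/a.zip",
    }
}
for year_code, url in iter_year_downloads(example_config):
    print(year_code, url)
```

Because each year carries its own full URL, no assumption about the host's
path structure is needed.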
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Meltano 4.x requires an environment to be specified. Set production as
the default. Also remove the deprecated 'version: 2' field.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The meltanolabs target-postgres variant expects 'database' as the
config key, not 'dbname' (which was the pipelinewise variant's key).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The `catalog` capability forced Meltano to run --discover and generate
a catalog file (tap.properties.json) before each extraction. This fails
because our Singer SDK taps emit schemas inline and don't need external
catalog files. Removing the capability makes Meltano invoke taps
directly without catalog generation.
Also switch from deprecated `meltano elt` to `meltano run` for
Meltano 4.x compatibility.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>