feat: migrate backend to marts schema, update EES tap for verified datasets

Pipeline:
- EES tap: split KS4 into performance + info streams, fix admissions
  filename (SchoolLevel keyword match), fix census filename (yearly suffix),
  remove phonics (no school-level data on EES), change endswith → in for
  matching
- stg_ees_ks4: rewrite to filter long-format data and extract Attainment 8,
  Progress 8, EBacc, English/Maths metrics; join KS4 info for context
- stg_ees_admissions: map real CSV columns (total_number_places_offered, etc.)
- stg_ees_census: update source reference, stub with TODO for data columns
- Remove stg_ees_phonics, fact_phonics (no school-level EES data)
- Add ees_ks4_performance + ees_ks4_info sources, remove ees_ks4 + ees_phonics
- Update int_ks4_with_lineage + fact_ks4_performance with new KS4 columns
- Annual EES DAG: remove stg_ees_phonics+ from selector

Backend:
- models.py: replace all models to point at marts.* tables with
  schema='marts' (DimSchool, DimLocation, KS2Performance,
  FactOfstedInspection, etc.)
- data_loader.py: rewrite load_school_data_as_dataframe() using raw SQL
  joining dim_school + dim_location + fact_ks2_performance; update
  get_supplementary_data()
- database.py: remove migration machinery, keep only connection setup
- app.py: remove check_and_migrate_if_needed, remove
  /api/admin/reimport-ks2 endpoints (pipeline handles all imports)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
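
The change from `endswith` to `in` for filename matching can be pictured with a minimal sketch. The tap code itself is not shown in this commit view, so the helper name `find_matching_file`, the keyword handling, and the filenames below are all hypothetical. They only illustrate why a suffix check misses a file that carries a yearly suffix, while a substring check still finds it.

```python
from typing import Iterable, Optional


def find_matching_file(filenames: Iterable[str], keyword: str) -> Optional[str]:
    """Return the first CSV whose name contains `keyword` (case-insensitive).

    Hypothetical helper: a substring test tolerates anything that follows
    the keyword, such as a yearly suffix before the extension.
    """
    needle = keyword.lower()
    for name in filenames:
        lowered = name.lower()
        if lowered.endswith(".csv") and needle in lowered:
            return name
    return None


# Hypothetical filenames, not taken from a real EES release.
files = [
    "AppsandOffers_NationalLevel_2025.csv",
    "AppsandOffers_SchoolLevel_2025.csv",
]

# A suffix check fails as soon as a year follows the keyword.
suffix_hit = any(f.lower().endswith("schoollevel.csv") for f in files)

# The substring check still finds the school-level file.
match = find_matching_file(files, "SchoolLevel")
```

The trade-off is that a substring match is looser, so the keyword should be specific enough not to collide with other files in the same release.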
@@ -120,12 +120,12 @@ with DAG(
     extract_ofsted >> dbt_build_ofsted >> sync_typesense_ofsted
 
 
-# ── Annual DAG (EES: KS2, KS4, Census, Admissions, Phonics) ───────────
+# ── Annual DAG (EES: KS2, KS4, Census, Admissions) ───────────────────
 
 with DAG(
     dag_id="school_data_annual_ees",
     default_args=default_args,
-    description="Annual EES data extraction (KS2, KS4, Census, Admissions, Phonics)",
+    description="Annual EES data extraction (KS2, KS4, Census, Admissions)",
     schedule=None, # Triggered manually when new releases are published
     start_date=datetime(2025, 1, 1),
     catchup=False,
@@ -140,7 +140,7 @@ with DAG(
 
     dbt_build_ees = BashOperator(
         task_id="dbt_build",
-        bash_command=f"cd {PIPELINE_DIR}/transform && {DBT_BIN} build --profiles-dir . --target production --select stg_ees_ks2+ stg_ees_ks4+ stg_ees_census+ stg_ees_admissions+ stg_ees_phonics+",
+        bash_command=f"cd {PIPELINE_DIR}/transform && {DBT_BIN} build --profiles-dir . --target production --select stg_ees_ks2+ stg_ees_ks4+ stg_ees_census+ stg_ees_admissions+",
     )
 
     sync_typesense_ees = BashOperator(
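
The selector edit above removes one `model+` entry from a long inline string. As a sketch only, and not something this commit does, the same `--select` value could be assembled from a list so that dropping a model is a one-line change. The names `build_dbt_select` and `EES_STAGING_MODELS` are hypothetical; the trailing `+` is dbt's graph operator for selecting a model together with its downstream dependents.

```python
from typing import Sequence


def build_dbt_select(models: Sequence[str]) -> str:
    """Join model names into a dbt --select value.

    Each name gets a trailing `+`, which tells dbt to include the model
    and everything downstream of it.
    """
    return " ".join(f"{model}+" for model in models)


# Hypothetical constant mirroring the staging models left after this commit.
EES_STAGING_MODELS = [
    "stg_ees_ks2",
    "stg_ees_ks4",
    "stg_ees_census",
    "stg_ees_admissions",
]

selector = build_dbt_select(EES_STAGING_MODELS)
```

The resulting string could then be interpolated into `bash_command` in place of the hand-written list.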