fix(pipeline): expand GIAS schema, handle empty strings, scope DAG selectors
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m8s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 34s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m39s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
- Declare all 34 columns needed by dbt in GIAS tap schema (target-postgres only persists columns present in the Singer schema message)
- Use nullif() for empty-string-to-integer/date casts in staging models
- Scope daily DAG dbt build to GIAS models only (stg_gias_establishments+ stg_gias_links+) to avoid errors on unloaded sources
- Scope annual EES DAG similarly; remove redundant dbt test steps
- Make dim_school gracefully handle missing int_ofsted_latest table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
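
The nullif() change can be illustrated with a minimal sketch. It assumes nothing about the actual staging models: the helper name `cast_or_null` is hypothetical, and it runs against an in-memory SQLite database purely so the example is self-contained. In PostgreSQL, casting an empty string to integer raises an error, whereas SQLite silently returns 0; in both engines, wrapping the value in nullif(value, '') turns the empty string into NULL before the cast is applied.

```python
import sqlite3


def cast_or_null(raw: str):
    """Cast a raw text value to an integer, mapping '' to NULL (None).

    Hypothetical helper for illustration only; the real staging models
    apply the same nullif() pattern in dbt SQL against PostgreSQL.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # nullif(x, '') yields NULL when x is the empty string,
        # so the cast never receives an empty string.
        row = conn.execute(
            "select cast(nullif(?, '') as integer)", (raw,)
        ).fetchone()
        return row[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(cast_or_null(""))     # empty string becomes NULL
    print(cast_or_null("120"))  # numeric text is cast normally
```

The same wrapping applies to date casts: the empty string is replaced by NULL first, and only then is the cast evaluated.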
@@ -79,12 +79,7 @@ print(f'Validation passed: {{count}} GIAS rows')
 
     dbt_build = BashOperator(
         task_id="dbt_build",
-        bash_command=f"cd {PIPELINE_DIR}/transform && {DBT_BIN} build --profiles-dir . --target production",
-    )
-
-    dbt_test = BashOperator(
-        task_id="dbt_test",
-        bash_command=f"cd {PIPELINE_DIR}/transform && {DBT_BIN} test --profiles-dir . --target production",
+        bash_command=f"cd {PIPELINE_DIR}/transform && {DBT_BIN} build --profiles-dir . --target production --select stg_gias_establishments+ stg_gias_links+",
     )
 
     geocode_new = BashOperator(
@@ -97,7 +92,7 @@ print(f'Validation passed: {{count}} GIAS rows')
         bash_command=f"cd {PIPELINE_DIR} && python scripts/sync_typesense.py",
     )
 
-    extract_group >> validate_raw >> dbt_build >> dbt_test >> geocode_new >> sync_typesense
+    extract_group >> validate_raw >> dbt_build >> geocode_new >> sync_typesense
 
 
 # ── Monthly DAG (Ofsted) ───────────────────────────────────────────────
@@ -150,12 +145,7 @@ with DAG(
 
     dbt_build_ees = BashOperator(
         task_id="dbt_build",
-        bash_command=f"cd {PIPELINE_DIR}/transform && {DBT_BIN} build --profiles-dir . --target production",
-    )
-
-    dbt_test_ees = BashOperator(
-        task_id="dbt_test",
-        bash_command=f"cd {PIPELINE_DIR}/transform && {DBT_BIN} test --profiles-dir . --target production",
+        bash_command=f"cd {PIPELINE_DIR}/transform && {DBT_BIN} build --profiles-dir . --target production --select stg_ees_ks2+ stg_ees_ks4+ stg_ees_census+ stg_ees_admissions+ stg_ees_phonics+",
     )
 
     sync_typesense_ees = BashOperator(
@@ -163,4 +153,4 @@ with DAG(
         bash_command=f"cd {PIPELINE_DIR} && python scripts/sync_typesense.py",
     )
 
-    extract_ees_group >> dbt_build_ees >> dbt_test_ees >> sync_typesense_ees
+    extract_ees_group >> dbt_build_ees >> sync_typesense_ees
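
Both DAGs now build the same command string and differ only in the `--select` list. As a sketch of how that duplication could be factored out (not part of this commit), the helper below assembles the command from a list of selectors. The function name `dbt_build_command` and the example path and binary values are hypothetical; the selector names are the ones used in the diff. In dbt's node selection syntax, a trailing `+` selects the model and everything downstream of it, and space-separated selectors are combined as a union.

```python
from typing import Sequence

# Selector lists taken from the diff; the trailing "+" includes descendants.
GIAS_SELECTORS = ["stg_gias_establishments+", "stg_gias_links+"]
EES_SELECTORS = [
    "stg_ees_ks2+",
    "stg_ees_ks4+",
    "stg_ees_census+",
    "stg_ees_admissions+",
    "stg_ees_phonics+",
]


def dbt_build_command(
    pipeline_dir: str, dbt_bin: str, selectors: Sequence[str] = ()
) -> str:
    """Return the shell command for a (optionally scoped) dbt build.

    Hypothetical helper: with no selectors it reproduces the old,
    unscoped command; with selectors it appends a --select clause.
    """
    base = (
        f"cd {pipeline_dir}/transform && {dbt_bin} build "
        "--profiles-dir . --target production"
    )
    if not selectors:
        return base
    return f"{base} --select {' '.join(selectors)}"


if __name__ == "__main__":
    # Example values only; the real DAG file defines PIPELINE_DIR and DBT_BIN.
    print(dbt_build_command("/opt/pipeline", "dbt", GIAS_SELECTORS))
```

Each `BashOperator` could then pass `bash_command=dbt_build_command(PIPELINE_DIR, DBT_BIN, GIAS_SELECTORS)`, which keeps the scoping decision in one named list per DAG.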