fix(pipeline): expand GIAS schema, handle empty strings, scope DAG selectors

- Declare all 34 columns needed by dbt in GIAS tap schema (target-postgres
  only persists columns present in the Singer schema message)
- Use nullif() for empty-string-to-integer/date casts in staging models
- Scope daily DAG dbt build to GIAS models only (stg_gias_establishments+
  stg_gias_links+) to avoid errors on unloaded sources
- Scope annual EES DAG similarly; remove redundant dbt test steps
- Make dim_school gracefully handle missing int_ofsted_latest table
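
A minimal plain-Python sketch of the first two points above, for illustration only. It is not the target-postgres or dbt code: `keep_declared`, `nullif_empty`, `to_int`, `to_date`, the `DECLARED` subset, and the sample row are hypothetical, and ISO-formatted dates are assumed. It shows why an undeclared column never reaches the warehouse, and why an empty string has to become NULL before an integer or date cast.

```python
from datetime import date
from typing import Any, Optional

# Hypothetical subset of the properties declared in the tap schema.
DECLARED = {"URN", "EstablishmentName", "NumberOfPupils", "OpenDate"}


def keep_declared(record: dict[str, Any], declared: set[str]) -> dict[str, Any]:
    """Drop keys missing from the schema, as the commit describes for target-postgres."""
    return {key: value for key, value in record.items() if key in declared}


def nullif_empty(value: Optional[str]) -> Optional[str]:
    """Mimic SQL nullif(value, ''): an empty string becomes None."""
    return None if value == "" else value


def to_int(value: Optional[str]) -> Optional[int]:
    """Cast to int only after empty strings have been turned into None."""
    cleaned = nullif_empty(value)
    return int(cleaned) if cleaned is not None else None


def to_date(value: Optional[str]) -> Optional[date]:
    """Cast to date (ISO format assumed here) after the same empty-string guard."""
    cleaned = nullif_empty(value)
    return date.fromisoformat(cleaned) if cleaned is not None else None


# Hypothetical CSV row: every non-PK value arrives as a string, possibly empty.
row = {
    "URN": 100000,
    "EstablishmentName": "Example School",
    "NumberOfPupils": "",
    "OpenDate": "2011-09-01",
    "Undeclared": "dropped before load",
}
kept = keep_declared(row, DECLARED)
```

Without the guard, `int("")` raises `ValueError`, which is the Python analogue of a direct cast of an empty string to integer failing in SQL.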

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 20:43:24 +00:00
parent 24cfb83144
commit e7b1ab9f37
5 changed files with 59 additions and 35 deletions

@@ -25,9 +25,9 @@ class GIASEstablishmentsStream(Stream):
     primary_keys = ["URN"]
     replication_key = None
-    # Schema is wide (~250 columns); we declare key columns and pass through the rest
-    # All columns are read as strings from CSV; dbt staging models handle type casting.
-    # Only URN is cast to int in get_records() for the primary key.
+    # All columns used by dbt staging models must be declared here —
+    # target-postgres only persists columns present in the Singer schema.
+    # All non-PK columns are StringType; dbt handles type casting.
     schema = th.PropertiesList(
         th.Property("URN", th.IntegerType, required=True),
         th.Property("EstablishmentName", th.StringType),
@@ -38,6 +38,31 @@ class GIASEstablishmentsStream(Stream):
         th.Property("EstablishmentNumber", th.StringType),
         th.Property("EstablishmentStatus (name)", th.StringType),
         th.Property("Postcode", th.StringType),
+        th.Property("Gender (name)", th.StringType),
+        th.Property("ReligiousCharacter (name)", th.StringType),
+        th.Property("AdmissionsPolicy (name)", th.StringType),
+        th.Property("SchoolCapacity", th.StringType),
+        th.Property("NumberOfPupils", th.StringType),
+        th.Property("HeadTitle (name)", th.StringType),
+        th.Property("HeadFirstName", th.StringType),
+        th.Property("HeadLastName", th.StringType),
+        th.Property("TelephoneNum", th.StringType),
+        th.Property("SchoolWebsite", th.StringType),
+        th.Property("Street", th.StringType),
+        th.Property("Locality", th.StringType),
+        th.Property("Town", th.StringType),
+        th.Property("County (name)", th.StringType),
+        th.Property("OpenDate", th.StringType),
+        th.Property("CloseDate", th.StringType),
+        th.Property("Trusts (name)", th.StringType),
+        th.Property("Trusts (code)", th.StringType),
+        th.Property("UrbanRural (name)", th.StringType),
+        th.Property("ParliamentaryConstituency (name)", th.StringType),
+        th.Property("NurseryProvision (name)", th.StringType),
+        th.Property("Easting", th.StringType),
+        th.Property("Northing", th.StringType),
+        th.Property("StatutoryLowAge", th.StringType),
+        th.Property("StatutoryHighAge", th.StringType),
     ).to_dict()
     def get_records(self, context):