fix(pipeline): expand GIAS schema, handle empty strings, scope DAG selectors
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m8s
Build and Push Docker Images / Build Integrator (push) Successful in 57s
Build and Push Docker Images / Build Kestra Init (push) Successful in 34s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m39s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
- Declare all 34 columns needed by dbt in the GIAS tap schema (target-postgres only persists columns present in the Singer schema message)
- Use nullif() for empty-string-to-integer/date casts in staging models
- Scope the daily DAG's dbt build to GIAS models only (stg_gias_establishments+ stg_gias_links+) to avoid errors on unloaded sources
- Scope the annual EES DAG similarly; remove redundant dbt test steps
- Make dim_school gracefully handle a missing int_ofsted_latest table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
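The "nullif() for empty-string casts" bullet refers to the dbt staging models, where an empty CSV cell would otherwise fail a cast such as `::int` or `::date`. As a minimal sketch of that semantics (a hypothetical helper, not code from this commit; the actual fix is SQL of the form `nullif(column, '')::int`):

```python
def cast_int_or_null(value):
    """Mirror SQL nullif(value, '')::int: empty string or None becomes None,
    anything else is cast to int."""
    if value is None or value == "":
        return None
    return int(value)

print(cast_int_or_null(""))      # None (previously a cast error on empty cells)
print(cast_int_or_null("1200"))  # 1200
```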
@@ -25,9 +25,9 @@ class GIASEstablishmentsStream(Stream):
     primary_keys = ["URN"]
     replication_key = None
 
-    # Schema is wide (~250 columns); we declare key columns and pass through the rest
-    # All columns are read as strings from CSV; dbt staging models handle type casting.
-    # Only URN is cast to int in get_records() for the primary key.
+    # All columns used by dbt staging models must be declared here —
+    # target-postgres only persists columns present in the Singer schema.
+    # All non-PK columns are StringType; dbt handles type casting.
     schema = th.PropertiesList(
         th.Property("URN", th.IntegerType, required=True),
         th.Property("EstablishmentName", th.StringType),
@@ -38,6 +38,31 @@ class GIASEstablishmentsStream(Stream):
         th.Property("EstablishmentNumber", th.StringType),
         th.Property("EstablishmentStatus (name)", th.StringType),
         th.Property("Postcode", th.StringType),
+        th.Property("Gender (name)", th.StringType),
+        th.Property("ReligiousCharacter (name)", th.StringType),
+        th.Property("AdmissionsPolicy (name)", th.StringType),
+        th.Property("SchoolCapacity", th.StringType),
+        th.Property("NumberOfPupils", th.StringType),
+        th.Property("HeadTitle (name)", th.StringType),
+        th.Property("HeadFirstName", th.StringType),
+        th.Property("HeadLastName", th.StringType),
+        th.Property("TelephoneNum", th.StringType),
+        th.Property("SchoolWebsite", th.StringType),
+        th.Property("Street", th.StringType),
+        th.Property("Locality", th.StringType),
+        th.Property("Town", th.StringType),
+        th.Property("County (name)", th.StringType),
+        th.Property("OpenDate", th.StringType),
+        th.Property("CloseDate", th.StringType),
+        th.Property("Trusts (name)", th.StringType),
+        th.Property("Trusts (code)", th.StringType),
+        th.Property("UrbanRural (name)", th.StringType),
+        th.Property("ParliamentaryConstituency (name)", th.StringType),
+        th.Property("NurseryProvision (name)", th.StringType),
+        th.Property("Easting", th.StringType),
+        th.Property("Northing", th.StringType),
+        th.Property("StatutoryLowAge", th.StringType),
+        th.Property("StatutoryHighAge", th.StringType),
     ).to_dict()
 
     def get_records(self, context):
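The reason every dbt-referenced column must appear in the Singer schema message can be sketched as follows. This is a hypothetical illustration of the behaviour described in the commit message, not actual target-postgres code: the target keeps only the record keys declared under the schema's `properties`, so any undeclared column is silently dropped before it reaches Postgres.

```python
# Columns declared in the Singer schema message (illustrative subset).
schema_properties = {"URN": {}, "EstablishmentName": {}, "Postcode": {}}

# A record emitted by the tap; SchoolCapacity is NOT declared above.
record = {
    "URN": 100001,
    "EstablishmentName": "Example School",
    "Postcode": "AB1 2CD",
    "SchoolCapacity": "300",
}

# The target persists only declared columns, so SchoolCapacity is dropped.
persisted = {k: v for k, v in record.items() if k in schema_properties}
print(persisted)
```

Declaring all 34 columns (as the second hunk does) is what makes them survive loading and become available to the dbt staging models.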