fix(tap-gias): declare numeric CSV columns as StringType

CSV is read with dtype=str so all values arrive as strings. Declaring
LA (code) and EstablishmentNumber as IntegerType caused schema
validation failures in target-postgres. Use StringType for all columns
except URN (which is explicitly cast to int for the primary key).
Type casting happens in dbt staging models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 14:03:26 +00:00
parent 84261f6125
commit 0062a5eabe

@@ -26,14 +26,16 @@ class GIASEstablishmentsStream(Stream):
     replication_key = None
     # Schema is wide (~250 columns); we declare key columns and pass through the rest
+    # All columns are read as strings from CSV; dbt staging models handle type casting.
+    # Only URN is cast to int in get_records() for the primary key.
     schema = th.PropertiesList(
         th.Property("URN", th.IntegerType, required=True),
         th.Property("EstablishmentName", th.StringType),
         th.Property("TypeOfEstablishment (name)", th.StringType),
         th.Property("PhaseOfEducation (name)", th.StringType),
-        th.Property("LA (code)", th.IntegerType),
+        th.Property("LA (code)", th.StringType),
         th.Property("LA (name)", th.StringType),
-        th.Property("EstablishmentNumber", th.IntegerType),
+        th.Property("EstablishmentNumber", th.StringType),
         th.Property("EstablishmentStatus (name)", th.StringType),
         th.Property("Postcode", th.StringType),
     ).to_dict()
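
The mismatch this change addresses can be reproduced with a minimal sketch that uses only the standard library. It is not the tap's code and does not use the Singer SDK or target-postgres: `SAMPLE_CSV`, `read_records`, and `type_mismatches` are hypothetical names, and the plain `isinstance` check is an assumed stand-in for the target's schema validation. It illustrates that values read as strings do not satisfy an integer declaration unless they are cast first, which is what happens for URN only.

```python
import csv
import io

# Hypothetical two-row sample; the real extract has ~250 columns.
SAMPLE_CSV = (
    "URN,LA (code),EstablishmentNumber,EstablishmentName\n"
    "100000,201,3614,Example Primary School\n"
    "100001,202,,Example Academy\n"
)

# Declared Python types mirroring the schema after this commit.
DECLARED = {
    "URN": int,
    "LA (code)": str,
    "EstablishmentNumber": str,
    "EstablishmentName": str,
}

# The declaration before this commit: both code columns typed as integers.
OLD_DECLARED = {**DECLARED, "LA (code)": int, "EstablishmentNumber": int}


def read_records(text):
    """Yield rows as dicts of strings, casting only URN to int."""
    for row in csv.DictReader(io.StringIO(text)):
        row["URN"] = int(row["URN"])  # the single explicit cast
        yield row


def type_mismatches(record, declared):
    """Return the columns whose value is not of the declared type."""
    return [col for col, typ in declared.items()
            if not isinstance(record[col], typ)]


records = list(read_records(SAMPLE_CSV))
for record in records:
    print(record["URN"],
          "old:", type_mismatches(record, OLD_DECLARED),
          "new:", type_mismatches(record, DECLARED))
```

Under the old declaration both code columns are reported for every row, while the new declaration reports none. Casting those columns to integers is left to the dbt staging models, as the commit message states.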