fix(ees-tap): strip UTF-8 BOM from CSV column names
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m12s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m42s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m12s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m42s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
Some DfE supporting-files CSVs have a UTF-8 BOM on the first column,
causing it to be named '\ufefftime_period' instead of 'time_period'.
This trips Singer schema validation ('time_period' is a required property).
Strip the BOM from all column names after read_csv.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -83,6 +83,9 @@ class EESDatasetStream(Stream):
|
||||
with zf.open(target) as f:
|
||||
df = pd.read_csv(f, dtype=str, keep_default_na=False, encoding=self._encoding)
|
||||
|
||||
# Strip UTF-8 BOM from column names (some DfE files have a BOM on the first column)
|
||||
df.columns = df.columns.str.lstrip("\ufeff")
|
||||
|
||||
# Filter to school-level data if the column exists
|
||||
if "geographic_level" in df.columns:
|
||||
df = df[df["geographic_level"] == "School"]
|
||||
|
||||
Reference in New Issue
Block a user