fix(ees-tap): strip UTF-8 BOM from CSV column names

Some DfE supporting-files CSVs have a UTF-8 BOM on the first column, causing it to be named '\ufefftime_period' instead of 'time_period'. This trips Singer schema validation ('time_period' is a required property). Strip the BOM from all column names after read_csv.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:54:15 +00:00
parent f4f0257447
commit b8ecc5c58b
1 changed files with 3 additions and 0 deletions
@@ -83,6 +83,9 @@ class EESDatasetStream(Stream):
        with zf.open(target) as f:
            df = pd.read_csv(f, dtype=str, keep_default_na=False, encoding=self._encoding)

+        # Strip UTF-8 BOM from column names (some DfE files have a BOM on the first column)
+        df.columns = df.columns.str.lstrip("\ufeff")
+
        # Filter to school-level data if the column exists
        if "geographic_level" in df.columns:
            df = df[df["geographic_level"] == "School"]
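
Note on the failure mode: the sketch below is illustrative only and is not part of the tap. It uses the standard library rather than pandas, and the sample bytes and the `read_header` helper are invented for the example. It shows how a BOM that is decoded as plain UTF-8 ends up as U+FEFF on the first column name, and that the `lstrip("\ufeff")` normalisation used in this commit removes it.

```python
import csv
import io

# Hypothetical sample data: a CSV that starts with the UTF-8 BOM (EF BB BF).
raw = b"\xef\xbb\xbftime_period,geographic_level\n202324,School\n"


def read_header(data: bytes, encoding: str) -> list[str]:
    """Decode the bytes with the given encoding and return the CSV header row."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text)))


# Decoding as plain utf-8 keeps the BOM as U+FEFF on the first column name.
with_bom = read_header(raw, "utf-8")

# Same normalisation as the commit: strip leading U+FEFF from every name.
cleaned = [name.lstrip("\ufeff") for name in with_bom]

# For comparison, the utf-8-sig codec drops the BOM while decoding.
via_sig = read_header(raw, "utf-8-sig")

print(repr(with_bom[0]))
print(cleaned)
```

Decoding with `utf-8-sig` would also avoid the problem for UTF-8 files, but stripping the column names after the read works regardless of which encoding the stream was configured with.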