fix(tap-uk-ofsted): fix header row detection matching 'urn' inside 'turn'

The preamble row in Ofsted CSVs contains 'turn off all filters' which
matched 'urn' in line.lower(), so header_idx was set to 0 instead of
the real header row. Use a regex that matches URN only as a CSV field.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 17:05:03 +00:00
parent e56a63c59c
commit 26aa3c2d70

@@ -137,7 +137,9 @@ class OfstedInspectionsStream(Stream):
         lines = text.split("\n")
         header_idx = 0
         for i, line in enumerate(lines[:20]):
-            if "URN" in line or "urn" in line.lower():
+            # Match lines where URN appears as a CSV field (start or after comma),
+            # not as a substring of words like "turn" or "return".
+            if re.search(r'(?:^|,)\s*URN\s*(?:,|$)', line):
                 header_idx = i
                 break
         df = pd.read_csv(
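
The behaviour change can be checked in isolation with a minimal sketch. The regex is copied from the diff; the helper name `find_header_idx` and the sample rows are hypothetical, made up for illustration and not taken from a real Ofsted file. It assumes the header row contains an unquoted, upper-case `URN` field.

```python
import re

# Pattern copied from the diff: URN must occupy a whole comma-delimited field.
URN_FIELD = re.compile(r'(?:^|,)\s*URN\s*(?:,|$)')


def find_header_idx(lines, limit=20):
    """Hypothetical helper mirroring the loop in the diff."""
    for i, line in enumerate(lines[:limit]):
        if URN_FIELD.search(line):
            return i
    return 0  # same fallback as the original header_idx = 0


# Illustrative rows only: a preamble line containing "turn", then a header row.
sample = [
    "To sort this data please turn off all filters",
    "Web link,URN,School name",
    "http://example.invalid,100000,Example School",
]

# Old check: the substring "urn" inside "turn" matches the preamble row.
old_idx = next(
    (i for i, line in enumerate(sample) if "URN" in line or "urn" in line.lower()),
    0,
)
new_idx = find_header_idx(sample)

print(old_idx, new_idx)
```

One consequence of this pattern is worth keeping in mind: it is case-sensitive and expects a bare field, so a header written as `"URN","School name"` or `urn,name` would not match, and the loop would fall back to index 0.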