fix(tap-uk-ofsted): fix header row detection matching 'urn' inside 'turn'
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 33s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m40s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
The preamble row in Ofsted CSVs contains 'turn off all filters', which satisfied the old `"urn" in line.lower()` check, so header_idx was set to 0 instead of the index of the real header row. Use a regex that matches URN only when it appears as a whole CSV field.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -137,7 +137,9 @@ class OfstedInspectionsStream(Stream):
         lines = text.split("\n")
         header_idx = 0
         for i, line in enumerate(lines[:20]):
-            if "URN" in line or "urn" in line.lower():
+            # Match lines where URN appears as a CSV field (start or after comma),
+            # not as a substring of words like "turn" or "return".
+            if re.search(r'(?:^|,)\s*URN\s*(?:,|$)', line):
                 header_idx = i
                 break
         df = pd.read_csv(
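The effect of the change can be reproduced outside the tap with a minimal sketch. The regex is copied from the diff; the helper names (`old_check`, `new_check`, `find_header_idx`) and the sample rows are hypothetical and only stand in for a real Ofsted export, assuming its preamble contains the phrase quoted in the commit message.

```python
import re

# Pattern copied from the diff: URN must be a whole CSV field,
# i.e. preceded by start-of-line or a comma and followed by a comma or end-of-line.
URN_FIELD = re.compile(r'(?:^|,)\s*URN\s*(?:,|$)')


def old_check(line: str) -> bool:
    # Previous behaviour: plain substring test, which also fires on "turn".
    return "URN" in line or "urn" in line.lower()


def new_check(line: str) -> bool:
    # New behaviour: only a standalone, upper-case URN field counts.
    return URN_FIELD.search(line) is not None


def find_header_idx(lines, check, limit=20):
    # Same scan as in the diff: first matching line within `limit`, else 0.
    for i, line in enumerate(lines[:limit]):
        if check(line):
            return i
    return 0


# Hypothetical sample, not real Ofsted data.
sample = [
    "To see all providers turn off all filters",
    "",
    "Web link,URN,School name",
    "http://example.invalid,100000,Example School",
]

old_idx = find_header_idx(sample, old_check)
new_idx = find_header_idx(sample, new_check)
print(old_idx, new_idx)
```

With this sample the old check stops at the preamble row and the new one stops at the header row. Note that the pattern as written is case-sensitive and does not match a quoted `"URN"` field, so a file with a lower-case or quoted header would fall back to `header_idx = 0`; whether Ofsted exports ever use those forms is not stated in the commit.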