fix(tap-uk-ofsted): fix header row detection matching 'urn' inside 'turn'
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 33s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m40s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
The preamble row in Ofsted CSVs contains 'turn off all filters' which matched 'urn' in line.lower(), so header_idx was set to 0 instead of the real header row. Use a regex that matches URN only as a CSV field.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -137,7 +137,9 @@ class OfstedInspectionsStream(Stream):
         lines = text.split("\n")
         header_idx = 0
         for i, line in enumerate(lines[:20]):
-            if "URN" in line or "urn" in line.lower():
+            # Match lines where URN appears as a CSV field (start or after comma),
+            # not as a substring of words like "turn" or "return".
+            if re.search(r'(?:^|,)\s*URN\s*(?:,|$)', line):
                 header_idx = i
                 break
         df = pd.read_csv(
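The behaviour change can be checked in isolation with a minimal sketch. The regex is copied from the diff; the helper name `find_header_idx`, the `max_scan` parameter, and the sample rows are hypothetical and are not taken from the tap's source or from a real Ofsted file.

```python
import re

# Pattern copied from the diff: URN must be a whole CSV field,
# at the start of the line or after a comma.
URN_FIELD = re.compile(r'(?:^|,)\s*URN\s*(?:,|$)')


def find_header_idx(lines, max_scan=20):
    """Return the index of the first line with URN as a standalone field.

    Falls back to 0 when nothing matches, mirroring the default
    header_idx = 0 in the diff. (Hypothetical helper for illustration.)
    """
    for i, line in enumerate(lines[:max_scan]):
        if URN_FIELD.search(line):
            return i
    return 0


# Made-up sample shaped like the file described in the commit message:
# a preamble row containing "turn off all filters", then the real header.
sample = [
    "To see all rows, turn off all filters",
    "",
    "Web link,URN,School name",
    "http://example.invalid/1,100000,Example School",
]

# Old check from the removed line: substring match on the lowercased line.
old_idx = next(
    (i for i, line in enumerate(sample) if "URN" in line or "urn" in line.lower()),
    0,
)
new_idx = find_header_idx(sample)
print(old_idx, new_idx)
```

Note that the pattern is case-sensitive and does not match a quoted header field such as `"URN"`, so it relies on the header being written as a bare `URN` column.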