fix(ofsted): tighten framework detection to avoid false ReportCard classification

The old OEIF CSV contains columns whose names include substrings like
'inclusion' and 'achievement', causing _detect_framework() to wrongly return
'ReportCard' for pre-Nov-2025 inspections.

Fix: check for OEIF-specific phrases first ('overall effectiveness', 'quality
of education', 'behaviour and attitudes'). Only if none are found, look for
multi-word RC-specific phrases. Default to OEIF as a safe fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 14:55:10 +00:00
parent b850e8639c
commit 5720e18358

@@ -228,15 +228,34 @@ def _parse_date(val) -> date | None:
 def _detect_framework(df: pd.DataFrame) -> str:
-    """Return 'ReportCard' if new-format columns are present, else 'OEIF'."""
-    rc_indicators = [
-        "inclusion", "curriculum and teaching", "achievement",
-        "attendance and behaviour", "safeguarding standards", "safeguarding",
-    ]
+    """Return 'ReportCard' if new-format columns are present, else 'OEIF'.
+    Strategy: check for OEIF-specific phrases first (positive evidence of the
+    old format). Only if none are found, look for RC-specific phrases.
+    Defaults to 'OEIF' so misdetection is always a safe fallback.
+    """
     cols_lower = {c.lower() for c in df.columns}
+    # Phrases unique to the old OEIF CSV — if any present, it's OEIF.
+    oeif_indicators = [
+        "overall effectiveness",
+        "quality of education",
+        "behaviour and attitudes",
+    ]
+    for indicator in oeif_indicators:
+        if any(indicator in c for c in cols_lower):
+            return "OEIF"
+    # Phrases unique to the new Report Card CSV — multi-word, RC-specific.
+    rc_indicators = [
+        "curriculum and teaching",
+        "leadership and governance",
+        "attendance and behaviour",
+    ]
     for indicator in rc_indicators:
         if any(indicator in c for c in cols_lower):
             return "ReportCard"
     return "OEIF"
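
The behaviour change can be illustrated with a small standalone sketch. It is not part of the commit: the helper names detect_old and detect_new are hypothetical, they take a plain list of column names instead of a DataFrame (a caller could pass df.columns), and the sample column names are invented for illustration, not taken from the real Ofsted CSVs. Only the indicator lists are copied from the diff above.

```python
# Indicator lists copied from the diff; everything else is illustrative.
OLD_RC_INDICATORS = [
    "inclusion", "curriculum and teaching", "achievement",
    "attendance and behaviour", "safeguarding standards", "safeguarding",
]
OEIF_INDICATORS = [
    "overall effectiveness",
    "quality of education",
    "behaviour and attitudes",
]
RC_INDICATORS = [
    "curriculum and teaching",
    "leadership and governance",
    "attendance and behaviour",
]


def _has_any(indicators, cols_lower):
    """True if any indicator is a substring of any lowercased column name."""
    return any(ind in col for ind in indicators for col in cols_lower)


def detect_old(columns):
    """Pre-fix logic: any single-word RC substring wins."""
    cols_lower = {c.lower() for c in columns}
    return "ReportCard" if _has_any(OLD_RC_INDICATORS, cols_lower) else "OEIF"


def detect_new(columns):
    """Post-fix logic: OEIF phrases first, then RC phrases, default OEIF."""
    cols_lower = {c.lower() for c in columns}
    if _has_any(OEIF_INDICATORS, cols_lower):
        return "OEIF"
    if _has_any(RC_INDICATORS, cols_lower):
        return "ReportCard"
    return "OEIF"


# Hypothetical column names, chosen only to trigger the substring match.
oeif_columns = ["URN", "Overall effectiveness", "Quality of education",
                "Achievement of pupils (previous inspection)"]
rc_columns = ["URN", "Inclusion", "Curriculum and teaching", "Achievement",
              "Leadership and governance"]

print(detect_old(oeif_columns))  # ReportCard (the false positive)
print(detect_new(oeif_columns))  # OEIF
print(detect_new(rc_columns))    # ReportCard
```

Because the OEIF check runs first, a file containing both OEIF and Report Card phrases is classified as OEIF, and a file matching neither list also falls back to OEIF, consistent with the "safe fallback" described in the commit message.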