fix(ofsted): tighten framework detection to avoid false ReportCard classification
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 33s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 58s
Build and Push Docker Images / Build Kestra Init (push) Successful in 33s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
The old OEIF CSV contains columns whose names include substrings like
'inclusion' and 'achievement', causing _detect_framework() to wrongly return
'ReportCard' for pre-Nov-2025 inspections.
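A minimal reproduction of the false positive. It uses the old `rc_indicators` list and the plain substring loop removed in this commit, applied to a plain list of header names instead of `df.columns`. The header names are hypothetical stand-ins for OEIF-era columns, not the real CSV schema.

```python
# Old RC indicator list, as removed in this commit.
OLD_RC_INDICATORS = [
    "inclusion", "curriculum and teaching", "achievement",
    "attendance and behaviour", "safeguarding standards", "safeguarding",
]


def old_detect(columns: list[str]) -> str:
    """Pre-fix logic: any RC substring anywhere in a header means ReportCard."""
    cols_lower = {c.lower() for c in columns}
    for indicator in OLD_RC_INDICATORS:
        if any(indicator in c for c in cols_lower):
            return "ReportCard"
    return "OEIF"


# Hypothetical OEIF-era headers; "achievement" appears only as a substring.
oeif_columns = ["URN", "Overall effectiveness", "Pupil achievement notes"]
print(old_detect(oeif_columns))  # ReportCard (wrong for an OEIF file)
```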
Fix: check for OEIF-specific phrases first ('overall effectiveness', 'quality
of education', 'behaviour and attitudes'). Only if none are found, look for
multi-word RC-specific phrases. Default to OEIF as a safe fallback.
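The precedence can be sketched as an ordered list of (label, phrases) rules where the first match wins and OEIF is the default. This is a restatement over a plain list of header names. It assumes the same phrase lists as this commit's diff. `classify_columns` and the sample headers are hypothetical and are not part of the codebase.

```python
# Ordered rules: earlier rules take precedence; the first match wins.
RULES: list[tuple[str, tuple[str, ...]]] = [
    ("OEIF", ("overall effectiveness", "quality of education",
              "behaviour and attitudes")),
    ("ReportCard", ("curriculum and teaching", "leadership and governance",
                    "attendance and behaviour")),
]
DEFAULT = "OEIF"  # safe fallback when no rule matches


def classify_columns(columns: list[str]) -> str:
    """Return the label of the first rule with a phrase found in any header."""
    cols_lower = [c.lower() for c in columns]
    for label, phrases in RULES:
        if any(p in c for p in phrases for c in cols_lower):
            return label
    return DEFAULT


print(classify_columns(["URN", "Overall effectiveness", "Pupil achievement notes"]))  # OEIF
print(classify_columns(["URN", "Curriculum and teaching", "Inclusion"]))  # ReportCard
print(classify_columns(["URN", "Inspection date"]))  # OEIF (default)
```

Keeping the rules as data makes the precedence explicit in one place. The committed code expresses the same order with two consecutive loops.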
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -228,15 +228,34 @@ def _parse_date(val) -> date | None:
 
 
 def _detect_framework(df: pd.DataFrame) -> str:
-    """Return 'ReportCard' if new-format columns are present, else 'OEIF'."""
-    rc_indicators = [
-        "inclusion", "curriculum and teaching", "achievement",
-        "attendance and behaviour", "safeguarding standards", "safeguarding",
-    ]
+    """Return 'ReportCard' if new-format columns are present, else 'OEIF'.
+
+    Strategy: check for OEIF-specific phrases first (positive evidence of the
+    old format). Only if none are found, look for RC-specific phrases.
+    Defaults to 'OEIF' so misdetection is always a safe fallback.
+    """
     cols_lower = {c.lower() for c in df.columns}
+
+    # Phrases unique to the old OEIF CSV — if any present, it's OEIF.
+    oeif_indicators = [
+        "overall effectiveness",
+        "quality of education",
+        "behaviour and attitudes",
+    ]
+    for indicator in oeif_indicators:
+        if any(indicator in c for c in cols_lower):
+            return "OEIF"
+
+    # Phrases unique to the new Report Card CSV — multi-word, RC-specific.
+    rc_indicators = [
+        "curriculum and teaching",
+        "leadership and governance",
+        "attendance and behaviour",
+    ]
     for indicator in rc_indicators:
         if any(indicator in c for c in cols_lower):
             return "ReportCard"
+
     return "OEIF"
 
 