refactor(phase): merge KS2+KS4 into fact_performance, fix all phase inconsistencies
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 50s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m12s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m24s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 0s
Root cause: the UNION ALL query in data_loader.py produced two rows per all-through school per year (one KS2, one KS4), and drop_duplicates() silently discarded the KS4 row.

Fixes:
- New dbt mart `fact_performance`: FULL OUTER JOIN of fact_ks2_performance and fact_ks4_performance on (urn, year), giving one row per school per year. All-through schools have both their KS2 and KS4 columns populated.
- data_loader.py: replace the 175-line UNION ALL with a simple JOIN to fact_performance. There are no more duplicate rows, so drop_duplicates is no longer needed.
- sync_typesense.py: use a single LATERAL JOIN to fact_performance instead of two separate KS2/KS4 joins.
- app.py: remove drop_duplicates (no longer needed); add a PHASE_GROUPS constant so all-through schools appear in both primary and secondary filter results (previously they were invisible to both), with middle schools mapped to their deemed phase; scope the gender/admissions_policies result_filters to schools with KS4 data only.
- HomeView.tsx: isSecondaryView is now majority-based (rather than "any secondary"), and isMixedView shows both sort option sets for mixed result sets.
- school/[slug]/page.tsx: all-through schools now route to SchoolDetailView (which renders both the SATs and GCSE sections) instead of SecondarySchoolDetailView (KS4 only), and get dedicated SEO metadata.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
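The root cause and the fix can be reproduced in a few lines of pandas. This is a minimal sketch, not the project's dbt model: `pd.concat` stands in for SQL UNION ALL, `DataFrame.merge(how="outer")` stands in for the FULL OUTER JOIN, and the column `ks2_rwm_expected` and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical KS2 and KS4 marts. URN 100 is an all-through school present in both.
ks2 = pd.DataFrame({"urn": [100, 200], "year": [2024, 2024], "ks2_rwm_expected": [65.0, 72.0]})
ks4 = pd.DataFrame({"urn": [100, 300], "year": [2024, 2024], "attainment_8_score": [48.1, 51.3]})

# Old approach: UNION ALL-style stacking yields two rows for URN 100 ...
stacked = pd.concat([ks2, ks4], ignore_index=True)
# ... and drop_duplicates keeps the first occurrence (here the KS2 row),
# silently discarding the KS4 Attainment 8 score.
deduped = stacked.drop_duplicates(subset=["urn"])

# New approach: FULL OUTER JOIN on (urn, year) yields one row per school per year,
# with both KS2 and KS4 columns populated for the all-through school.
merged = ks2.merge(ks4, on=["urn", "year"], how="outer")

print(deduped.loc[deduped["urn"] == 100, "attainment_8_score"].tolist())  # [nan]
print(merged.loc[merged["urn"] == 100, ["ks2_rwm_expected", "attainment_8_score"]].values.tolist())  # [[65.0, 48.1]]
```

The outer join also keeps KS2-only (URN 200) and KS4-only (URN 300) schools, each with the other stage's columns left null.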
@@ -35,6 +35,14 @@ from .utils import clean_for_json
 # Values to exclude from filter dropdowns (empty strings, non-applicable labels)
 EXCLUDED_FILTER_VALUES = {"", "Not applicable", "Does not apply"}
 
+# Maps user-facing phase filter values to the GIAS PhaseOfEducation values they include.
+# All-through schools appear in both primary and secondary results.
+PHASE_GROUPS: dict[str, set[str]] = {
+    "primary": {"primary", "middle deemed primary", "all-through"},
+    "secondary": {"secondary", "middle deemed secondary", "all-through", "16 plus"},
+    "all-through": {"all-through"},
+}
+
 BASE_URL = "https://schoolcompare.co.uk"
 MAX_SLUG_LENGTH = 60
 
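Because one GIAS phase can belong to several groups, inverting `PHASE_GROUPS` is a quick way to check which filters a given school will show up under. A small sketch assuming the mapping exactly as committed; `filters_for_phase` is a hypothetical helper, not part of app.py.

```python
# PHASE_GROUPS as added in the hunk above.
PHASE_GROUPS: dict[str, set[str]] = {
    "primary": {"primary", "middle deemed primary", "all-through"},
    "secondary": {"secondary", "middle deemed secondary", "all-through", "16 plus"},
    "all-through": {"all-through"},
}


def filters_for_phase(gias_phase: str) -> list[str]:
    """Return the user-facing phase filters whose results include this GIAS phase (hypothetical helper)."""
    phase = gias_phase.lower()
    return sorted(group for group, members in PHASE_GROUPS.items() if phase in members)


for gias_phase in ["Primary", "All-through", "Middle deemed secondary", "16 plus", "Not applicable"]:
    print(gias_phase, "->", filters_for_phase(gias_phase))
```

With this mapping an "All-through" school is reachable from all three filters, "16 plus" only from secondary, and a phase outside every group (such as "Not applicable") from none.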
@@ -343,20 +351,13 @@ async def get_schools(
     )
     df_latest = df_latest.merge(prev_rwm, on="urn", how="left")
 
-    # Phase filter
+    # Phase filter — uses PHASE_GROUPS so all-through/middle schools appear
+    # in the correct phase(s) rather than being invisible to both filters.
     if phase:
-        phase_lower = phase.lower()
-        if phase_lower in ("primary", "secondary", "all-through", "all_through"):
-            # Map param values to GIAS phase strings (partial match)
-            phase_map = {
-                "primary": "primary",
-                "secondary": "secondary",
-                "all-through": "all-through",
-                "all_through": "all-through",
-            }
-            phase_substr = phase_map[phase_lower]
-            schools_df_phase_mask = df_latest["phase"].str.lower().str.contains(phase_substr, na=False)
-            df_latest = df_latest[schools_df_phase_mask]
+        phase_lower = phase.lower().replace("_", "-")
+        allowed = PHASE_GROUPS.get(phase_lower)
+        if allowed:
+            df_latest = df_latest[df_latest["phase"].str.lower().isin(allowed)]
 
     # Secondary-specific filters (after phase filter)
     if gender:
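The behavioural difference between the removed substring match and the new `isin` lookup is easiest to see on a handful of GIAS phase labels. A minimal sketch: `old_filter` and `new_filter` are hypothetical stand-ins that mirror the removed and added lines, and the five sample schools are invented.

```python
import pandas as pd

PHASE_GROUPS: dict[str, set[str]] = {
    "primary": {"primary", "middle deemed primary", "all-through"},
    "secondary": {"secondary", "middle deemed secondary", "all-through", "16 plus"},
    "all-through": {"all-through"},
}

schools = pd.DataFrame({
    "urn": [1, 2, 3, 4, 5],
    "phase": ["Primary", "Middle deemed primary", "All-through", "Secondary", "16 plus"],
})


def old_filter(df: pd.DataFrame, phase: str) -> list[int]:
    # Removed logic: substring match against a single mapped GIAS string.
    phase_map = {
        "primary": "primary",
        "secondary": "secondary",
        "all-through": "all-through",
        "all_through": "all-through",
    }
    mask = df["phase"].str.lower().str.contains(phase_map[phase.lower()], na=False)
    return df.loc[mask, "urn"].tolist()


def new_filter(df: pd.DataFrame, phase: str) -> list[int]:
    # Added logic: normalise "all_through" to "all-through", then exact set membership.
    allowed = PHASE_GROUPS.get(phase.lower().replace("_", "-"))
    if not allowed:
        return df["urn"].tolist()  # unknown value: no filtering, as in the new app.py code
    return df.loc[df["phase"].str.lower().isin(allowed), "urn"].tolist()


for phase in ("primary", "secondary", "all_through"):
    print(phase, old_filter(schools, phase), new_filter(schools, phase))
```

Under the old substring match the all-through school (URN 3) matched neither "primary" nor "secondary", and "16 plus" (URN 5) was missed by "secondary"; the set lookup includes both.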
@@ -389,7 +390,8 @@ async def get_schools(
         for c in SCHOOL_COLUMNS + location_cols + result_cols
         if c in df_latest.columns
     ]
-    schools_df = df_latest[available_cols].drop_duplicates(subset=["urn"])
+    # fact_performance guarantees one row per (urn, year); df_latest has one row per urn.
+    schools_df = df_latest[available_cols]
 
     # Location-based search (uses pre-geocoded data from database)
     search_coords = None
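Removing `drop_duplicates` moves the one-row-per-URN guarantee out of app.py and into the `fact_performance` mart. If a cheap runtime guard is wanted, a check along these lines would fail loudly rather than reintroduce silent de-duplication. This is a sketch only; `assert_one_row_per_urn` is hypothetical and does not exist in the project.

```python
import pandas as pd


def assert_one_row_per_urn(df: pd.DataFrame) -> pd.DataFrame:
    """Raise if any URN appears more than once (hypothetical guard, not in app.py)."""
    dupes = df.loc[df["urn"].duplicated(keep=False), "urn"]
    if not dupes.empty:
        raise ValueError(f"duplicate URNs in df_latest: {sorted(dupes.unique().tolist())}")
    return df


ok = pd.DataFrame({"urn": [100, 200], "phase": ["All-through", "Primary"]})
bad = pd.DataFrame({"urn": [100, 100], "phase": ["All-through", "All-through"]})

assert_one_row_per_urn(ok)
try:
    assert_one_row_per_urn(bad)
except ValueError as exc:
    print(exc)  # duplicate URNs in df_latest: [100]
```

A uniqueness test on the (urn, year) key of `fact_performance` in dbt would catch the same problem at build time instead of per request; which layer should own the check is a design choice.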
@@ -458,13 +460,16 @@ async def get_schools(
             schools_df["school_type"].str.lower() == school_type.lower()
         ]
 
-    # Compute result-scoped filter values (before pagination)
+    # Compute result-scoped filter values (before pagination).
+    # Gender and admissions are secondary-only filters — scope them to schools
+    # with KS4 data so they don't appear for purely primary result sets.
+    _sec_mask = schools_df["attainment_8_score"].notna() if "attainment_8_score" in schools_df.columns else pd.Series(False, index=schools_df.index)
     result_filters = {
         "local_authorities": clean_filter_values(schools_df["local_authority"]) if "local_authority" in schools_df.columns else [],
         "school_types": clean_filter_values(schools_df["school_type"]) if "school_type" in schools_df.columns else [],
         "phases": clean_filter_values(schools_df["phase"]) if "phase" in schools_df.columns else [],
-        "genders": clean_filter_values(schools_df["gender"]) if "gender" in schools_df.columns else [],
-        "admissions_policies": clean_filter_values(schools_df["admissions_policy"]) if "admissions_policy" in schools_df.columns else [],
+        "genders": clean_filter_values(schools_df.loc[_sec_mask, "gender"]) if "gender" in schools_df.columns and _sec_mask.any() else [],
+        "admissions_policies": clean_filter_values(schools_df.loc[_sec_mask, "admissions_policy"]) if "admissions_policy" in schools_df.columns and _sec_mask.any() else [],
     }
 
     # Pagination
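The `_sec_mask` scoping can be exercised on a toy result set. A sketch only: `clean_filter_values` is not shown in this diff, so the version below is a hypothetical stand-in that drops nulls and `EXCLUDED_FILTER_VALUES` and returns sorted unique strings; `secondary_only_values` is likewise hypothetical and simply mirrors the committed expression.

```python
import pandas as pd

EXCLUDED_FILTER_VALUES = {"", "Not applicable", "Does not apply"}


def clean_filter_values(series: pd.Series) -> list[str]:
    # Hypothetical stand-in for the project's helper: drop nulls and excluded labels.
    values = series.dropna().astype(str).str.strip()
    return sorted(set(values) - EXCLUDED_FILTER_VALUES)


def secondary_only_values(schools_df: pd.DataFrame, column: str) -> list[str]:
    # Only rows with a KS4 Attainment 8 score contribute filter values.
    if "attainment_8_score" in schools_df.columns:
        sec_mask = schools_df["attainment_8_score"].notna()
    else:
        sec_mask = pd.Series(False, index=schools_df.index)
    if column not in schools_df.columns or not sec_mask.any():
        return []
    return clean_filter_values(schools_df.loc[sec_mask, column])


primary_only = pd.DataFrame({"gender": ["Mixed", "Mixed"], "attainment_8_score": [None, None]})
mixed = pd.DataFrame({"gender": ["Mixed", "Girls", "Boys"], "attainment_8_score": [None, 47.2, 50.1]})

print(secondary_only_values(primary_only, "gender"))  # []
print(secondary_only_values(mixed, "gender"))         # ['Boys', 'Girls']
```

Because the mask keys on `attainment_8_score` rather than on `phase`, a secondary school with no Attainment 8 score in the latest year also contributes nothing to the gender and admissions options.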