refactor(phase): merge KS2+KS4 into fact_performance, fix all phase inconsistencies

Root cause: the UNION ALL query in data_loader.py produced two rows per
all-through school per year (one KS2, one KS4), with drop_duplicates()
silently discarding the KS4 row. Fixes:

- New dbt mart `fact_performance`: FULL OUTER JOIN of fact_ks2_performance
  and fact_ks4_performance on (urn, year). One row per school per year.
  All-through schools have both KS2 and KS4 columns populated.
- data_loader.py: replace 175-line UNION ALL with a simple JOIN to
  fact_performance. No more duplicate rows or drop_duplicates needed.
- sync_typesense.py: single LATERAL JOIN to fact_performance instead of
  two separate KS2/KS4 joins.
- app.py: remove drop_duplicates (no longer needed); add PHASE_GROUPS
  constant so all-through/middle schools appear in primary and secondary
  filter results (were previously invisible to both); scope result_filters
  gender/admissions_policies to secondary schools only.
- HomeView.tsx: isSecondaryView is now majority-based (not "any secondary")
  and isMixedView shows both sort option sets for mixed result sets.
- school/[slug]/page.tsx: all-through schools route to SchoolDetailView
  (renders both SATs + GCSE sections) instead of SecondarySchoolDetailView
  (KS4-only). Dedicated SEO metadata for all-through schools.
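
The root cause and the mart fix above can be sketched in plain Python. This is only an illustration under stated assumptions: the rows are made-up sample data, the helper names `union_all_then_dedupe` and `full_outer_join` are hypothetical, and the old deduplication is assumed to keep the first row per (urn, year), which is how the KS4 row would be lost.

```python
# Hypothetical miniature of the two source marts; URN 100 is an all-through school.
ks2_rows = [
    {"urn": 100, "year": 2024, "rwm_expected_pct": 65.0},
    {"urn": 200, "year": 2024, "rwm_expected_pct": 72.0},
]
ks4_rows = [
    {"urn": 100, "year": 2024, "progress_8_score": 0.31},
    {"urn": 300, "year": 2024, "progress_8_score": -0.12},
]


def union_all_then_dedupe(ks2, ks4):
    """Old shape: stack both tables, then keep the first row per (urn, year)."""
    kept = {}
    for row in [*ks2, *ks4]:
        # setdefault keeps the first row seen, so the later KS4 row is dropped.
        kept.setdefault((row["urn"], row["year"]), row)
    return list(kept.values())


def full_outer_join(ks2, ks4):
    """New shape: one row per (urn, year) carrying both column sets.

    Assumes (urn, year) is unique within each input table.
    """
    merged = {}
    for row in [*ks2, *ks4]:
        key = (row["urn"], row["year"])
        base = {"urn": row["urn"], "year": row["year"],
                "rwm_expected_pct": None, "progress_8_score": None}
        merged.setdefault(key, base).update(row)
    return list(merged.values())


old = {r["urn"]: r for r in union_all_then_dedupe(ks2_rows, ks4_rows)}
new = {r["urn"]: r for r in full_outer_join(ks2_rows, ks4_rows)}
print(old[100])  # KS2 columns only: the KS4 row was discarded
print(new[100])  # both KS2 and KS4 columns populated
```

Schools with only one key stage (URNs 200 and 300 here) still get a row in the joined result, with the other phase's column left empty.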

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 14:07:30 +01:00
parent 695a571c1f
commit 6e5249aa1e
7 changed files with 227 additions and 216 deletions

@@ -58,24 +58,14 @@ QUERY_BASE = """
 LEFT JOIN marts.dim_location l ON s.urn = l.urn
 """
-QUERY_KS2_JOIN = """
+QUERY_PERFORMANCE_JOIN = """
 LEFT JOIN LATERAL (
-SELECT rwm_expected_pct
-FROM marts.fact_ks2_performance
+SELECT rwm_expected_pct, progress_8_score
+FROM marts.fact_performance
 WHERE urn = s.urn
 ORDER BY year DESC
 LIMIT 1
-) ks2 ON true
-"""
-QUERY_KS4_JOIN = """
-LEFT JOIN LATERAL (
-SELECT progress_8_score
-FROM marts.fact_ks4_performance
-WHERE urn = s.urn
-ORDER BY year DESC
-LIMIT 1
-) ks4 ON true
+) p ON true
 """
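
The single lateral join above selects the most recent `fact_performance` row per school. A rough Python analogue is sketched below. `latest_performance` and the sample rows are hypothetical. The sample also shows one consequence worth checking against real data: if a school's latest year has only one key stage populated, the other column comes back empty, whereas the two separate joins would each have picked their own latest year.

```python
def latest_performance(fact_rows, urn):
    """Mimic ORDER BY year DESC LIMIT 1 for one school; None means no match."""
    candidates = [r for r in fact_rows if r["urn"] == urn]
    return max(candidates, key=lambda r: r["year"], default=None)


# Hypothetical merged rows: KS4 results are missing for the latest year.
fact_performance = [
    {"urn": 100, "year": 2023, "rwm_expected_pct": 61.0, "progress_8_score": 0.25},
    {"urn": 100, "year": 2024, "rwm_expected_pct": 65.0, "progress_8_score": None},
]

latest = latest_performance(fact_performance, 100)
print(latest)  # the 2024 row; its progress_8_score is None
print(latest_performance(fact_performance, 999))  # no rows, like LEFT JOIN ... ON true
```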
@@ -136,30 +126,23 @@ def sync(typesense_url: str, api_key: str):
     schema = {**COLLECTION_SCHEMA, "name": collection_name}
     client.collections.create(schema)
-    # Fetch data from marts — dynamically include KS2/KS4 joins if tables exist
+    # Fetch data from marts — join fact_performance if it exists
     conn = get_db_connection()
     with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
-        # Check which fact tables exist
+        # Check whether the merged fact table exists
         cur.execute("""
             SELECT table_name FROM information_schema.tables
-            WHERE table_schema = 'marts' AND table_name IN ('fact_ks2_performance', 'fact_ks4_performance')
+            WHERE table_schema = 'marts' AND table_name = 'fact_performance'
         """)
-        existing_tables = {r["table_name"] for r in cur.fetchall()}
-        select_extra = []
-        joins = ""
-        if "fact_ks2_performance" in existing_tables:
-            select_extra.append("ks2.rwm_expected_pct")
-            joins += QUERY_KS2_JOIN
-        if "fact_ks4_performance" in existing_tables:
-            select_extra.append("ks4.progress_8_score")
-            joins += QUERY_KS4_JOIN
+        has_fact_performance = cur.fetchone() is not None
         query = QUERY_BASE
-        if select_extra:
-            # Insert extra select columns before FROM
-            query = query.replace("l.longitude as lng", "l.longitude as lng,\n " + ",\n ".join(select_extra))
-            query += joins
+        if has_fact_performance:
+            query = query.replace(
+                "l.longitude as lng",
+                "l.longitude as lng,\n p.rwm_expected_pct,\n p.progress_8_score",
+            )
+            query += QUERY_PERFORMANCE_JOIN
         cur.execute(query)
         rows = cur.fetchall()
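
The query assembly in this hunk can be sketched as a standalone function. Everything except `QUERY_PERFORMANCE_JOIN` and the `l.longitude as lng` anchor is an assumption: the `QUERY_BASE` shown is a shortened stand-in (including the `marts.dim_school` table name), and `build_query` and `SELECT_ANCHOR` are hypothetical names. The explicit anchor check is an addition not present in the commit. It is there because `str.replace` returns the string unchanged when the anchor text is missing, which would leave the join in place with no selected columns.

```python
# Hypothetical, shortened stand-in for the real QUERY_BASE.
QUERY_BASE = """
SELECT s.urn, l.latitude as lat, l.longitude as lng
FROM marts.dim_school s
LEFT JOIN marts.dim_location l ON s.urn = l.urn
"""

# Taken from the diff above.
QUERY_PERFORMANCE_JOIN = """
LEFT JOIN LATERAL (
    SELECT rwm_expected_pct, progress_8_score
    FROM marts.fact_performance
    WHERE urn = s.urn
    ORDER BY year DESC
    LIMIT 1
) p ON true
"""

SELECT_ANCHOR = "l.longitude as lng"


def build_query(has_fact_performance: bool) -> str:
    """Return the sync query, adding performance columns only if the mart exists."""
    if not has_fact_performance:
        return QUERY_BASE
    if SELECT_ANCHOR not in QUERY_BASE:
        # Fail loudly instead of silently producing a query without the columns.
        raise ValueError(f"select anchor {SELECT_ANCHOR!r} not found in QUERY_BASE")
    query = QUERY_BASE.replace(
        SELECT_ANCHOR,
        SELECT_ANCHOR + ",\n    p.rwm_expected_pct,\n    p.progress_8_score",
        1,  # splice only the first occurrence
    )
    return query + QUERY_PERFORMANCE_JOIN


print(build_query(True))
```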