perf: resolve all P1–P5 performance issues from code review
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 21s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 50s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 12s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
P1 (backend/data_loader.py): Add load_latest_school_data() which pre-computes the one-row-per-school latest-year snapshot (groupby, prev-year trend merge) once at startup instead of on every /api/schools request. get_schools route now starts from the cached snapshot rather than rebuilding it.

S3 (backend/app.py): Wrap synchronous geocode_single_postcode() call in asyncio.to_thread() so postcode lookups no longer block the uvicorn event loop. Admin reload endpoint also uses to_thread for both cache primes.

P2 (nextjs-app/components/HomeView.tsx): Add mapParamsRef guard so switching back to map view does not re-fetch 500 schools when search params haven't changed. Reset ref on new searches so fresh data is always fetched.

P3 (nextjs-app/lib/chartSetup.ts): Extract Chart.js registration into a shared side-effect module. ComparisonChart and PerformanceChart now import it instead of each calling ChartJS.register() independently.

P4 (backend/database.py): Remove unnecessary db.commit() from the read-only get_db_session() context manager, which saves a DB round-trip on every request.

P5 (backend/database.py): Add pool_recycle=1800 to SQLAlchemy engine to prevent stale TCP connections from accumulating in long-running processes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
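The P4 and P5 changes live in backend/database.py, which is not part of the diff shown here. The following is only a minimal sketch of what such a setup might look like, not the project's actual code: the in-memory SQLite URL and the SessionLocal name are assumptions made for illustration, while get_db_session and pool_recycle=1800 come from the commit message.

```python
from contextlib import contextmanager
from typing import Iterator

from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session, sessionmaker

# Assumption: an in-memory SQLite URL stands in for the real database URL.
engine = create_engine(
    "sqlite://",
    pool_recycle=1800,  # replace pooled connections older than 1800 seconds on checkout
)
SessionLocal = sessionmaker(bind=engine)  # hypothetical factory name


@contextmanager
def get_db_session() -> Iterator[Session]:
    """Yield a session for read-only work; no commit is issued on exit."""
    session = SessionLocal()
    try:
        yield session
    finally:
        # close() releases the connection back to the pool and ends any
        # open transaction, so read-only callers do not need a commit.
        session.close()


with get_db_session() as db:
    value = db.execute(text("SELECT 1")).scalar()
```

Code paths that write would still need their own explicit commit; this sketch only covers the read-only case the commit message describes.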
+11
-33
@@ -24,6 +24,7 @@ from .config import settings
 from .data_loader import (
     clear_cache,
     load_school_data,
+    load_latest_school_data,
     geocode_single_postcode,
     get_supplementary_data,
     search_schools_typesense,
@@ -223,6 +224,8 @@ async def lifespan(app: FastAPI):
         print("Warning: No data in marts. Run the annual EES pipeline to populate KS2 data.")
     else:
         print(f"Data loaded successfully: {len(df)} records.")
+        # Pre-compute the latest-year snapshot so the first search request is fast
+        await asyncio.to_thread(load_latest_school_data)
     try:
         _sitemap_xml = build_sitemap()
         n = _sitemap_xml.count("<url>")
@@ -321,44 +324,17 @@ async def get_schools(
     phase = sanitize_search_input(phase)
     postcode = validate_postcode(postcode)

-    df = load_school_data()
+    # Load the pre-computed latest-year snapshot (cached after first request / startup).
+    # This avoids rebuilding the expensive groupby + prev-year merge on every search.
+    df_latest = load_latest_school_data()

-    if df.empty:
+    if df_latest.empty:
         return {"schools": [], "total": 0, "page": page, "page_size": 0}

     # Use configured default if not specified
     if page_size is None:
         page_size = settings.default_page_size

-    # Schools with no performance data (special schools, PRUs, newly opened, etc.)
-    # have NULL year from the LEFT JOIN — keep them but skip the groupby/trend logic.
-    df_no_perf = df[df["year"].isna()].drop_duplicates(subset=["urn"])
-    df = df[df["year"].notna()]
-
-    # Get unique schools (latest year data for each)
-    latest_year = df.groupby("urn")["year"].max().reset_index()
-    df_latest = df.merge(latest_year, on=["urn", "year"])
-
-    # Calculate trend by comparing to previous year
-    # Get second-latest year for each school
-    df_sorted = df.sort_values(["urn", "year"], ascending=[True, False])
-    df_prev = df_sorted.groupby("urn").nth(1).reset_index()
-    if not df_prev.empty and "rwm_expected_pct" in df_prev.columns:
-        prev_rwm = df_prev[["urn", "rwm_expected_pct"]].rename(
-            columns={"rwm_expected_pct": "prev_rwm_expected_pct"}
-        )
-        if "attainment_8_score" in df_prev.columns:
-            prev_rwm = prev_rwm.merge(
-                df_prev[["urn", "attainment_8_score"]].rename(
-                    columns={"attainment_8_score": "prev_attainment_8_score"}
-                ),
-                on="urn", how="outer"
-            )
-        df_latest = df_latest.merge(prev_rwm, on="urn", how="left")
-
-    # Merge back schools with no performance data
-    df_latest = pd.concat([df_latest, df_no_perf], ignore_index=True)
-
     # Phase filter — uses PHASE_GROUPS so all-through/middle schools appear
     # in the correct phase(s) rather than being invisible to both filters.
     if phase:
@@ -404,7 +380,8 @@ async def get_schools(
     # Location-based search (uses pre-geocoded data from database)
     search_coords = None
     if postcode:
-        coords = geocode_single_postcode(postcode)
+        # Offload the synchronous HTTP call to a thread so the event loop stays free
+        coords = await asyncio.to_thread(geocode_single_postcode, postcode)
         if coords:
             search_coords = coords
             schools_df = schools_df.copy()
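To make the effect of the asyncio.to_thread() change concrete, the sketch below counts how many times a background coroutine gets to run while a blocking call is in progress. It is an illustration under assumptions, not the project's code: slow_lookup is a hypothetical stand-in for geocode_single_postcode that sleeps instead of making an HTTP request, and the postcode is a placeholder.

```python
import asyncio
import time


def slow_lookup(postcode: str) -> tuple[float, float]:
    """Hypothetical stand-in for a synchronous HTTP geocoding call."""
    time.sleep(0.2)
    return (51.5, -0.1)


async def count_ticks(stop: asyncio.Event, counter: list[int]) -> None:
    """Increment the counter every 10 ms for as long as the loop is free to run us."""
    while not stop.is_set():
        counter[0] += 1
        await asyncio.sleep(0.01)


async def measure(offload: bool) -> int:
    """Return how many ticks happened while the lookup was running."""
    counter = [0]
    stop = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop, counter))
    await asyncio.sleep(0)  # let the ticker run its first iteration
    before = counter[0]
    if offload:
        # The lookup runs in a worker thread; the event loop keeps serving the ticker.
        await asyncio.to_thread(slow_lookup, "SW1A 1AA")
    else:
        # The lookup runs on the event loop thread; nothing else can be scheduled.
        slow_lookup("SW1A 1AA")
    ticks_during_call = counter[0] - before
    stop.set()
    await ticker
    return ticks_during_call


blocked_ticks = asyncio.run(measure(offload=False))
offloaded_ticks = asyncio.run(measure(offload=True))
```

With the direct call the ticker cannot advance until the lookup returns, which is the situation other requests were in before this change. Note that to_thread uses the loop's default thread pool, so heavy concurrent lookups are still bounded by that pool's size.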
@@ -907,7 +884,8 @@ async def reload_data(
     Requires X-API-Key header with valid admin API key.
     """
     clear_cache()
-    load_school_data()
+    await asyncio.to_thread(load_school_data)
+    await asyncio.to_thread(load_latest_school_data)
     return {"status": "reloaded"}