feat: wire Typesense search into backend, fix sync performance data bug
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 1m1s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m7s
Build and Push Docker Images / Build Integrator (push) Successful in 55s
Build and Push Docker Images / Build Kestra Init (push) Successful in 31s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m25s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s

sync_typesense.py:
- Fix query string replacement: was matching 'ST_X(l.geom) as lng' but
  QUERY_BASE uses 'l.longitude as lng' — KS2/KS4 lateral joins were
  silently dropped on every sync run

backend:
- Add typesense_url/typesense_api_key settings to config.py
- Add search_schools_typesense() to data_loader.py — queries Typesense
  'schools' alias, returns URNs in relevance order with typo tolerance;
  falls back to empty list if Typesense is unavailable
- /api/schools: replace pandas str.contains with Typesense search;
  results are filtered from the DataFrame and returned in relevance order;
  graceful fallback to substring match if Typesense is down

requirements.txt: add typesense==0.21.0, numpy==1.26.4

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-03-27 13:23:32 +00:00
parent 5d8b319451
commit 4b02ab3d8a
5 changed files with 62 additions and 10 deletions

View File

@@ -26,6 +26,7 @@ from .data_loader import (
load_school_data,
geocode_single_postcode,
get_supplementary_data,
search_schools_typesense,
)
from .data_loader import get_data_info as get_db_info
from .schemas import METRIC_DEFINITIONS, RANKING_COLUMNS, SCHOOL_COLUMNS
@@ -314,15 +315,19 @@ async def get_schools(
# Apply filters
if search:
search_lower = search.lower()
mask = (
schools_df["school_name"].str.lower().str.contains(search_lower, na=False)
)
if "address" in schools_df.columns:
mask = mask | schools_df["address"].str.lower().str.contains(
search_lower, na=False
)
schools_df = schools_df[mask]
ts_urns = search_schools_typesense(search)
if ts_urns:
urn_order = {urn: i for i, urn in enumerate(ts_urns)}
schools_df = schools_df[schools_df["urn"].isin(set(ts_urns))].copy()
schools_df["_ts_rank"] = schools_df["urn"].map(urn_order)
schools_df = schools_df.sort_values("_ts_rank").drop(columns=["_ts_rank"])
else:
# Fallback: Typesense unavailable, use substring match
search_lower = search.lower()
mask = schools_df["school_name"].str.lower().str.contains(search_lower, na=False)
if "address" in schools_df.columns:
mask = mask | schools_df["address"].str.lower().str.contains(search_lower, na=False)
schools_df = schools_df[mask]
if local_authority:
schools_df = schools_df[