fix(taps): align with integrator resilience patterns
All checks were successful
Build and Push Docker Images / Build Backend (FastAPI) (push) Successful in 32s
Build and Push Docker Images / Build Frontend (Next.js) (push) Successful in 1m5s
Build and Push Docker Images / Build Integrator (push) Successful in 56s
Build and Push Docker Images / Build Kestra Init (push) Successful in 32s
Build and Push Docker Images / Build Pipeline (Meltano + dbt + Airflow) (push) Successful in 1m7s
Build and Push Docker Images / Trigger Portainer Update (push) Successful in 1s
Port critical patterns from the working integrator into Singer taps:

- GIAS: add 404 fallback to yesterday's date, increase timeout to 300s, use latin-1 encoding, use dated URL for links (static URL returns 500)
- FBIT: add GIAS date fallback, increase timeout, fix encoding to latin-1
- IDACI: use dated GIAS URL with fallback instead of undated static URL, fix encoding to latin-1, increase timeout to 300s
- Ofsted: try utf-8-sig then fall back to latin-1 encoding

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
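
The Ofsted encoding change is named in the message but does not appear in the hunk shown on this page. A minimal sketch of the "try utf-8-sig, then latin-1" fallback, assuming the tap has the raw response bytes in hand; `decode_csv_bytes` is a hypothetical helper name, not code from this commit:

```python
def decode_csv_bytes(raw: bytes) -> str:
    """Decode CSV bytes, preferring UTF-8 (with optional BOM) over latin-1.

    Hypothetical helper for illustration; not taken from the commit.
    """
    try:
        # utf-8-sig strips a leading byte-order mark if one is present
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so this never raises;
        # the trade-off is that a mis-encoded file decodes to wrong
        # characters instead of failing loudly.
        return raw.decode("latin-1")
```

Trying UTF-8 first matters because the order cannot be reversed: latin-1 accepts any byte sequence, so it would never hand over to the second codec.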
@@ -103,17 +103,29 @@ class IDACIStream(Stream):
 
     def _get_school_postcodes(self) -> list[tuple[int, str]]:
         """Fetch URN + postcode pairs from GIAS."""
-        url = (
+        from datetime import date, timedelta
+
+        today = date.today()
+        base = (
             "https://ea-edubase-api-prod.azurewebsites.net"
-            "/edubase/downloads/public/edubasealldata.csv"
+            "/edubase/downloads/public/edubasealldata{date}.csv"
         )
+        url = base.format(date=today.strftime("%Y%m%d"))
         self.logger.info("Fetching school postcodes from GIAS...")
-        resp = requests.get(url, timeout=120)
+        resp = requests.get(url, timeout=300)
+
+        # Fall back to yesterday if today's file isn't available
+        if resp.status_code == 404:
+            yesterday = (today - timedelta(days=1)).strftime("%Y%m%d")
+            url = base.format(date=yesterday)
+            self.logger.info("Today's GIAS file not available, trying yesterday: %s", url)
+            resp = requests.get(url, timeout=300)
+
         resp.raise_for_status()
 
         df = pd.read_csv(
             io.StringIO(resp.text),
-            encoding="utf-8-sig",
+            encoding="latin-1",
             usecols=["URN", "Postcode", "EstablishmentStatus (name)"],
             dtype=str,
         )
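
The message says the same dated-URL fallback is applied to the GIAS, FBIT and IDACI taps, so it may be worth factoring out. The sketch below is one possible shape, not the commit's code: `candidate_urls` and `fetch_gias_csv` are hypothetical names, and the HTTP call is passed in as `get` (for example `requests.get`) so the logic can be exercised without network access. It also decodes `resp.content` explicitly, on the assumption that `resp.text` has already decoded the body before pandas sees it, in which case the `encoding` argument given to `pd.read_csv` on a `StringIO` likely has no effect.

```python
from datetime import date, timedelta
from typing import Callable, Optional

# URL template copied from the hunk above
GIAS_URL = (
    "https://ea-edubase-api-prod.azurewebsites.net"
    "/edubase/downloads/public/edubasealldata{date}.csv"
)


def candidate_urls(today: date, days_back: int = 1) -> list[str]:
    """Return dated URLs for today, then each earlier day up to days_back."""
    return [
        GIAS_URL.format(date=(today - timedelta(days=n)).strftime("%Y%m%d"))
        for n in range(days_back + 1)
    ]


def fetch_gias_csv(
    get: Callable,
    today: Optional[date] = None,
    timeout: int = 300,
) -> str:
    """Fetch the GIAS extract, moving to the previous day's file on a 404.

    `get` is any callable with the shape of requests.get(url, timeout=...).
    """
    today = today or date.today()
    resp = None
    for url in candidate_urls(today):
        resp = get(url, timeout=timeout)
        if resp.status_code != 404:
            break  # success, or an error that a date change will not fix
    # Raises for the last response if it is still an error (including 404)
    resp.raise_for_status()
    # Decode the raw bytes explicitly rather than relying on resp.text
    return resp.content.decode("latin-1")
```

With the real library this would be called as `fetch_gias_csv(requests.get)`, and the returned string can be wrapped in `io.StringIO` for `pd.read_csv` as in the hunk.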