{ "cells": [ { "cell_type": "markdown", "id": "intro", "metadata": {}, "source": "# ForecastWatch API — Chronic Bold-Wrong Stations\n\nFind the stations where your 1-day-out forecast keeps landing in the **bold_wrong** quadrant — i.e., your forecast diverged from market consensus *and* was less accurate. These are the locations where the model is most likely systematically miscalibrated, and where investigating tends to pay off.\n\nThis notebook uses three endpoints:\n\n- `GET /v1/insights/daily/dates/` — discover which dates have daily scoring data (~50-day rolling window).\n- `GET /v1/insights/daily/?quadrant=bold_wrong` — get the bold_wrong stations for one day, server-side filtered.\n- `GET /v1/insights/station-trend/` — pull a 30–90 day skill-ratio trend for one station.\n\n**Auth:** every request needs `Authorization: Bearer fw_...`. Generate a key at https://app3.forecastwatch.com → Tools → API Portal.\n\n**To run this in Google Colab:** click the key icon in the left sidebar, add a Secret named `FORECASTWATCH_API_KEY` with your `fw_...` key as the value, then run all cells. The notebook will pick it up automatically.\n\n**To run locally in Jupyter:** either export `FORECASTWATCH_API_KEY=fw_yourKey` in your shell before launching Jupyter, or replace the placeholder string in the setup cell with your key." }, { "cell_type": "code", "id": "setup", "execution_count": null, "metadata": {}, "outputs": [], "source": "import os\nimport requests # standard Python HTTP client\nimport pandas as pd # tabular data, built on numpy. pd.DataFrame is the workhorse.\nimport matplotlib.pyplot as plt # plotting. 
plt is the universal alias.\n\nBASE_URL = \"https://api.forecastwatch.com\"\n\n# Resolve the API key from (in order): Colab Secrets, the\n# FORECASTWATCH_API_KEY env var, or the placeholder below.\ntry:\n from google.colab import userdata\n API_KEY = userdata.get(\"FORECASTWATCH_API_KEY\")\nexcept (ImportError, ModuleNotFoundError):\n API_KEY = os.environ.get(\"FORECASTWATCH_API_KEY\", \"fw_yourKeyHere\")\n\nHEADERS = {\"Authorization\": f\"Bearer {API_KEY}\"}" }, { "cell_type": "markdown", "id": "hdr-auth", "metadata": {}, "source": "## 1. Auth check\n\nHit `/v1/auth/me/` to confirm the key works and see which provider the API will answer for. Every insights call uses this provider implicitly — you never pass it as a parameter." }, { "cell_type": "code", "id": "auth-check", "execution_count": null, "metadata": {}, "outputs": [], "source": "me = requests.get(f\"{BASE_URL}/v1/auth/me/\", headers=HEADERS).json()\nprint(f\"User: {me['user']['username']}\")\nprint(f\"Account: {me['account']['name']}\")\nprint(f\"Provider: {me['active_provider']['name']}\")" }, { "cell_type": "markdown", "id": "hdr-dates", "metadata": {}, "source": "## 2. Discover the dates we can analyze\n\nThe daily insights endpoint exposes a rolling ~50-day window. Pick how many of those you want to look at — 14 keeps the demo snappy; bump it up to 30 or 50 for a fuller analysis." }, { "cell_type": "code", "id": "dates", "execution_count": null, "metadata": {}, "outputs": [], "source": "N_DAYS = 14\n\nresp = requests.get(\n f\"{BASE_URL}/v1/insights/daily/dates/\",\n headers=HEADERS,\n params={\"metric\": \"high24\"},\n).json()\n\ndates = resp[\"dates\"][:N_DAYS]\nprint(f\"Analyzing {len(dates)} most recent dates: {dates[-1]} → {dates[0]}\")" }, { "cell_type": "markdown", "id": "hdr-fetch", "metadata": {}, "source": "## 3. Fetch bold_wrong stations for each date\n\nOne API call per date. 
The `quadrant=bold_wrong` query parameter does the filtering server-side, so each response only contains the stations we care about.\n\n*(For ~14 dates this is a few seconds. If you want it faster, swap the loop for `httpx.AsyncClient` + `asyncio.gather` and run the requests in parallel.)*" }, { "cell_type": "code", "id": "fetch", "execution_count": null, "metadata": {}, "outputs": [], "source": "frames = []\nfor date in dates:\n    payload = requests.get(\n        f\"{BASE_URL}/v1/insights/daily/\",\n        headers=HEADERS,\n        params={\n            \"date\": date,\n            \"days_out\": 1,\n            \"metric\": \"high24\",\n            \"quadrant\": \"bold_wrong\",\n        },\n    ).json()\n    if payload.get(\"stations\"):\n        # pd.DataFrame(list_of_dicts) builds a table where each dict\n        # becomes a row and dict keys become column names.\n        df = pd.DataFrame(payload[\"stations\"])\n        # .assign() adds a column without mutating the original DataFrame.\n        df = df.assign(date=date)\n        frames.append(df)\n\n# pd.concat raises ValueError on an empty list, so fail with a clearer message.\nif not frames:\n    raise RuntimeError(\"No bold_wrong stations returned for the selected dates.\")\n\n# pd.concat stacks the per-day DataFrames vertically into one.\nall_bold_wrong = pd.concat(frames, ignore_index=True)\nprint(f\"Collected {len(all_bold_wrong)} bold_wrong station-days across {len(dates)} dates.\")\nall_bold_wrong.head()" }, { "cell_type": "markdown", "id": "aside-cols", "metadata": {}, "source": "### Aside: the columns we'll use\n\nEach row in `all_bold_wrong` is a (station, date) pair the API flagged as bold_wrong. The columns we care about for the analysis:\n\n- `station_id`, `station_name`: which station\n- `date`: the day the row is for\n- `divergence_z`: how many standard deviations your forecast diverged from market consensus. The sign gives the direction; the absolute value gives the magnitude.\n- `accuracy_z`: how many standard deviations your error was vs. market error. Positive = you did worse.\n- `your_signed_diff`: your forecast minus actual. Positive = you predicted too high.\n\nThe other columns (lat, lng, country, etc.) are there for filtering or mapping. We won't touch them today."
}, { "cell_type": "markdown", "id": "hdr-count", "metadata": {}, "source": "## 4. Count appearances per station\n\nGroup by station, count how many days each one landed in bold_wrong, and compute the average severity (`avg_abs_divergence_z`). Sort descending by count so the most-frequent appear first." }, { "cell_type": "code", "id": "count", "execution_count": null, "metadata": {}, "outputs": [], "source": "# The \"method-chained DataFrame\" pattern: each .x() returns a new\n# DataFrame so we can pipeline transforms left-to-right without\n# intermediate variables.\nchronic = (\n all_bold_wrong\n # .assign() adds a derived column. .abs() is element-wise absolute value.\n .assign(abs_div=lambda d: d[\"divergence_z\"].abs())\n # groupby + agg is pandas' core \"summarize by category\" pattern.\n # as_index=False keeps the grouping keys as regular columns.\n .groupby([\"station_id\", \"station_name\"], as_index=False)\n .agg(\n bold_wrong_days=(\"date\", \"count\"),\n avg_abs_divergence_z=(\"abs_div\", \"mean\"),\n )\n .sort_values(\"bold_wrong_days\", ascending=False)\n)\nchronic.head(20)" }, { "cell_type": "markdown", "id": "hdr-hist", "metadata": {}, "source": "## 5. How chronic is \"chronic\"?\n\nMost stations land in bold_wrong occasionally. A smaller group appears chronically, day after day. Before picking a station to drill into, let's see the shape of the distribution." 
}, { "cell_type": "code", "id": "hist", "execution_count": null, "metadata": {}, "outputs": [], "source": "# .value_counts() tallies how many stations have each bold_wrong_days value.\n# .sort_index() orders the bars on the x-axis numerically.\nhist = chronic[\"bold_wrong_days\"].value_counts().sort_index()\n\n# Series.plot.bar() is a one-line matplotlib bar chart from a pandas Series.\nax = hist.plot.bar(figsize=(10, 3), color=\"#1f77b4\")\nax.set_xlabel(\"days station appeared in bold_wrong\")\nax.set_ylabel(\"number of stations\")\nax.set_title(f\"Distribution of bold_wrong appearances ({N_DAYS}-day window)\")\nplt.tight_layout() # auto-adjust margins so labels don't clip\nplt.show()\n\nprint(f\"Stations with just 1 bold_wrong day: {int(hist.get(1, 0))}\")\nprint(f\"Stations with {N_DAYS} bold_wrong days: {int(hist.get(N_DAYS, 0))} (always-chronic offenders)\")" }, { "cell_type": "markdown", "id": "hdr-severity", "metadata": {}, "source": "## 6. Among chronic stations, who's most severely wrong?\n\nFrequency alone doesn't tell the whole story. A station can land in bold_wrong daily but only slightly off. The stations worth investigating are the ones that are both **chronic and severely off**.\n\nThe plan: filter to the frequently-chronic tail (top 25% by `bold_wrong_days`), then re-sort by `avg_abs_divergence_z` (severity). The top of this list is the strongest signal." }, { "cell_type": "code", "id": "severity", "execution_count": null, "metadata": {}, "outputs": [], "source": "# .quantile(0.75) returns the 75th-percentile value. 
Stations at or above\n# this many bold_wrong days are the \"frequently chronic\" tail.\nfreq_threshold = chronic[\"bold_wrong_days\"].quantile(0.75)\n\n# Filter to the frequently-chronic tail, then re-sort by severity.\n# A station that's chronic AND severely off is the most actionable signal.\nchronic_severe = (\n    chronic[chronic[\"bold_wrong_days\"] >= freq_threshold]\n    .sort_values(\"avg_abs_divergence_z\", ascending=False)\n    .reset_index(drop=True)\n)\n# The quantile can be fractional (e.g. 3.5), so print it as-is rather than\n# truncating with int(), which would understate the bar actually applied.\nprint(f\"Chronic threshold: at least {freq_threshold:g} bold_wrong days out of {N_DAYS}\")\nprint(f\"Stations meeting that bar: {len(chronic_severe)}\")\nchronic_severe.head(10)" }, { "cell_type": "markdown", "id": "4cbb30b1", "source": "## 7. Who is this station?\n\nBefore we plot a 90-day trend, let's look up metadata for the top chronic offender. The `/v1/reference/station-metadata/{station_id}/` endpoint returns rich detail for a single station: name, city, country, latitude/longitude, elevation, timezone, and the standard station codes (ICAO, IATA, WMO, SYNOP).\n\nThat context often suggests *why* the model is struggling there. 
An Alpine valley airport, a tropical coastal station, and a desert plateau station each have different microclimate stories.", "metadata": {} }, { "cell_type": "code", "id": "2c107b22", "source": "# Top of chronic_severe = the station to investigate.\nworst = chronic_severe.iloc[0]\nstation_id = int(worst[\"station_id\"])\n\n# /v1/reference/station-metadata/{station_id}/ returns rich metadata for\n# a single station by ID: name, city, country, lat/lng, elevation,\n# timezone, and the standard station codes (ICAO, IATA, WMO, SYNOP).\nmeta = requests.get(\n    f\"{BASE_URL}/v1/reference/station-metadata/{station_id}/\",\n    headers=HEADERS,\n).json()\n\n# Compare against None explicitly so a sea-level station (elevation_cm == 0)\n# is not reported as unknown.\nelev_cm = meta.get(\"elevation_cm\")\nelev_m = elev_cm / 100 if elev_cm is not None else None\nprint(f\"Station ID: {station_id}\")\nprint(f\" Name: {meta['name']}\")\nprint(f\" City: {meta['display_city']}\")\nprint(f\" Region: {meta['state']}, {meta['country']}\")\nprint(f\" Lat/Lng: {meta['lat']:.3f}, {meta['lng']:.3f}\")\nprint(f\" Elevation: {elev_m:.0f} m\" if elev_m is not None else \" Elevation: (unknown)\")\nprint(f\" ICAO: {meta.get('icao') or '(none)'}\")\nprint(f\" WMO: {meta.get('wmo') or '(none)'}\")\nprint(f\" Timezone: {meta.get('timezone_name')}\")", "metadata": {}, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "id": "hdr-drill", "metadata": {}, "source": "## 8. Drill into the 90-day trend\n\nNow that we know which station this is, pull the 90-day skill-ratio trend and check the bias direction. Two-panel chart: skill ratio above (how bad), signed bias below (which direction). If the bias is consistently positive, the model runs too high here; consistently negative, too low."
}, { "cell_type": "code", "id": "drill", "execution_count": null, "metadata": {}, "outputs": [], "source": "# worst and station_id were computed in the previous cell.\nprint(f\"Drilling into: {meta['name']} ({meta.get('icao') or station_id})\")\nprint(f\" bold_wrong days: {worst['bold_wrong_days']} of {N_DAYS}\")\nprint(f\" avg severity (z): {worst['avg_abs_divergence_z']:.2f}\")\n\npayload = requests.get(\n f\"{BASE_URL}/v1/insights/station-trend/\",\n headers=HEADERS,\n params={\"station_id\": station_id, \"metric\": \"high24\", \"days\": 90},\n).json()\n\n# Build the DataFrame from the trend list. Dates arrive as strings, so we\n# parse them to pandas datetime; this makes the chart's x-axis time-aware.\ntrend_df = pd.DataFrame(payload[\"trend\"])\ntrend_df[\"date\"] = pd.to_datetime(trend_df[\"date\"])\ntrend_df = trend_df.sort_values(\"date\")\n\n# your_signed_diff > 0 means our forecast was higher than actual;\n# < 0 means lower. The average over 90 days reveals systematic bias.\navg_bias = trend_df[\"your_signed_diff\"].mean()\nbias_word = \"too high\" if avg_bias > 0 else \"too low\"\nprint(f\" 90-day average bias: {avg_bias:+.2f} (model runs {bias_word})\")\n\n# Two-panel chart: skill ratio (how bad) above, signed bias (which direction) below.\n# plt.subplots(n, m) returns a Figure and an array of Axes, one per panel.\nfig, axes = plt.subplots(2, 1, figsize=(10, 6), sharex=True)\n\ntrend_df.plot(x=\"date\", y=\"skill_ratio\", ax=axes[0], legend=False)\naxes[0].axhline(1.0, linestyle=\"--\", color=\"gray\")\naxes[0].set_ylabel(\"skill ratio\")\naxes[0].set_title(f\"{meta['name']}: 90-day trend (1.0 = market average)\")\n\ntrend_df.plot(x=\"date\", y=\"your_signed_diff\", ax=axes[1], legend=False, color=\"orange\")\naxes[1].axhline(0, linestyle=\"--\", color=\"gray\")\naxes[1].set_ylabel(\"signed diff\")\naxes[1].set_xlabel(\"date\")\n\nplt.tight_layout()\nplt.show()" }, { "cell_type": "markdown", "id": "whats-next", "metadata": {}, "source": "## What's 
next\n\nA few directions to take this:\n\n- **Run across all four metrics** (`high24`, `low24`, `highmos`, `lowmos`) and find stations that appear chronic across multiple metrics. Those are the strongest signals.\n- **Compare to bold_right counts.** A station that's frequently both bold_wrong *and* bold_right is high-variance, not miscalibrated. The interesting cases are stations heavily skewed one way.\n- **Schedule this as a weekly report.** Save the `chronic_severe` DataFrame, diff it against last week's, surface new entries.\n- **Parallelize the date loop** with `httpx.AsyncClient` + `asyncio.gather` — drops the runtime from ~7s to ~1s for 14 days, more useful at 30–50 days.\n\nFull endpoint catalog with parameters and examples: `GET /v1/api-docs/`." } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3" } }, "nbformat": 4, "nbformat_minor": 5 }