markowitz.data_providers.polygon

Polygon.io REST client for end-of-day OHLCV and reference data.

The public surface is the PolygonProvider class, which exposes three methods used elsewhere in this package:

  • get_eod: daily OHLCV for a single ticker over an inclusive window.
  • get_ticker_meta: the /v3/reference/tickers/{ticker} payload.
  • get_grouped_daily: every actively-traded US stock on a given date.

The grouped-daily snapshot is what makes the survivorship-bias-aware S&P 500 universe builder possible: it tells us which tickers had a real bar on the historical date, so we never request OHLCV for a symbol that was not yet trading (or had already been delisted).
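
As a rough illustration of that filtering step, the sketch below keeps only the candidate symbols that appear in a grouped-daily frame. The helper name filter_traded and the sample tickers are hypothetical; the real universe builder is not shown in this section and may work differently.

```python
import pandas as pd


def filter_traded(candidates: list[str], grouped: pd.DataFrame) -> list[str]:
    """Keep only candidates that have a bar in the grouped-daily snapshot."""
    traded = set(grouped.index)
    normalized = (symbol.strip().upper() for symbol in candidates)
    return [symbol for symbol in normalized if symbol in traded]


# Stand-in for the frame returned by get_grouped_daily:
# ticker index, TitleCase OHLCV columns. Values are made up.
snapshot = pd.DataFrame(
    {
        "Open": [10.0, 20.0],
        "High": [11.0, 21.0],
        "Low": [9.0, 19.0],
        "Close": [10.5, 20.5],
        "Volume": [1000.0, 2000.0],
    },
    index=pd.Index(["AAA", "BBB"], name="ticker"),
)

# "ZZZ" has no bar on that date, so it is dropped before any OHLCV request.
tradable = filter_traded(["AAA", "ZZZ", "bbb"], snapshot)
```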

Adjusted vs raw

Every aggregate request sets adjusted=true so split- and dividend-adjusted prices are returned. This is the right default for total-return research; consumers that need raw prices should construct their own provider variant.
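
To make the difference concrete, here is a small worked example of split adjustment only, using invented prices and a hypothetical 2-for-1 split. It is not Polygon's adjustment procedure, which also accounts for dividends as described above; it only shows why raw and adjusted series diverge before a corporate action.

```python
import pandas as pd

# Invented raw closes around a hypothetical 2-for-1 split.
raw_close = pd.Series(
    [100.0, 102.0, 51.0, 52.0],
    index=pd.to_datetime(
        ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]
    ),
)
split_date = pd.Timestamp("2024-01-04")  # first bar traded at the post-split price
split_ratio = 2.0

# Bars before the split are divided by the ratio; later bars are unchanged.
factor = pd.Series(1.0, index=raw_close.index)
factor.loc[raw_close.index < split_date] = split_ratio
adjusted_close = raw_close / factor
```

In the raw series the close drops from 102 to 51 overnight, which would look like a 50% loss; in the adjusted series the same two bars read 51 and 51.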

Rate-limit / retry

A sliding-window token bucket caps outbound traffic at ~100 requests per 60 seconds (the Polygon Starter tier ceiling). Transient failures (HTTP 429, HTTP 5xx, or a low-level network error) are retried with exponential backoff: three attempts, with delays of 1s, 2s, and 4s plus jitter. The bucket is process-local; cross-process throttling would need an external store.
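
The internals of the limiter are not shown on this page, so the following is only a minimal sketch of the two ideas: a sliding-window request cap and an exponential backoff delay. The names SlidingWindowLimiter and backoff_delay, the injectable clock, and the 0.25s jitter range are all assumptions made for illustration.

```python
import random
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window`-second span (process-local)."""

    def __init__(
        self,
        limit: int = 100,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limit = limit
        self._window = window
        self._clock = clock
        self._stamps: deque = deque()  # timestamps of recent requests

    def wait_time(self) -> float:
        """Seconds to sleep before the next request is allowed (0.0 if none)."""
        now = self._clock()
        # Forget requests that have left the window.
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()
        if len(self._stamps) < self._limit:
            return 0.0
        # Window is full: wait until the oldest request expires.
        return self._window - (now - self._stamps[0])

    def record(self) -> None:
        """Register that a request is being sent now."""
        self._stamps.append(self._clock())


def backoff_delay(attempt: int, base: float = 1.0, jitter: float = 0.25) -> float:
    """Delay before retry `attempt` (0-based): base * 2**attempt plus jitter."""
    return base * (2 ** attempt) + random.uniform(0.0, jitter)
```

A caller would sleep for wait_time() seconds, call record(), send the request, and on a transient failure sleep for backoff_delay(attempt) before trying again. Because the timestamps live in memory, two processes sharing one API key would each enforce the cap independently.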

PolygonProvider(api_key: str | None = None, session: httpx.Client | None = None, rpm: int = _STARTER_RPM)

Polygon.io REST adapter with point-in-time accuracy.

Parameters:

  • api_key (str | None, default None): Polygon API key. If None, falls back to the POLYGON_API_KEY environment variable. A missing key raises PolygonAuthError immediately so misconfiguration surfaces at construction time rather than on the first network call.
  • session (httpx.Client | None, default None): Optional pre-built httpx.Client. When None, the provider owns the lifecycle and closes the session in close().
  • rpm (int, default _STARTER_RPM): Requests-per-minute ceiling for the token bucket. Defaults to the Polygon Starter tier limit of 100.

Source code in src/markowitz/data_providers/polygon.py
def __init__(
    self,
    api_key: str | None = None,
    session: httpx.Client | None = None,
    rpm: int = _STARTER_RPM,
) -> None:
    self._api_key = api_key or os.environ.get("POLYGON_API_KEY", "")
    if not self._api_key:
        raise PolygonAuthError(
            "POLYGON_API_KEY is required to instantiate PolygonProvider"
        )
    self._owns_session = session is None
    self._session = session or httpx.Client(timeout=_HTTP_TIMEOUT)
    self._bucket = _TokenBucket(rpm=rpm)
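
The _owns_session flag set above suggests an ownership pattern: a session the provider created is closed by the provider, while a caller-supplied session is left open for the caller to manage. The close method itself is not listed on this page, so the sketch below is an assumption about how such a flag is typically used. FakeSession and SessionOwner are hypothetical stand-ins, not the real httpx.Client or PolygonProvider.

```python
from typing import Optional


class FakeSession:
    """Stand-in for httpx.Client; only tracks whether close() was called."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class SessionOwner:
    """Hypothetical illustration of the owns-session pattern."""

    def __init__(self, session: Optional[FakeSession] = None) -> None:
        # Same shape as the constructor above: remember who created the session.
        self._owns_session = session is None
        self._session = session or FakeSession()

    def close(self) -> None:
        # Assumed behavior: only close a session this object created itself.
        if self._owns_session:
            self._session.close()


shared = FakeSession()
borrower = SessionOwner(session=shared)
borrower.close()  # leaves the caller's session open

owner = SessionOwner()
owner.close()  # closes the internally created session
```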

get_eod(ticker: str, start: date, end: date) -> pd.DataFrame

Return daily OHLCV for ticker in the inclusive window [start, end].

Output is TitleCase (Open/High/Low/Close/Volume) with a tz-naive pandas.DatetimeIndex named Date. Close is split- and dividend-adjusted (Polygon adjusted=true).

Source code in src/markowitz/data_providers/polygon.py
def get_eod(self, ticker: str, start: date, end: date) -> pd.DataFrame:
    """Return daily OHLCV for ``ticker`` in the inclusive window ``[start, end]``.

    Output is TitleCase (Open/High/Low/Close/Volume) with a tz-naive
    :class:`~pandas.DatetimeIndex` named ``Date``. Close is split- and
    dividend-adjusted (Polygon ``adjusted=true``).
    """
    ticker = ticker.strip().upper()
    if start > end:
        raise ValueError(f"start ({start}) must be <= end ({end})")
    return self._fetch_aggs(ticker, start, end)
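
The _fetch_aggs helper is not listed on this page. As a sketch of the output shape described above, the snippet below converts a list of aggregate bars into a TitleCase frame with a tz-naive DatetimeIndex named Date. It assumes bars use the short keys seen in get_grouped_daily (o, h, l, c, v) plus t as a Unix-millisecond timestamp; bars_to_frame and the sample values are hypothetical.

```python
import pandas as pd

# Made-up bars; t is assumed to be the bar start in Unix milliseconds.
results = [
    {"t": 1704171600000, "o": 10.0, "h": 11.0, "l": 9.5, "c": 10.5, "v": 1000},
    {"t": 1704258000000, "o": 10.5, "h": 12.0, "l": 10.0, "c": 11.5, "v": 1500},
]


def bars_to_frame(bars: list[dict]) -> pd.DataFrame:
    """Build a TitleCase OHLCV frame indexed by a tz-naive Date."""
    frame = pd.DataFrame(bars).rename(
        columns={"o": "Open", "h": "High", "l": "Low", "c": "Close", "v": "Volume"}
    )
    # Unix milliseconds -> tz-naive timestamps, then drop the time of day.
    stamps = pd.to_datetime(frame.pop("t"), unit="ms")
    frame.index = pd.DatetimeIndex(stamps).normalize()
    frame.index.name = "Date"
    return frame[["Open", "High", "Low", "Close", "Volume"]].astype(float)


eod = bars_to_frame(results)
```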

get_grouped_daily(date_: date) -> pd.DataFrame

Grouped-daily snapshot of every actively-traded US stock on date_.

Index is the ticker symbol; columns are TitleCase OHLCV. Used by the S&P 500 universe builder to know which symbols actually traded on a given historical date.

Source code in src/markowitz/data_providers/polygon.py
def get_grouped_daily(self, date_: date) -> pd.DataFrame:
    """Grouped-daily snapshot of every actively-traded US stock on ``date_``.

    Index is the ticker symbol; columns are TitleCase OHLCV. Used by the
    S&P 500 universe builder to know which symbols actually traded on a
    given historical date.
    """
    path = f"/v2/aggs/grouped/locale/us/market/stocks/{date_.isoformat()}"
    payload = self._request("GET", path, params={"adjusted": "true"})
    results = payload.get("results") or []
    if not results:
        return pd.DataFrame(columns=list(_OHLCV_COLUMNS))
    rows: dict[str, dict[str, float]] = {}
    for bar in results:
        symbol = bar.get("T")
        if not symbol:
            continue
        rows[symbol] = {
            "Open": float(bar.get("o", 0.0)),
            "High": float(bar.get("h", 0.0)),
            "Low": float(bar.get("l", 0.0)),
            "Close": float(bar.get("c", 0.0)),
            "Volume": float(bar.get("v", 0.0)),
        }
    df = pd.DataFrame.from_dict(rows, orient="index")
    df.index.name = "ticker"
    return cast(pd.DataFrame, df)

get_ticker_meta(ticker: str) -> dict[str, Any]

Return the /v3/reference/tickers/{ticker} payload as a dict.

Source code in src/markowitz/data_providers/polygon.py
def get_ticker_meta(self, ticker: str) -> dict[str, Any]:
    """Return the ``/v3/reference/tickers/{ticker}`` payload as a dict."""
    ticker = ticker.strip().upper()
    payload = self._request("GET", f"/v3/reference/tickers/{ticker}")
    results = payload.get("results")
    if not isinstance(results, dict):
        raise PolygonDataError(f"No reference data returned for {ticker}")
    return results
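
One way a caller might use the returned dict for point-in-time checks is sketched below. It assumes the payload carries a list_date field as an ISO YYYY-MM-DD string; that field name is an assumption about the reference payload and should be confirmed against the Polygon documentation. The helper was_listed_on and the sample payload are hypothetical.

```python
from datetime import date
from typing import Any, Optional


def was_listed_on(meta: dict[str, Any], as_of: date) -> Optional[bool]:
    """Return True/False when `list_date` is present, None when it is missing."""
    listed = meta.get("list_date")  # assumed field name, ISO YYYY-MM-DD
    if not listed:
        return None
    return date.fromisoformat(listed) <= as_of


# Made-up payload shaped like the dict returned by get_ticker_meta.
sample_meta = {"ticker": "AAA", "list_date": "2010-06-29"}
listed_in_2015 = was_listed_on(sample_meta, date(2015, 1, 1))
listed_in_2005 = was_listed_on(sample_meta, date(2005, 1, 1))
```

Returning None for a missing field keeps "unknown" distinct from "not listed", so a caller can fall back to the grouped-daily snapshot instead of silently excluding the symbol.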