markowitz.data_providers
External market-data providers and the S&P 500 point-in-time universe.
This package layers a Polygon.io REST client and a thin yfinance adapter
behind a uniform interface, exposing a `make_provider` factory and an
`SP500UniverseBuilder` that produces survivorship-bias-aware membership
snapshots. It is independent of the legacy `markowitz.data` package: that
module still owns the on-disk Parquet cache and Fama-French utilities, while
this one adds the remote OHLCV ingestion and universe construction needed for
walk-forward research on a realistic equity universe.
PolygonAuthError
Bases: PolygonError
Raised on HTTP 401/403 — missing or invalid API key.
PolygonDataError
Bases: PolygonError
Raised when the response payload is missing expected fields or is empty.
PolygonError
Bases: Exception
Base class for every Polygon-originated failure.
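Because every Polygon failure derives from `PolygonError`, callers can branch on the specific subclass first and fall back to the base class last. The sketch below uses stand-in classes that mirror the documented hierarchy so that it runs without the package installed; the `classify` helper and its action labels are hypothetical and not part of the package.

```python
# Stand-in classes mirroring the documented hierarchy (assumption: real code
# would import these from markowitz.data_providers instead of redefining them).
class PolygonError(Exception):
    """Base class for every Polygon-originated failure."""


class PolygonAuthError(PolygonError):
    """HTTP 401/403: missing or invalid API key."""


class PolygonDataError(PolygonError):
    """Response payload is empty or missing expected fields."""


class PolygonRateLimitError(PolygonError):
    """Retries exhausted on HTTP 429."""


def classify(exc: Exception) -> str:
    """Map a failure to a coarse action label (labels are hypothetical)."""
    # Check the most specific subclasses before the shared base class.
    if isinstance(exc, PolygonAuthError):
        return "fix-credentials"
    if isinstance(exc, PolygonRateLimitError):
        return "back-off"
    if isinstance(exc, PolygonDataError):
        return "skip-ticker"
    if isinstance(exc, PolygonError):
        return "polygon-other"
    return "unrelated"
```

Catching `PolygonError` alone is enough when the caller only needs to distinguish provider failures from everything else.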
PolygonProvider(api_key: str | None = None, session: httpx.Client | None = None, rpm: int = _STARTER_RPM)
Polygon.io REST adapter with point-in-time accuracy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `api_key` | `str \| None` | Polygon API key. If … | `None` |
| `session` | `Client \| None` | Optional pre-built :class: … | `None` |
| `rpm` | `int` | Requests-per-minute ceiling for the token bucket. Defaults to the Polygon Starter tier limit of 100. | `_STARTER_RPM` |
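The `rpm` parameter is described as a ceiling for a token bucket. The package's own limiter is not shown here, so the following is only a minimal sketch of the general technique: the `TokenBucket` class, its method names, and the injectable `clock` argument are all assumptions made for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative token bucket; not the package's actual limiter."""

    def __init__(self, rpm: int, clock: Callable[[], float] = time.monotonic):
        if rpm <= 0:
            raise ValueError("rpm must be positive")
        self.capacity = float(rpm)   # burst size: one minute of requests
        self.rate = rpm / 60.0       # tokens replenished per second
        self.tokens = float(rpm)     # start with a full bucket
        self._clock = clock
        self._last = clock()

    def _refill(self) -> None:
        # Add tokens for the elapsed time, never exceeding capacity.
        now = self._clock()
        self.tokens = min(self.capacity, self.tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self) -> bool:
        """Consume one token if available; return False when throttled."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next token becomes available (0.0 if one is ready)."""
        self._refill()
        return max(0.0, (1.0 - self.tokens) / self.rate)
```

A caller would typically sleep for `wait_time()` whenever `try_acquire()` returns `False`. Passing a fake clock keeps the limiter testable without real sleeps.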
Source code in src/markowitz/data_providers/polygon.py
get_eod(ticker: str, start: date, end: date) -> pd.DataFrame
Return daily OHLCV for `ticker` in the inclusive window `[start, end]`.
Output is TitleCase (Open/High/Low/Close/Volume) with a tz-naive
`pandas.DatetimeIndex` named `Date`. Close is split- and
dividend-adjusted (Polygon `adjusted=true`).
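To make the documented output shape concrete, here is a rough sketch of how a list of Polygon aggregate bars could be reshaped into that frame. It assumes the bars use Polygon's short keys (`o`, `h`, `l`, `c`, `v`, and `t` as epoch milliseconds); the helper name `aggs_to_frame` is hypothetical and the package's real parsing code may differ.

```python
import pandas as pd

# Assumed mapping from Polygon's short bar keys to the TitleCase columns.
_COLUMNS = {"o": "Open", "h": "High", "l": "Low", "c": "Close", "v": "Volume"}


def aggs_to_frame(results: list[dict]) -> pd.DataFrame:
    """Reshape raw aggregate bars into a TitleCase OHLCV frame (sketch)."""
    if not results:
        # The package documents PolygonDataError for empty payloads; a plain
        # ValueError stands in for it here.
        raise ValueError("empty aggregate payload")
    raw = pd.DataFrame(results)
    # "t" is milliseconds since the Unix epoch; the result is tz-naive.
    # normalize() drops the time of day so each bar is keyed by its date.
    stamps = pd.to_datetime(raw["t"], unit="ms").dt.normalize()
    out = raw.rename(columns=_COLUMNS)[list(_COLUMNS.values())]
    out.index = pd.DatetimeIndex(stamps.to_numpy(), name="Date")
    return out.sort_index()
```

Sorting by index at the end keeps the frame chronological even if the bars arrive in descending order.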
get_grouped_daily(date_: date) -> pd.DataFrame
Grouped-daily snapshot of every actively traded US stock on `date_`.
The index is the ticker symbol; columns are TitleCase OHLCV. The S&P 500 universe builder uses this to determine which symbols actually traded on a given historical date.
get_ticker_meta(ticker: str) -> dict[str, Any]
Return the /v3/reference/tickers/{ticker} payload as a dict.
PolygonRateLimitError
Bases: PolygonError
Raised after exhausting retries on HTTP 429.
SP500UniverseBuilder(provider: PolygonProvider | YFinanceProvider | None = None)
Builds and caches point-in-time S&P 500 membership snapshots.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `provider` | `PolygonProvider \| YFinanceProvider \| None` | Either a :class: … | `None` |
Source code in src/markowitz/data_providers/sp500_universe.py
get_membership_as_of(date_: date) -> list[str]
Return the approximate S&P 500 membership on `date_`.
When the configured provider exposes a working `get_grouped_daily`
(Polygon path), the result is the intersection of `CURRENT_SP500`
with the symbols that actually traded on `date_`. When it does not
(no provider, yfinance fallback, or an empty grouped-daily response), the
static list is returned and a warning is emitted on the first such call.
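The intersect-or-fall-back rule can be sketched as follows. This is an illustration only: `MembershipSketch`, its `traded_on` callable (standing in for a provider's grouped-daily lookup), and the warning text are assumptions, not the package's code.

```python
import warnings
from datetime import date
from typing import Callable, Iterable, Optional


class MembershipSketch:
    """Illustrative version of the intersect-or-fall-back rule."""

    def __init__(
        self,
        static_members: Iterable[str],
        traded_on: Optional[Callable[[date], Iterable[str]]] = None,
    ):
        self._static = list(static_members)  # stand-in for CURRENT_SP500
        self._traded_on = traded_on          # stand-in for get_grouped_daily
        self._warned = False

    def get_membership_as_of(self, date_: date) -> list[str]:
        traded = set(self._traded_on(date_)) if self._traded_on else set()
        if not traded:
            # No provider or an empty snapshot: return the static list and
            # warn only on the first such call.
            if not self._warned:
                warnings.warn("falling back to the static S&P 500 list", stacklevel=2)
                self._warned = True
            return list(self._static)
        # Keep static members that actually traded, preserving their order.
        return [symbol for symbol in self._static if symbol in traded]
```

Note that symbols which traded on `date_` but are absent from the static list are dropped, so the result can only ever be a subset of the current constituents.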
get_membership_window(start: date, end: date, freq: str = 'ME') -> dict[date, list[str]]
Build membership at each rebalance date in [start, end].
freq follows pandas offset aliases; default ME = month-end,
matching the cadence used by most monthly walk-forward backtests.
When [start, end] is shorter than one period the window degenerates
to {start, end} so callers always get at least two anchors back.
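One way the rebalance anchors might be generated is with `pandas.date_range`. The sketch below is an assumption about the mechanics, not the package's implementation: the helper name `rebalance_dates` is hypothetical, and it treats "shorter than one period" as "no period boundary falls inside the window", which may not match the exact condition the builder uses.

```python
from datetime import date

import pandas as pd


def rebalance_dates(start: date, end: date, freq: str = "ME") -> list[date]:
    """Return anchor dates in [start, end] at the given pandas frequency (sketch)."""
    # Note: the "ME" alias is only recognised by pandas 2.2 and later.
    stamps = pd.date_range(start=pd.Timestamp(start), end=pd.Timestamp(end), freq=freq)
    if len(stamps) == 0:
        # Assumed degenerate case: no boundary inside the window, so fall
        # back to the two endpoints.
        return [start, end]
    return [ts.date() for ts in stamps]
```

Each returned date would then be passed to `get_membership_as_of` to build the `dict[date, list[str]]` result.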
YFinanceProvider(inner: Any = None)
yfinance-backed provider matching the surface of `PolygonProvider`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `inner` | `Any` | Optional pre-built provider exposing … | `None` |
Source code in src/markowitz/data_providers/yfinance.py
get_eod(ticker: str, start: date, end: date) -> pd.DataFrame
Return daily OHLCV via yfinance, normalized to TitleCase columns.
yfinance returns Open/High/Low/Close/Volume natively when called
through yf.download, but the legacy adapter in this repo collapses
the frame down to a single close column. We reconstruct the full
OHLCV view by re-querying yfinance directly when available, falling
back to a close-only frame (Open/High/Low filled with NaN, Volume with
0) when the inner provider doesn't expose it.
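The close-only fallback described above can be illustrated with a small helper. This is a sketch under assumptions: the name `close_only_to_ohlcv` is hypothetical, and naming the index `Date` is borrowed from the Polygon provider's documented output rather than confirmed for this adapter.

```python
import pandas as pd


def close_only_to_ohlcv(close: pd.Series) -> pd.DataFrame:
    """Expand a close-price series into a TitleCase OHLCV frame (sketch)."""
    nan = float("nan")
    out = pd.DataFrame(
        {
            # Open/High/Low are unknown in the close-only path.
            "Open": nan,
            "High": nan,
            "Low": nan,
            "Close": close.astype(float),
            # Volume is unavailable, so it is filled with 0 as documented.
            "Volume": 0,
        },
        index=close.index,
    )
    # rename_axis returns a new frame, leaving the caller's index untouched.
    return out.rename_axis("Date")
```

Downstream code that relies on volume should treat a column of zeros as "missing" rather than as a genuine zero-volume day.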
make_provider(api_key: str | None = None, *, inner_yfinance: Any = None) -> PolygonProvider | YFinanceProvider
Return a Polygon provider if a key is configured, otherwise yfinance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `api_key` | `str \| None` | Explicit Polygon API key. When … | `None` |
| `inner_yfinance` | `Any` | Optional pre-built provider passed straight through to :class: … | `None` |
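The key-or-fallback dispatch can be sketched with stand-in classes. Everything here is illustrative: `PolygonStub`, `YFinanceStub`, and `make_provider_sketch` are hypothetical names, and the sketch does not model whatever lookup the real factory performs when `api_key` is `None`, since that behavior is not spelled out above.

```python
from typing import Any, Optional, Union


class PolygonStub:
    """Stand-in for PolygonProvider (assumption: only stores the key)."""

    def __init__(self, api_key: str):
        self.api_key = api_key


class YFinanceStub:
    """Stand-in for YFinanceProvider (assumption: only stores the inner provider)."""

    def __init__(self, inner: Any = None):
        self.inner = inner


def make_provider_sketch(
    api_key: Optional[str] = None, *, inner_yfinance: Any = None
) -> Union[PolygonStub, YFinanceStub]:
    """Pick the Polygon path when a key is supplied, otherwise yfinance."""
    if api_key:
        return PolygonStub(api_key)
    # No usable key: fall back to yfinance and forward the optional inner provider.
    return YFinanceStub(inner_yfinance)
```

Because both return types are meant to share a surface, callers can use `get_eod` on the result without checking which provider they received.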