markowitz.data_providers.sp500_universe
Survivorship-bias-aware S&P 500 membership builder.
A truly bias-free constituent history would require a paid index-rebalance
feed (S&P Dow Jones Indices, CRSP). We approximate it instead by intersecting
a recent snapshot of the index members with the Polygon grouped-daily snapshot
on each as-of date: a ticker counts as a member iff it appears in
`CURRENT_SP500` AND has a real trading bar on the requested day.
This is materially better than the naive "today's-list on yesterday's-date" approach because:
- Symbols that had not yet IPO'd by the as-of date drop out (no grouped-daily row), which prevents look-ahead leakage from the modern constituent list.
- Every returned ticker is guaranteed to have same-day OHLCV available, which is the dominant correctness concern in walk-forward backtests.
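As a rough illustration, the membership rule reduces to an order-preserving set intersection. The sketch below assumes the grouped-daily snapshot has already been reduced to a set of ticker strings. `approximate_membership` and the example tickers are hypothetical and not part of the module's API.

```python
def approximate_membership(current: list[str], traded: set[str]) -> list[str]:
    """Hypothetical helper: keep a ticker iff it is in the static list
    AND has a bar in the as-of day's grouped-daily snapshot."""
    # Iterate over the static list so the output order is deterministic.
    return [ticker for ticker in current if ticker in traded]


# Hypothetical static list and grouped-daily ticker set for a date in 2019.
# ABNB did not IPO until December 2020, so it has no bar and drops out;
# ZZZZ traded but is not in the static list, so it is ignored.
static_list = ["AAPL", "MSFT", "ABNB", "JNJ"]
traded_2019 = {"AAPL", "MSFT", "JNJ", "ZZZZ"}
print(approximate_membership(static_list, traded_2019))  # ['AAPL', 'MSFT', 'JNJ']
```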
Known limitations
- Companies that were once in the index but have since been delisted or acquired (Lehman, EMC, Sprint, ...) are missing. This is the classic survivorship blind spot, and it biases backtest results upward on average.
- Names that joined the index after they had already been trading publicly (e.g. mid-2010s tech IPOs) are counted as members on dates before their actual index inclusion.
When no Polygon provider is supplied, the builder emits a warning and returns
the static `CURRENT_SP500` list as-is. That path is explicitly
survivorship-biased and should only be used for offline demos.
Caching
Membership lists are cached in memory, keyed by the as-of date. Nothing is written to a database; long-running services that need persistence should layer their own store on top.
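A minimal sketch of date-keyed in-memory caching, assuming membership is produced by some callable that takes the as-of date. `MembershipCache` and `fake_compute` are hypothetical names, and returning a copy is a choice of this sketch rather than documented behavior of `SP500UniverseBuilder`.

```python
from datetime import date
from typing import Callable


class MembershipCache:
    """Hypothetical per-process cache: one membership list per as-of date."""

    def __init__(self, compute: Callable[[date], list[str]]) -> None:
        self._compute = compute
        self._store: dict[date, list[str]] = {}

    def get(self, as_of: date) -> list[str]:
        # Only hit the (possibly slow) data source on a cache miss.
        if as_of not in self._store:
            self._store[as_of] = self._compute(as_of)
        # Hand back a copy so callers cannot mutate the cached snapshot.
        return list(self._store[as_of])


calls: list[date] = []


def fake_compute(as_of: date) -> list[str]:
    calls.append(as_of)  # record each real computation
    return ["AAPL", "MSFT"]


cache = MembershipCache(fake_compute)
cache.get(date(2024, 1, 31))
cache.get(date(2024, 1, 31))
print(len(calls))  # 1
```

Nothing survives the process, which mirrors the "no database write" behavior described above.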
SP500UniverseBuilder(provider: PolygonProvider | YFinanceProvider | None = None)
Builds and caches point-in-time S&P 500 membership snapshots.
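Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `provider` | `PolygonProvider \| YFinanceProvider \| None` | Either a `PolygonProvider` or a `YFinanceProvider`. With `None`, the builder warns and falls back to the static `CURRENT_SP500` list. | `None` |
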
Source code in src/markowitz/data_providers/sp500_universe.py
get_membership_as_of(date_: date) -> list[str]
Return the approximated S&P 500 membership on date_.
When the configured provider exposes a working `get_grouped_daily`
(the Polygon path), the result is the intersection of `CURRENT_SP500`
with the symbols that actually traded on `date_`. When it does not
(no provider, the yfinance fallback, or an empty grouped-daily response),
the static list is returned and a warning is emitted on the first such call.
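The decision logic above can be sketched as follows. This is a hedged approximation, not the module's implementation: `UniverseSketch`, `STATIC_LIST`, `FakePolygon`, and the assumption that `get_grouped_daily(day)` returns an iterable of ticker strings are all hypothetical (the real provider may return richer bar objects). Treating a raised exception like a missing method is also an assumption about what "working" means.

```python
import warnings
from datetime import date
from typing import Any

# Hypothetical stand-in for CURRENT_SP500.
STATIC_LIST = ["AAPL", "MSFT", "JNJ"]


class UniverseSketch:
    def __init__(self, provider: Any = None) -> None:
        self.provider = provider
        self._warned = False  # warn only on the first fallback call

    def membership_as_of(self, as_of: date) -> list[str]:
        traded: set[str] = set()
        fetch = getattr(self.provider, "get_grouped_daily", None)
        if callable(fetch):
            try:
                # Assumption: returns the tickers with a bar on `as_of`.
                traded = set(fetch(as_of))
            except Exception:
                traded = set()  # treat a failing call like a missing one
        if not traded:
            # No provider, no grouped-daily support, or an empty snapshot.
            if not self._warned:
                warnings.warn(
                    "falling back to the static, survivorship-biased list",
                    stacklevel=2,
                )
                self._warned = True
            return list(STATIC_LIST)
        # Polygon-style path: static list intersected with traded symbols.
        return [t for t in STATIC_LIST if t in traded]


class FakePolygon:
    """Hypothetical provider stub used only for this example."""

    def get_grouped_daily(self, as_of: date) -> list[str]:
        return ["AAPL", "ZZZZ"]


print(UniverseSketch(FakePolygon()).membership_as_of(date(2020, 1, 2)))  # ['AAPL']
```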
get_membership_window(start: date, end: date, freq: str = 'ME') -> dict[date, list[str]]
Build membership at each rebalance date in [start, end].
`freq` follows pandas offset aliases; the default `'ME'` (month-end) matches
the cadence used by most monthly walk-forward backtests.
When [start, end] is shorter than one period, the window degenerates
to {start, end}, so callers always get at least two anchors back.