fix: keep Put/Call positions distinct in 13F-HR holdings (#824)#828
fix: keep Put/Call positions distinct in 13F-HR holdings (#824)#8280ywfe wants to merge 1 commit into
Conversation
ThirteenF.holdings grouped by Cusip alone, so Put/Call rows merged into the same security's equity row and the PutCall column was lost on the merged result. Categories used uppercase 'PUT'/'CALL' while SEC XML emits title-case 'Put'/'Call', so the categorical conversion silently dropped those values too. Group by ['Cusip', 'PutCall'] when the column exists, drop PutCall from agg_dict (it's now a grouping key), and fix the category labels to match SEC XML. Regression test asserts SG Capital Management 13F-HR/A (accession 0001172661-24-002551) keeps its 3 Put positions distinct in the aggregated view, matching the existing infotable-level test.
fbe446c to
8670056
Compare
|
test-network failure was real, not a flake — two existing tests in State Street's 13F has both equity and option positions on the same CUSIPs (e.g. Amended into the existing commit (force-pushed):
All 11 tests now pass locally (1.4s): the 3 new regression tests + 7 existing aggregation tests + the existing |
dgunning
left a comment
There was a problem hiding this comment.
Nice clean fix and the regression tests are well-scoped (real fixture, specific value assertions, infotable baseline guard). CI's green, the three root causes are correctly diagnosed, and the test updates to test_holdings_aggregation_math_correct / test_holdings_preserves_all_securities properly track the new (CUSIP, PutCall) key.
Two requests before merging:
1. Add the column-order preservation from dima's proposed fix. The issue body included this and it's missing here:
if 'PutCall' in holdings.columns:
cols = [c for c in holdings.columns if c != 'PutCall']
insert_after = 'Ticker' if 'Ticker' in cols else (id_cols[-1] if id_cols[-1] in cols else cols[-1])
idx = cols.index(insert_after) + 1
cols.insert(idx, 'PutCall')
holdings = holdings[cols]Without it, pandas' groupby places PutCall right after Cusip (as a group key), shifting it from its prior position. That's an undocumented behavior change on top of the correctness fix — affects positional column access, table rendering order, and any notebook code that hardcodes column indices.
2. Credit dima in the commit message. The technical diff in this PR is theirs — they wrote it in the issue body. A Co-Authored-By: dima <email> trailer (or even just a "Fix proposed by @dima in #824" line) would be the right call.
Behavior-change note for the v5.32.0 release notes (no action needed from you, just flagging for the maintainer): .holdings can now contain multiple rows per CUSIP (equity + Put + Call). Anyone doing .holdings.set_index('Cusip') will hit a duplicate-index error. This is the correct semantics — option positions are distinct securities — but it's an API contract change worth surfacing.
Summary
Fixes #824.
ThirteenF.holdingswas merging Put/Call positions into equity rows for the same CUSIP, then losing the PutCall value entirely because the categorical conversion used uppercase labels that didn't match the SEC XML format.Root cause
Three issues in
edgar/thirteenf/models.pyaggregation logic (around line 454):df.groupby('Cusip', as_index=False)collapsed Put/Call into the same-CUSIP equity rowagg_dict['PutCall'] = 'first'preserved only one value when multiple Put/Call/equity rows existedpd.Categorical(..., categories=['', 'PUT', 'CALL'])used uppercase while SEC XML emits title-case'Put'/'Call'— categorical conversion silently dropped values that didn't matchFix
['Cusip', 'PutCall']whenPutCallis present, so option positions stay separate from equity for the same securityPutCallinagg_dict(it's now a grouping key)['', 'Put', 'Call']categories to match SEC XMLVerification
SG Capital Management LLC 13F-HR/A (accession
0001172661-24-002551) is the existing test fixture used bytests/test_thirteenf.py::test_thirteenf_put_call. Its raw infotable contains 3 Put positions.Before fix: those 3 Puts were merged into same-CUSIP equity rows; the
PutCall == 'Put'query on.holdingsreturned 0 rows.After fix:
.holdings.query(\"PutCall == 'Put'\")returns 3 rows with distinct CUSIPs.Test plan
tests/issues/regression/test_issue_824_13f_putcall_aggregation.py(3 cases,@pytest.mark.network):test_thirteenf_put_callground truth)test_thirteenf_put_callstill passestest_thirteenf.py+test_thirteenf_funds.py+test_thirteenf_rendering.pypass