The dedupe contract
Across the source monorepo we found 17 common patterns, each implemented independently in several places (3-6 copies for most, more for infrastructure helpers such as logging and config loading). Each is consolidated into exactly one canonical implementation before extraction, and CI enforces uniqueness.
This page documents the full table, the consolidation procedure, and the CI gate.
The 17 patterns
| # | Pattern | Old copies | Old locations | Canonical home | LoC saved |
|---|---|---|---|---|---|
| 1 | HRP (López de Prado 2016) | 4 | gpuBT1060/portfolio/hrp.py, research/modules/allocators/hrp.py, research/modules/allocators/advanced.py, research/portfolio_attribution/allocator_extensions.py | mts1b-quantkit | ~200 |
| 2 | Black-Litterman | 3+ | platform/allocator/, research/modules/allocators/, research/portfolio_attribution/ | mts1b-quantkit | ~150 |
| 3 | Walk-forward / purged CV | 3 | gpuBT1060/eval/statistical.py::walk_forward(), webui/scripts/backtest_replay.py::cmd_walk_forward(), research/modules/backtests/engines/ladder.py | mts1b-quantkit | ~150 |
| 4 | Calmar / MaxDD / Sharpe variants | 4 | gpuBT1060/kernels/metrics.py, gpuBT1060/kernels/metrics_extended.py, gpuBT1060/lifecycle/shadow.py::_max_dd(), gpuBT1060/analytics/bootstrap.py::max_drawdown() | mts1b-quantkit | ~100 |
| 5 | Kelly / VolTarget sizing | 5+ | gpuBT1060/portfolio/long_short.py, treasury/factory/sweep.py, treasury/live/coinbase_allocator.py, research/portfolio_attribution/multi_sleeve_allocator.py, research/strategies.py | mts1b-portfolio | ~250 |
| 6 | Telegram / Slack dispatch | 6 | operations/main.py, research/api/marketing_views.py, research/ops/altdata_health.py, trading/api/orders.py, trading/modules/risk/untradeable.py, treasury/paper/alerts.py | mts1b-platform/messaging | ~200 |
| 7 | Fee / slippage / commission models | 4+ | gpuBT1060/core/cost_model.py, gpuBT1060/execution/cost_calibration.py, gpuBT1060/core/venue_costs.py (471 LoC!), trading/workers/broker_fill_normalizer.py | mts1b-quantkit/cost_models | ~150 |
| 8 | Calendar / holiday / session | 5+ | treasury/data_sources/global_calendar.py, treasury/data_sources/calendar_coverage.py, gpuBT1060/factors/calendar_effects.py, research/api/sessions_views.py, research/portfolio/session_risk_multipliers.py | mts1b-platform/calendars | ~200 |
| 9 | Symbol normalization | 3+ | research/modules/molly/api.py::_normalize_quote(), tv-bridge/broker.py, trading/workers/broker_exit_reconciler.py | mts1b-platform/symbology | ~80 |
| 10 | ADV / liquidity filters | 4+ | research/bin/find_alpha.py, gpuBT1060/core/venue_costs.py, gpuBT1060/portfolio/walkforward.py, gpuBT1060/execution/order_intents.py | mts1b-quantkit/universe_filters | ~120 |
| 11 | Logging setup | many | scattered logging.getLogger(__name__) + handler wiring | mts1b-platform/logging | ~200 |
| 12 | Config loading (Pydantic Settings) | many | scattered BaseSettings + .env loaders + Vault helpers | mts1b-platform/config | ~150 |
| 13 | DB connection pools | many | scattered psycopg_pool.AsyncConnectionPool + DuckDB wrappers | mts1b-platform/db | ~200 |
| 14 | Retry / backoff (tenacity) | many | scattered @retry decorators + manual exponential backoff | mts1b-platform/retry | ~80 |
| 15 | Rate limiters | many | scattered aiolimiter + custom token-bucket impls | mts1b-platform/ratelimit | ~120 |
| 16 | HTTP client factory | many | scattered httpx.Client / aiohttp.ClientSession factories | mts1b-platform/http | ~150 |
| 17 | Secrets redaction | many | scattered log filters | mts1b-platform/security/redact | ~80 |
Total LoC saved: ~2,580. And every drift between copies is a latent bug we get to retire.
Why this matters
Drift between copies isn't theoretical:
- The HRP impl in gpuBT1060/portfolio/hrp.py returns weights via HRPResult(weights, linkage, leaves); the one in research/modules/allocators/hrp.py returns a bare dict. A consumer that swapped between them silently got different return shapes.
- One Sharpe impl annualized assuming 252 trading days; another assumed 365. Annualized Sharpe scales with the square root of periods per year, so strategies looked about 20% better on the 365-day path (sqrt(365/252) ≈ 1.20).
- Telegram dispatch in 6 places meant 6 places to update when we changed escalation rules — and one was always missed.
Consolidation forces a single source of truth and eliminates an entire class of bugs.
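The Sharpe drift is easy to reproduce. A minimal sketch, assuming daily returns and the usual square-root-of-time annualization; the function name annualized_sharpe and the return moments are hypothetical, chosen only to show the size of the gap:

```python
import math


def annualized_sharpe(mean_daily: float, std_daily: float, periods_per_year: int) -> float:
    """Annualize a per-period Sharpe ratio by sqrt(periods_per_year)."""
    return (mean_daily / std_daily) * math.sqrt(periods_per_year)


# Hypothetical daily return moments, for illustration only.
mean_daily, std_daily = 0.0005, 0.01

sharpe_252 = annualized_sharpe(mean_daily, std_daily, 252)
sharpe_365 = annualized_sharpe(mean_daily, std_daily, 365)

# Relative gap between the two conventions; independent of the moments chosen.
inflation = sharpe_365 / sharpe_252 - 1.0

print(f"252-day: {sharpe_252:.3f}  365-day: {sharpe_365:.3f}  gap: {inflation:.1%}")
```

The gap depends only on the ratio of the two day counts, so the same strategy scores roughly a fifth higher on the 365-day path regardless of its actual returns.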
The consolidation procedure
For each of the 17 patterns, in W0 (pre-extraction):
- Identify the canonical implementation: richest tests, cleanest API, most type-safe. Often the GPU/CUDA variant for compute-heavy patterns (it has stricter shape contracts).
- Move to target repo: adapt to mts1b-foundation types if needed.
- Update all call sites via codemod: never manual; the codemod is auditable in the PR.
- Delete the old copies: verify via git grep that no references remain.
- CI enforces uniqueness: AST-based scan (see below) fails if a forbidden symbol is defined in any repo other than its canonical home.
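The codemod step could look roughly like the sketch below. It is an illustration, not the project's actual codemod: the REWRITES mapping, the old symbol name hrp, and the module path mts1b_quantkit.allocators are assumptions. It rewrites from-imports to the canonical module and keeps the old local name as an alias, so call sites stay valid. Note that ast.unparse (Python 3.9+) drops comments and original formatting, so a production codemod would likely want a tool that preserves them.

```python
import ast

# Hypothetical mapping: (old module, old name) -> (canonical module, canonical name).
REWRITES = {
    ("research.modules.allocators.hrp", "hrp"): ("mts1b_quantkit.allocators", "hrp_weights"),
}


class ImportRewriter(ast.NodeTransformer):
    """Point `from old_module import name` at the canonical module."""

    def visit_ImportFrom(self, node: ast.ImportFrom):
        if node.level != 0 or node.module is None:
            return node  # leave relative imports alone
        kept, moved = [], []
        for alias in node.names:
            target = REWRITES.get((node.module, alias.name))
            if target is None:
                kept.append(alias)
                continue
            new_module, new_name = target
            # Keep the name call sites already use, aliasing if it differs.
            local = alias.asname or alias.name
            asname = None if local == new_name else local
            moved.append(
                ast.ImportFrom(
                    module=new_module,
                    names=[ast.alias(name=new_name, asname=asname)],
                    level=0,
                )
            )
        if not moved:
            return node
        if kept:
            node.names = kept
            return [node, *moved]
        return moved


def rewrite_source(source: str) -> str:
    """Return source with protected imports redirected to their canonical home."""
    tree = ImportRewriter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


before = "from research.modules.allocators.hrp import hrp\nweights = hrp(returns)\n"
after = rewrite_source(before)
print(after)
```

Because the rewrite is a pure function of the source text, the diff it produces can be reviewed in the PR like any other change.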
The CI gate
"""Enforce the dedupe contract. Runs in every repo's CI."""
import ast
import pathlib
from typing import Iterator

# Protected symbols: name → canonical home repo
PROTECTED: dict[str, str] = {
    "hrp_weights": "mts1b-quantkit",
    "black_litterman": "mts1b-quantkit",
    "walk_forward": "mts1b-quantkit",
    "sharpe_from_moments": "mts1b-quantkit",
    "calmar": "mts1b-quantkit",
    "max_drawdown": "mts1b-quantkit",
    "kelly_fraction": "mts1b-portfolio",
    "vol_target_weights": "mts1b-portfolio",
    "send_telegram": "mts1b-platform",
    "send_slack": "mts1b-platform",
    "compute_fee": "mts1b-quantkit",
    "compute_slippage": "mts1b-quantkit",
    "is_trading_day": "mts1b-platform",
    "next_session_close": "mts1b-platform",
    "normalize_symbol": "mts1b-platform",
    "passes_adv_filter": "mts1b-quantkit",
    "get_logger": "mts1b-platform",
    "load_config": "mts1b-platform",
    "get_db_pool": "mts1b-platform",
    "with_retry": "mts1b-platform",
    "RateLimiter": "mts1b-platform",
    "http_client": "mts1b-platform",
    "redact": "mts1b-platform",
}


def find_def(symbol: str, path: pathlib.Path) -> Iterator[pathlib.Path]:
    """Yield every .py file under path that defines symbol as a function or class.

    ast.walk visits nested scopes too, so a method or inner function with a
    protected name also counts as a definition.
    """
    for p in path.rglob("*.py"):
        try:
            tree = ast.parse(p.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that do not parse as Python
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                if node.name == symbol:
                    yield p
                    break  # report each file at most once per symbol


def test_no_duplicate_implementations():
    # Assumes this test file sits two directories below the repo root.
    repo_root = pathlib.Path(__file__).resolve().parents[2]
    this_repo = repo_root.name  # e.g. "mts1b-quantkit"
    for symbol, canonical_home in PROTECTED.items():
        hits = list(find_def(symbol, repo_root / "src"))
        if this_repo == canonical_home:
            assert len(hits) == 1, (
                f"{symbol} should be defined exactly once in its canonical home "
                f"{canonical_home}, found {len(hits)} in {this_repo}: {hits}"
            )
        else:
            assert len(hits) == 0, (
                f"{symbol} is defined in {this_repo}, but its canonical home is "
                f"{canonical_home}. Move it back, OR rename if it's a different concept. "
                f"Hits: {hits}"
            )
Why AST and not grep:
- grep "def hrp_weights" matches comments, docstrings, and similar names such as hrp_weights_v2: many false positives.
- AST walks definitions only, so text inside comments, docstrings, and string literals never matches.
- AST also catches a definition nested inside a class, for example a class HRP(...) that wraps the function as a method.
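The difference shows up on even a tiny file. A small sketch comparing the two approaches on a hypothetical source snippet (SAMPLE, grep_hits, and ast_hits are illustrative names, not part of the gate):

```python
import ast
import re

# Hypothetical file contents: one real definition (the method), three decoys.
SAMPLE = '''
# TODO: def hrp_weights should move to quantkit
def hrp_weights_v2(cov):
    """Not the same as def hrp_weights(cov)."""
    return cov

class HRP:
    def hrp_weights(self, cov):
        return cov
'''


def grep_hits(symbol: str, source: str) -> list[str]:
    """Emulate `grep "def <symbol>"`: plain text match, line by line."""
    pattern = re.compile(rf"def {re.escape(symbol)}")
    return [line for line in source.splitlines() if pattern.search(line)]


def ast_hits(symbol: str, source: str) -> list[int]:
    """Line numbers of function/class definitions named exactly `symbol`."""
    defs = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return [
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, defs) and node.name == symbol
    ]


print("grep:", len(grep_hits("hrp_weights", SAMPLE)), "matching lines")
print("ast: ", len(ast_hits("hrp_weights", SAMPLE)), "definition(s)")
```

The text match flags the comment, the docstring, and hrp_weights_v2 alongside the real method; the AST scan reports only the method inside class HRP.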
Pluggability without violating the contract
You CAN still write a custom HRP in your own repo — just don't name it hrp_weights:
# In your-strategy/src/your_strategy/custom_alloc.py
from mts1b_quantkit.allocators import hrp_weights  # canonical
from mts1b_quantkit.shrinkage import ledoit_wolf


def hrp_with_shrinkage(returns):
    """Custom variant: HRP on Ledoit-Wolf-shrunk cov matrix."""
    shrunk = ledoit_wolf(returns.cov())
    # Use the canonical impl, just pass different data
    return hrp_weights(shrunk)
The CI cares about the name, not the algorithm — because the name is the contract.
Why "exactly once" and not "approved fork"
Approved-fork ("you can fork it if you ask") sounds appealing but breaks down:
- It creates a maintenance tax (every fork needs upstream review).
- It encourages "just one quick local change" which never gets contributed back.
- It defeats the readability win (the reader still has to ask "which HRP is this?").
Better: the canonical impl has extension points (custom distance metric, custom cluster algorithm). If you need a knob the canonical doesn't expose, open a PR adding it.
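An extension point can be as small as a callable parameter with a sensible default. The sketch below is hypothetical: the source does not show the canonical hrp_weights signature, so distance_matrix, DistanceFn, and both metric functions are assumed names that only illustrate the shape of such a knob, here for the correlation-to-distance step of HRP.

```python
import math
from typing import Callable, Sequence

Matrix = Sequence[Sequence[float]]
DistanceFn = Callable[[float], float]


def correlation_distance(rho: float) -> float:
    """Default metric from López de Prado's HRP: sqrt((1 - rho) / 2)."""
    return math.sqrt(max(0.0, (1.0 - rho) / 2.0))


def absolute_correlation_distance(rho: float) -> float:
    """Alternative metric: strongly negative correlation also counts as close."""
    return math.sqrt(max(0.0, 1.0 - abs(rho)))


def distance_matrix(corr: Matrix, distance: DistanceFn = correlation_distance) -> list[list[float]]:
    """Map a correlation matrix to a distance matrix using a pluggable metric."""
    return [[distance(rho) for rho in row] for row in corr]


corr = [[1.0, -1.0], [-1.0, 1.0]]
default_d = distance_matrix(corr)
custom_d = distance_matrix(corr, distance=absolute_correlation_distance)
print(default_d, custom_d)
```

A caller who needs different behavior passes a different callable instead of copying the implementation, so there is still exactly one definition for the CI gate to find.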
Status
- W0: ✅ 17 patterns identified, this contract written.
- W0 in progress: per-pattern audit + canonical-impl move + codemod for call sites + CI gate.
- v1 launch (month 3): all 12 v1 repos pass the CI gate.