The dedupe contract

Across the source monorepo we found 17 common patterns, each implemented in multiple places (typically 3 to 6 copies, and more for infrastructure such as logging and config). Each pattern is consolidated into exactly one canonical implementation before extraction, and CI enforces uniqueness.

This page documents the full table, the consolidation procedure, and the CI gate.

The 17 patterns

| # | Pattern | Old copies | Old locations | Canonical home | LoC saved |
|---|---------|------------|---------------|----------------|-----------|
| 1 | HRP (López de Prado 2016) | 4 | gpuBT1060/portfolio/hrp.py, research/modules/allocators/hrp.py, research/modules/allocators/advanced.py, research/portfolio_attribution/allocator_extensions.py | mts1b-quantkit | ~200 |
| 2 | Black-Litterman | 3+ | platform/allocator/, research/modules/allocators/, research/portfolio_attribution/ | mts1b-quantkit | ~150 |
| 3 | Walk-forward / purged CV | 3 | gpuBT1060/eval/statistical.py::walk_forward(), webui/scripts/backtest_replay.py::cmd_walk_forward(), research/modules/backtests/engines/ladder.py | mts1b-quantkit | ~150 |
| 4 | Calmar / MaxDD / Sharpe variants | 4 | gpuBT1060/kernels/metrics.py, gpuBT1060/kernels/metrics_extended.py, gpuBT1060/lifecycle/shadow.py::_max_dd(), gpuBT1060/analytics/bootstrap.py::max_drawdown() | mts1b-quantkit | ~100 |
| 5 | Kelly / VolTarget sizing | 5+ | gpuBT1060/portfolio/long_short.py, treasury/factory/sweep.py, treasury/live/coinbase_allocator.py, research/portfolio_attribution/multi_sleeve_allocator.py, research/strategies.py | mts1b-portfolio | ~250 |
| 6 | Telegram / Slack dispatch | 6 | operations/main.py, research/api/marketing_views.py, research/ops/altdata_health.py, trading/api/orders.py, trading/modules/risk/untradeable.py, treasury/paper/alerts.py | mts1b-platform/messaging | ~200 |
| 7 | Fee / slippage / commission models | 4+ | gpuBT1060/core/cost_model.py, gpuBT1060/execution/cost_calibration.py, gpuBT1060/core/venue_costs.py (471 LoC!), trading/workers/broker_fill_normalizer.py | mts1b-quantkit/cost_models | ~150 |
| 8 | Calendar / holiday / session | 5+ | treasury/data_sources/global_calendar.py, treasury/data_sources/calendar_coverage.py, gpuBT1060/factors/calendar_effects.py, research/api/sessions_views.py, research/portfolio/session_risk_multipliers.py | mts1b-platform/calendars | ~200 |
| 9 | Symbol normalization | 3+ | research/modules/molly/api.py::_normalize_quote(), tv-bridge/broker.py, trading/workers/broker_exit_reconciler.py | mts1b-platform/symbology | ~80 |
| 10 | ADV / liquidity filters | 4+ | research/bin/find_alpha.py, gpuBT1060/core/venue_costs.py, gpuBT1060/portfolio/walkforward.py, gpuBT1060/execution/order_intents.py | mts1b-quantkit/universe_filters | ~120 |
| 11 | Logging setup | many | scattered logging.getLogger(__name__) + handler wiring | mts1b-platform/logging | ~200 |
| 12 | Config loading (Pydantic Settings) | many | scattered BaseSettings + .env loaders + Vault helpers | mts1b-platform/config | ~150 |
| 13 | DB connection pools | many | scattered psycopg_pool.AsyncConnectionPool + DuckDB wrappers | mts1b-platform/db | ~200 |
| 14 | Retry / backoff (tenacity) | many | scattered @retry decorators + manual exponential backoff | mts1b-platform/retry | ~80 |
| 15 | Rate limiters | many | scattered aiolimiter + custom token-bucket impls | mts1b-platform/ratelimit | ~120 |
| 16 | HTTP client factory | many | scattered httpx.Client / aiohttp.ClientSession factories | mts1b-platform/http | ~150 |
| 17 | Secrets redaction | scattered | scattered log filters | mts1b-platform/security/redact | ~80 |

Total LoC saved: ~2,580. And every drift between copies is a latent bug we get to retire.

Why this matters

Drift between copies isn't theoretical:

  • The HRP impl in gpuBT1060/portfolio/hrp.py returns weights via HRPResult(weights, linkage, leaves); the one in research/modules/allocators/hrp.py returns a bare dict. A consumer that swapped between them silently got different return shapes.
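  • One Sharpe impl annualized assuming 252 trading days; another assumed 365. Because annualized Sharpe scales with the square root of the period count, the same strategy looked about 20% better on the 365-day path (sqrt(365/252) ≈ 1.20).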
  • Telegram dispatch in 6 places meant 6 places to update when we changed escalation rules — and one was always missed.

Consolidation forces a single source of truth and eliminates an entire class of bugs.
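
The Sharpe discrepancy is easy to reproduce. The following is a minimal sketch, assuming the common convention of scaling a per-period Sharpe ratio by the square root of the number of periods per year; the function name and the sample daily mean and volatility are illustrative and not taken from either original implementation.

```python
import math


def annualized_sharpe(mean_daily: float, std_daily: float, periods_per_year: int) -> float:
    """Annualize a daily Sharpe ratio by sqrt(periods_per_year)."""
    return mean_daily / std_daily * math.sqrt(periods_per_year)


# Same (made-up) daily statistics, two annualization conventions.
sharpe_252 = annualized_sharpe(0.0005, 0.01, 252)
sharpe_365 = annualized_sharpe(0.0005, 0.01, 365)

# The ratio depends only on the conventions, not on the strategy.
ratio = sharpe_365 / sharpe_252
print(f"252-day: {sharpe_252:.3f}, 365-day: {sharpe_365:.3f}, ratio: {ratio:.3f}")
```

The inflation factor is independent of the strategy's returns, which is why the drift went unnoticed: every strategy on the 365-day path was flattered by the same multiple.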

The consolidation procedure

For each of the 17 patterns, in W0 (pre-extraction):

  1. Identify the canonical implementation — richest tests, cleanest API, most type-safe. Often the GPU/CUDA variant for compute-heavy patterns (it has stricter shape contracts).
  2. Move to target repo — adapt to mts1b-foundation types if needed.
  3. Update all call sites via codemod — never manual; the codemod is auditable in the PR.
  4. Delete the old copies — verify via git grep that no references remain.
  5. CI enforces uniqueness — AST-based scan (see below) fails if a forbidden symbol is defined in any repo other than its canonical home.
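
Step 3 can be sketched with the standard library alone. This is an illustration rather than the project's actual codemod: REWRITES, ImportRewriter, and rewrite_source are hypothetical names, and the old module path gpuBT1060.portfolio.hrp is inferred from the table above. Note that ast.unparse discards comments and original formatting, so a production codemod would more likely use a formatting-preserving tool such as LibCST.

```python
import ast

# Hypothetical mapping: (old module, symbol) -> canonical module.
REWRITES = {
    ("gpuBT1060.portfolio.hrp", "hrp_weights"): "mts1b_quantkit.allocators",
}


class ImportRewriter(ast.NodeTransformer):
    """Redirect `from old import symbol` to the symbol's canonical module."""

    def visit_ImportFrom(self, node: ast.ImportFrom):
        if node.level != 0 or node.module is None:
            return node  # leave relative imports untouched
        kept, moved = [], {}
        for alias in node.names:
            target = REWRITES.get((node.module, alias.name))
            if target is None:
                kept.append(alias)
            else:
                moved.setdefault(target, []).append(alias)
        if not moved:
            return node
        # Split the statement: unaffected names stay, moved names get new imports.
        new_nodes = []
        if kept:
            new_nodes.append(ast.ImportFrom(module=node.module, names=kept, level=0))
        for target, aliases in moved.items():
            new_nodes.append(ast.ImportFrom(module=target, names=aliases, level=0))
        return new_nodes


def rewrite_source(source: str) -> str:
    """Return source with protected imports pointed at their canonical home."""
    tree = ImportRewriter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


before = "from gpuBT1060.portfolio.hrp import hrp_weights, HRPResult\nw = hrp_weights(r)\n"
print(rewrite_source(before))
```

Because the rewrite is a pure function of the mapping, the resulting diff can be regenerated and reviewed in the PR, which is the auditability the procedure asks for.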

The CI gate

tests/contracts/test_no_duplicate_implementations.py
"""Enforce the dedupe contract. Runs in every repo's CI."""

import ast
import pathlib
from typing import Iterator

# Protected symbols: name → canonical home repo
PROTECTED: dict[str, str] = {
    "hrp_weights": "mts1b-quantkit",
    "black_litterman": "mts1b-quantkit",
    "walk_forward": "mts1b-quantkit",
    "sharpe_from_moments": "mts1b-quantkit",
    "calmar": "mts1b-quantkit",
    "max_drawdown": "mts1b-quantkit",
    "kelly_fraction": "mts1b-portfolio",
    "vol_target_weights": "mts1b-portfolio",
    "send_telegram": "mts1b-platform",
    "send_slack": "mts1b-platform",
    "compute_fee": "mts1b-quantkit",
    "compute_slippage": "mts1b-quantkit",
    "is_trading_day": "mts1b-platform",
    "next_session_close": "mts1b-platform",
    "normalize_symbol": "mts1b-platform",
    "passes_adv_filter": "mts1b-quantkit",
    "get_logger": "mts1b-platform",
    "load_config": "mts1b-platform",
    "get_db_pool": "mts1b-platform",
    "with_retry": "mts1b-platform",
    "RateLimiter": "mts1b-platform",
    "http_client": "mts1b-platform",
    "redact": "mts1b-platform",
}


def find_def(symbol: str, path: pathlib.Path) -> Iterator[pathlib.Path]:
    """Yield every .py file under path that defines symbol at any nesting level."""
    for p in path.rglob("*.py"):
        try:
            tree = ast.parse(p.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        # ast.walk also visits nested nodes, so methods and inner functions count.
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                if node.name == symbol:
                    yield p
                    break  # report each file at most once


def test_no_duplicate_implementations():
    # This file lives at <repo>/tests/contracts/, so parents[2] is the repo root.
    repo_root = pathlib.Path(__file__).resolve().parents[2]
    # Assumes the checkout directory is named after the repo, e.g. "mts1b-quantkit".
    this_repo = repo_root.name

    for symbol, canonical_home in PROTECTED.items():
        hits = list(find_def(symbol, repo_root / "src"))
        if this_repo == canonical_home:
            assert len(hits) == 1, (
                f"{symbol} should be defined exactly once in its canonical home "
                f"{canonical_home}, found {len(hits)} in {this_repo}: {hits}"
            )
        else:
            assert len(hits) == 0, (
                f"{symbol} is defined in {this_repo}, but its canonical home is "
                f"{canonical_home}. Move it back, OR rename if it's a different concept. "
                f"Hits: {hits}"
            )

Why AST and not grep:

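  • grep "def hrp_weights" matches comments, docstrings, and longer names such as hrp_weights_v2, which produces many false positives.
  • AST sees definitions only, so text inside comments, docstrings, and string literals can never trigger the gate.
  • AST also covers class definitions and anything nested inside them, so a protected name defined as a method of class HRP(...) is still caught.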
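
The difference can be seen on a small sample. This is a sketch only: SOURCE is a made-up module, grep_hits stands in for a textual search using a regular expression, and ast_hits mirrors the matching logic of find_def on a single string.

```python
import ast
import re

# Made-up module text: one comment, one longer name, one docstring mention,
# and one real definition nested inside a class.
SOURCE = '''
# TODO: remove the old def hrp_weights once callers migrate
def hrp_weights_legacy(cov):
    """Replaces the previous `def hrp_weights(cov)` helper."""
    return cov

class HRP:
    def hrp_weights(self, cov):
        return cov
'''


def grep_hits(symbol: str, source: str) -> list[str]:
    """Textual search, roughly what `grep "def <symbol>"` would report."""
    pattern = re.compile(r"def " + re.escape(symbol))
    return [line for line in source.splitlines() if pattern.search(line)]


def ast_hits(symbol: str, source: str) -> list[int]:
    """Line numbers of function or class definitions named exactly symbol."""
    return [
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        and node.name == symbol
    ]


print(len(grep_hits("hrp_weights", SOURCE)))  # 4 textual matches
print(ast_hits("hrp_weights", SOURCE))        # [8], the method inside class HRP
```

The textual search reports the comment, the longer name, and the docstring alongside the real definition, while the AST search reports only the method.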

Pluggability without violating

You CAN still write a custom HRP in your own repo — just don't name it hrp_weights:

# In your-strategy/src/your_strategy/custom_alloc.py
from mts1b_quantkit.allocators import hrp_weights  # canonical
from mts1b_quantkit.shrinkage import ledoit_wolf


def hrp_with_shrinkage(returns):
    """Custom variant: HRP on Ledoit-Wolf-shrunk cov matrix."""
    shrunk = ledoit_wolf(returns.cov())
    # Use the canonical impl, just pass different data
    return hrp_weights(shrunk)

The CI cares about the name, not the algorithm — because the name is the contract.

Why "exactly once" and not "approved fork"

Approved-fork ("you can fork it if you ask") sounds appealing but breaks down:

  • It creates a maintenance tax (every fork needs upstream review).
  • It encourages "just one quick local change" which never gets contributed back.
  • It defeats the readability win (the reader still has to ask "which HRP is this?").

Better: the canonical impl has extension points (custom distance metric, custom cluster algorithm). If you need a knob the canonical doesn't expose, open a PR adding it.
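
An extension point of this kind might look like the sketch below. It is not the canonical API: distance_matrix, correlation_distance, and abs_correlation_distance are hypothetical names, and only the distance step of HRP is shown. The default metric, sqrt(0.5 * (1 - rho)), is the correlation distance used in López de Prado (2016).

```python
import math
from typing import Callable, Sequence

# Hypothetical extension-point type: maps one correlation to one distance.
DistanceFn = Callable[[float], float]


def correlation_distance(rho: float) -> float:
    """Default metric: sqrt(0.5 * (1 - rho)), clamped against rounding error."""
    return math.sqrt(max(0.0, 0.5 * (1.0 - rho)))


def distance_matrix(
    corr: Sequence[Sequence[float]],
    distance: DistanceFn = correlation_distance,
) -> list[list[float]]:
    """Apply the (pluggable) distance metric to every correlation entry."""
    return [[distance(rho) for rho in row] for row in corr]


def abs_correlation_distance(rho: float) -> float:
    """Caller-supplied alternative: treats strong negative correlation as close."""
    return 1.0 - abs(rho)


corr = [[1.0, -1.0], [-1.0, 1.0]]
default_d = distance_matrix(corr)
custom_d = distance_matrix(corr, distance=abs_correlation_distance)
print(default_d[0][1], custom_d[0][1])
```

The caller changes behavior by passing a function instead of copying the implementation, so there is still exactly one definition of the protected name.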

Status

  • W0: ✅ 17 patterns identified, this contract written.
  • W0 in progress: per-pattern audit + canonical-impl move + codemod for call sites + CI gate.
  • v1 launch (month 3): all 12 v1 repos pass the CI gate.