The dedupe contract

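Across the source monorepo we found 17 common patterns, each implemented in several places: 3 to 6 copies for most, and many more for infrastructure patterns such as logging and config loading. Each pattern is consolidated into exactly one canonical implementation before extraction, and CI enforces uniqueness from then on.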

This page documents the full table, the consolidation procedure, and the CI gate.

The 17 patterns

#	Pattern	Old copies	Old locations	Canonical home	LoC saved
1	HRP (López de Prado 2016)	4	`gpuBT1060/portfolio/hrp.py`, `research/modules/allocators/hrp.py`, `research/modules/allocators/advanced.py`, `research/portfolio_attribution/allocator_extensions.py`	`mts1b-quantkit`	~200
2	Black-Litterman	3+	`platform/allocator/`, `research/modules/allocators/`, `research/portfolio_attribution/`	`mts1b-quantkit`	~150
3	Walk-forward / purged CV	3	`gpuBT1060/eval/statistical.py::walk_forward()`, `webui/scripts/backtest_replay.py::cmd_walk_forward()`, `research/modules/backtests/engines/ladder.py`	`mts1b-quantkit`	~150
4	Calmar / MaxDD / Sharpe variants	4	`gpuBT1060/kernels/metrics.py`, `gpuBT1060/kernels/metrics_extended.py`, `gpuBT1060/lifecycle/shadow.py::_max_dd()`, `gpuBT1060/analytics/bootstrap.py::max_drawdown()`	`mts1b-quantkit`	~100
5	Kelly / VolTarget sizing	5+	`gpuBT1060/portfolio/long_short.py`, `treasury/factory/sweep.py`, `treasury/live/coinbase_allocator.py`, `research/portfolio_attribution/multi_sleeve_allocator.py`, `research/strategies.py`	`mts1b-portfolio`	~250
6	Telegram / Slack dispatch	6	`operations/main.py`, `research/api/marketing_views.py`, `research/ops/altdata_health.py`, `trading/api/orders.py`, `trading/modules/risk/untradeable.py`, `treasury/paper/alerts.py`	`mts1b-platform/messaging`	~200
7	Fee / slippage / commission models	4+	`gpuBT1060/core/cost_model.py`, `gpuBT1060/execution/cost_calibration.py`, `gpuBT1060/core/venue_costs.py` (471 LoC!), `trading/workers/broker_fill_normalizer.py`	`mts1b-quantkit/cost_models`	~150
8	Calendar / holiday / session	5+	`treasury/data_sources/global_calendar.py`, `treasury/data_sources/calendar_coverage.py`, `gpuBT1060/factors/calendar_effects.py`, `research/api/sessions_views.py`, `research/portfolio/session_risk_multipliers.py`	`mts1b-platform/calendars`	~200
9	Symbol normalization	3+	`research/modules/molly/api.py::_normalize_quote()`, `tv-bridge/broker.py`, `trading/workers/broker_exit_reconciler.py`	`mts1b-platform/symbology`	~80
10	ADV / liquidity filters	4+	`research/bin/find_alpha.py`, `gpuBT1060/core/venue_costs.py`, `gpuBT1060/portfolio/walkforward.py`, `gpuBT1060/execution/order_intents.py`	`mts1b-quantkit/universe_filters`	~120
11	Logging setup	many	scattered `logging.getLogger(__name__)` + handler wiring	`mts1b-platform/logging`	~200
12	Config loading (Pydantic Settings)	many	scattered `BaseSettings` + `.env` loaders + Vault helpers	`mts1b-platform/config`	~150
13	DB connection pools	many	scattered `psycopg_pool.AsyncConnectionPool` + DuckDB wrappers	`mts1b-platform/db`	~200
14	Retry / backoff (tenacity)	many	scattered `@retry` decorators + manual exponential backoff	`mts1b-platform/retry`	~80
15	Rate limiters	many	scattered `aiolimiter` + custom token-bucket impls	`mts1b-platform/ratelimit`	~120
16	HTTP client factory	many	scattered `httpx.Client` / `aiohttp.ClientSession` factories	`mts1b-platform/http`	~150
17	Secrets redaction	many	scattered log filters	`mts1b-platform/security/redact`	~80

Total LoC saved: ~2,580. And every drift between copies is a latent bug we get to retire.

Why this matters

Drift between copies isn't theoretical:

The HRP impl in gpuBT1060/portfolio/hrp.py returns weights via HRPResult(weights, linkage, leaves); the one in research/modules/allocators/hrp.py returns a bare dict. A consumer that swapped between them silently got different return shapes.
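One Sharpe impl annualized assuming 252 trading days; another assumed 365. Because annualized Sharpe scales with the square root of the periods per year, the 365-day path reported values about 20% higher (√(365/252) ≈ 1.20) for identical returns, so strategies looked better on that path.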
Telegram dispatch in 6 places meant 6 places to update when we changed escalation rules — and one was always missed.
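
The size of the Sharpe discrepancy follows directly from how annualization works. A minimal sketch, assuming daily returns and the usual square-root-of-time scaling (the function name and the sample returns are illustrative and are not taken from either implementation):

```python
import math
import statistics

def annualized_sharpe(daily_returns, periods_per_year):
    """Annualize a per-period Sharpe ratio with square-root-of-time scaling."""
    mean = statistics.fmean(daily_returns)
    stdev = statistics.stdev(daily_returns)  # sample standard deviation
    return mean / stdev * math.sqrt(periods_per_year)

# Made-up daily returns, used only to show the effect of the day count.
returns = [0.004, -0.002, 0.003, 0.001, -0.001, 0.002]

sharpe_252 = annualized_sharpe(returns, 252)
sharpe_365 = annualized_sharpe(returns, 365)

# The ratio depends only on the two day counts, not on the data:
# sqrt(365 / 252) is roughly 1.2035.
inflation = sharpe_365 / sharpe_252
```

The same return series scores about 20% higher under the 365-day convention, which is why the two copies could not both stay.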

Consolidation forces a single source of truth and eliminates an entire class of bugs.

The consolidation procedure

For each of the 17 patterns, in W0 (pre-extraction):

Identify the canonical implementation — richest tests, cleanest API, most type-safe. Often the GPU/CUDA variant for compute-heavy patterns (it has stricter shape contracts).
Move to target repo — adapt to mts1b-foundation types if needed.
Update all call sites via codemod — never manual; the codemod is auditable in the PR.
Delete the old copies — verify via git grep that no references remain.
CI enforces uniqueness — AST-based scan (see below) fails if a forbidden symbol is defined in any repo other than its canonical home.
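
Step 3 can be sketched with the standard library alone. The example below rewrites `from ... import ...` statements that pull a protected symbol from an old location so that they point at the canonical module. The old dotted module path is inferred from the file path in the table, `mts1b_quantkit.allocators` is the import path used in the pluggability example on this page, and the `MOVES` table, `helper`, and the function names are hypothetical. A production codemod would more likely use a formatting-preserving tool, since `ast.unparse` (Python 3.9+) drops comments and the original layout.

```python
import ast

# (old module, symbol) -> canonical module. Illustrative entry only.
MOVES = {
    ("research.modules.allocators.hrp", "hrp_weights"): "mts1b_quantkit.allocators",
}

class ImportRewriter(ast.NodeTransformer):
    """Redirect imports of moved symbols to their canonical module."""

    def visit_ImportFrom(self, node):
        moved, kept = [], []
        for alias in node.names:
            target = MOVES.get((node.module, alias.name))
            if target:
                moved.append((target, alias))
            else:
                kept.append(alias)
        if not moved:
            return node  # nothing to rewrite in this statement
        out = []
        if kept:
            # Names that did not move keep importing from the old module.
            out.append(ast.ImportFrom(module=node.module, names=kept, level=node.level))
        for target, alias in moved:
            out.append(ast.ImportFrom(module=target, names=[alias], level=0))
        return out

def rewrite_imports(source: str) -> str:
    """Return source with moved imports redirected (comments are not preserved)."""
    tree = ImportRewriter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

# `helper` is a made-up name that stays in the old module.
before = (
    "from research.modules.allocators.hrp import hrp_weights, helper\n"
    "weights = hrp_weights(cov)\n"
)
after = rewrite_imports(before)
```

Because the rewrite is driven by a single table, the PR diff shows every call site the codemod touched, which is what makes it auditable.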

The CI gate

tests/contracts/test_no_duplicate_implementations.py
"""Enforce the dedupe contract. Runs in every repo's CI."""

import ast
import pathlib
from typing import Iterator

# Protected symbols: name → canonical home repo
PROTECTED: dict[str, str] = {
    "hrp_weights":                 "mts1b-quantkit",
    "black_litterman":             "mts1b-quantkit",
    "walk_forward":                "mts1b-quantkit",
    "sharpe_from_moments":         "mts1b-quantkit",
    "calmar":                      "mts1b-quantkit",
    "max_drawdown":                "mts1b-quantkit",
    "kelly_fraction":              "mts1b-portfolio",
    "vol_target_weights":          "mts1b-portfolio",
    "send_telegram":               "mts1b-platform",
    "send_slack":                  "mts1b-platform",
    "compute_fee":                 "mts1b-quantkit",
    "compute_slippage":            "mts1b-quantkit",
    "is_trading_day":              "mts1b-platform",
    "next_session_close":          "mts1b-platform",
    "normalize_symbol":            "mts1b-platform",
    "passes_adv_filter":           "mts1b-quantkit",
    "get_logger":                  "mts1b-platform",
    "load_config":                 "mts1b-platform",
    "get_db_pool":                 "mts1b-platform",
    "with_retry":                  "mts1b-platform",
    "RateLimiter":                 "mts1b-platform",
    "http_client":                 "mts1b-platform",
    "redact":                      "mts1b-platform",
}


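def find_def(symbol: str, path: pathlib.Path) -> Iterator[pathlib.Path]:
    """Yield every .py file under path that defines symbol.

    ast.walk visits every node, so a function, async function, or class
    named symbol is matched at any nesting depth (including methods).
    """
    for p in path.rglob("*.py"):
        try:
            # Read as UTF-8 explicitly so results don't depend on the CI locale.
            tree = ast.parse(p.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                if node.name == symbol:
                    yield p
                    break  # report each file at most once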


def test_no_duplicate_implementations():
    repo_root = pathlib.Path(__file__).resolve().parents[2]
    this_repo = repo_root.name  # e.g. "mts1b-quantkit"

    for symbol, canonical_home in PROTECTED.items():
        hits = list(find_def(symbol, repo_root / "src"))
        if this_repo == canonical_home:
            assert len(hits) == 1, (
                f"{symbol} should be defined exactly once in its canonical home "
                f"{canonical_home}, found {len(hits)} in {this_repo}: {hits}"
            )
        else:
            assert len(hits) == 0, (
                f"{symbol} is defined in {this_repo}, but its canonical home is "
                f"{canonical_home}. Move it back, OR rename if it's a different concept. "
                f"Hits: {hits}"
            )

Why AST and not grep:

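grep "def hrp_weights" also matches comments, docstrings, and longer names that share the prefix (for example def hrp_weights_v2), so it produces many false positives.
The AST scan matches only real def and class statements whose name equals the protected symbol exactly, so commented-out code and lookalike names never trigger it.
The AST walk also descends into class bodies, so a protected name defined as a method (for example inside class HRP(...)) is still caught.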
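
The difference is easy to reproduce. In the sketch below, the sample source and the `_v2` name are invented for illustration; a regular expression stands in for grep, and the AST check mirrors the one in find_def.

```python
import ast
import re

SAMPLE = '''
"""Notes: the old code had def hrp_weights(cov) in three places."""

# def hrp_weights(cov): removed, now imported from the canonical home

def hrp_weights_v2(cov):
    return cov

class Allocator:
    def rebalance(self):
        return None
'''

def grep_hits(symbol: str, source: str) -> int:
    """Count lines containing 'def <symbol>', as a plain-text search would."""
    pattern = re.compile(r"def " + re.escape(symbol))
    return sum(1 for line in source.splitlines() if pattern.search(line))

def ast_hits(symbol: str, source: str) -> int:
    """Count real function/class definitions named exactly <symbol>."""
    defs = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return sum(
        1
        for node in ast.walk(ast.parse(source))
        if isinstance(node, defs) and node.name == symbol
    )

# Text search is fooled by the docstring, the comment, and the _v2 function.
text_matches = grep_hits("hrp_weights", SAMPLE)

# The AST sees no definition with that exact name.
real_defs = ast_hits("hrp_weights", SAMPLE)
```

Here the text search reports three matches and the AST check reports none, while a nested method such as `rebalance` is still found by name.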

Pluggability without violating

You CAN still write a custom HRP in your own repo — just don't name it hrp_weights:

# In your-strategy/src/your_strategy/custom_alloc.py
from mts1b_quantkit.allocators import hrp_weights        # canonical
from mts1b_quantkit.shrinkage import ledoit_wolf

def hrp_with_shrinkage(returns):
    """Custom variant: HRP on Ledoit-Wolf-shrunk cov matrix."""
    shrunk = ledoit_wolf(returns.cov())
    # Use the canonical impl, just pass different data
    return hrp_weights(shrunk)

The CI cares about the name, not the algorithm — because the name is the contract.

Why "exactly once" and not "approved fork"

Approved-fork ("you can fork it if you ask") sounds appealing but breaks down:

It creates a maintenance tax (every fork needs upstream review).
It encourages "just one quick local change" which never gets contributed back.
It defeats the readability win (the reader still has to ask "which HRP is this?").

Better: the canonical impl exposes extension points (custom distance metric, custom cluster algorithm). If you need a knob that the canonical impl doesn't expose, open a PR that adds it.
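
As an illustration of what such an extension point can look like, the sketch below makes the correlation-to-distance step a parameter. The function names and signature are hypothetical and are not the API of mts1b-quantkit; the default is the distance commonly used with HRP, sqrt((1 - rho) / 2).

```python
import math
from typing import Callable, Sequence

Matrix = Sequence[Sequence[float]]

def default_distance(rho: float) -> float:
    """Correlation distance commonly used with HRP: sqrt((1 - rho) / 2)."""
    return math.sqrt(max(0.0, (1.0 - rho) / 2.0))

def distance_matrix(
    corr: Matrix,
    distance: Callable[[float], float] = default_distance,
) -> list[list[float]]:
    """Map a correlation matrix to a distance matrix via a pluggable metric."""
    return [[distance(rho) for rho in row] for row in corr]

corr = [
    [1.0, 0.5, -1.0],
    [0.5, 1.0, 0.0],
    [-1.0, 0.0, 1.0],
]

# Default metric: perfectly correlated -> 0, perfectly anti-correlated -> 1.
d_default = distance_matrix(corr)

# Caller-supplied metric: treat strong negative correlation as "close" too.
d_abs = distance_matrix(corr, distance=lambda rho: 1.0 - abs(rho))
```

A caller that needs a different metric passes it in rather than copying the allocator, so the protected name keeps a single definition.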

Status

W0: ✅ 17 patterns identified, this contract written.
W0 in progress: per-pattern audit + canonical-impl move + codemod for call sites + CI gate.
v1 launch (month 3): all 12 v1 repos pass the CI gate.
