The Journal of Portfolio Management, v.52, no.2, pp.150 - 183
Abstract
Clustering and similarity learning are increasingly indispensable for structuring heterogeneous financial data and supporting real-world decision-making. Traditional heuristics such as industry codes, static style boxes, or return correlations offer only coarse and rigid notions of peer groups. Recent advances in metric learning, graph methods, and large language models now make it possible to build adaptive neighborhoods of securities, funds, companies, and investors that align more closely with actual risk, liquidity, and thematic exposures. This tutorial synthesizes these methodological developments and demonstrates their use across major asset classes. Case studies show how supervised proximities improve bond substitution, how fund similarity systems reconcile category reproducibility with outlier detection, how multimodal pipelines refine company comparables for valuation and strategy, and how investor clustering enhances personalization and “know your client” (KYC) analytics. We emphasize modeling choices that make clustering and similarity auditable and robust under regime shifts. We also outline evaluation protocols, such as neighborhood stability, substitution fidelity, and segment utility, that align with investment, compliance, and fiduciary objectives. Overall, the central message for practitioners is pragmatic: Similarity systems have moved beyond experimental prototypes and now stand as deployable techniques within real investment workflows.