DATA AND AI ARE CHANGING THE WAY ORGANIZATIONS THINK, DECIDE, AND ORGANIZE. IT’S TIME HUMANITIES, MANAGEMENT AND SOCIAL SCIENCES GET INVOLVED.
Metalab

NEWS


EVENTS

RESEARCH

ASSIGNING TOPICS TO DOCUMENTS BY SUCCESSIVE PROJECTIONS

[ARTICLE] This paper introduces the Successive Projection Overlapping Clustering (SPOC) algorithm for efficiently organizing text corpora into topics, providing theoretical guarantees and outperforming Latent Dirichlet Allocation, especially in managing large dictionaries.

by Olga Klopp (ESSEC Business School), Maxim Panov, Suzanne Sigalla, Alexandre Tsybakov

Topic models provide a useful tool to organize and understand the structure of large corpora of text documents and, in particular, to discover hidden thematic structure. Clustering documents from big unstructured corpora into topics is an important task in various areas, such as image analysis, e-commerce, social networks, and population genetics. A common approach to topic modeling is to associate each topic with a probability distribution on the dictionary of words and to consider each document as a mixture of topics. Since the number of topics is typically substantially smaller than the size of the corpus and of the dictionary, topic modeling methods can lead to a dramatic dimension reduction. In this paper, we study the problem of estimating the topic distribution of each document in a given corpus, that is, we focus on the clustering aspect of the problem. We introduce an algorithm that we call Successive Projection Overlapping Clustering (SPOC), inspired by the Successive Projection Algorithm for separable matrix factorization. This algorithm is simple to implement and computationally fast. We establish theoretical guarantees on the performance of the SPOC algorithm, in particular, near-matching minimax upper and lower bounds on its estimation risk. We also propose a new method that estimates the number of topics. We complement our theoretical results with a numerical study on synthetic and semi-synthetic data to analyze the performance of this new algorithm in practice. One of the conclusions is that the error of the algorithm grows at most logarithmically with the size of the dictionary, in contrast to what one observes for Latent Dirichlet Allocation.
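
For readers unfamiliar with the Successive Projection Algorithm mentioned in the abstract, the following is a minimal Python sketch of its generic textbook form: repeatedly pick the row with the largest norm, then project all rows onto the orthogonal complement of that row. It is not the SPOC algorithm of the paper, which builds on this idea with additional steps not described on this page. The function name `successive_projection`, the toy matrices, and the noiseless setting with three "pure" documents are all assumptions made for illustration.

```python
import numpy as np


def successive_projection(X, k):
    """Return the indices of k rows of X chosen by successive projections.

    Generic sketch: assumes X has at least k linearly independent rows.
    """
    R = np.array(X, dtype=float)  # residual matrix (a copy of X)
    selected = []
    for _ in range(k):
        # Squared Euclidean norm of every residual row.
        norms = np.einsum("ij,ij->i", R, R)
        j = int(np.argmax(norms))
        selected.append(j)
        # Unit vector along the chosen row.
        u = R[j] / np.sqrt(norms[j])
        # Remove the component along u from every row.
        R = R - np.outer(R @ u, u)
    return selected


# Toy example (hypothetical data): 3 topics, 3 "pure" documents, 20 mixed ones.
rng = np.random.default_rng(0)
topics = np.array([[1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0],
                   [0.0, 0.0, 3.0]])           # one row per topic
weights = np.vstack([np.eye(3),                  # rows 0-2: pure documents
                     rng.dirichlet(np.ones(3), size=20)])  # mixtures
X = weights @ topics                             # each row is a mixture of topics

picked = successive_projection(X, 3)
print(sorted(picked))
```

In this noiseless toy setting the selected rows are the three pure documents, since a strict mixture always has a smaller residual norm than the best remaining pure row. With noisy data the selection is only approximate, which is where guarantees of the kind described in the abstract become relevant.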

[Please read the research paper here]

Research list
MULTIVARIATE VOLATILITY FORECASTS FOR STOCK MARKET INDICES


[ARTICLE] This study forecasts realized variance for major international stock market indices, incorporating jump, continuous, and option-implied variance components, using ...
DYNAMICS OF VARIANCE RISK PREMIA: A NEW MODEL FOR DISENTANGLING THE PRICE OF RISK


[ARTICLE] This paper presents a dynamic model for the variance risk premium that separates the continuous component from jump impacts, ...
MINIMUM COST NETWORK DESIGN IN STRATEGIC ALLIANCES


[ARTICLE] This paper investigates the impact of transaction costs on the viability of strategic alliances in service network design, highlighting ...
PROBABILISTIC FORECASTING OF BUBBLES AND FLASH CRASHES


[ARTICLE] This paper proposes a near explosive random coefficient autoregressive model (NERC) to predict probabilities of bubbles and crashes in ...
Founded in 2020 by ESSEC Business School, the Metalab Institute for Artificial Intelligence, Data and Society helps organizations navigate and better understand the social, economic, cultural, and ethical impacts of AI and data.

metalab@essec.edu

Learn more about the Metalab Institute

copyright © 2026 metalab Institute
