ESSEC METALAB

RESEARCH

ON THE ROBUSTNESS TO ADVERSARIAL CORRUPTION AND TO HEAVY-TAILED DATA OF THE STAHEL–DONOHO MEDIAN OF MEANS

[ARTICLE] The authors propose median of means (MOM) versions of existing statistical functions to estimate the mean of a data set under adversarial contamination and heavy-tailed data. The approach recovers subgaussian rates in cases ranging from the Gaussian case to the L2 case, and even when no first moment exists.

by Guillaume Lecué (ESSEC Business School), Jules Depersin

We consider median of means (MOM) versions of the Stahel–Donoho outlyingness (SDO) [ 2366] and of the Median Absolute Deviation (MAD) [30] functions to construct subgaussian estimators of a mean vector under adversarial contamination and heavy-tailed data. We develop a single analysis of the MOM version of the SDO which covers all cases ranging from the Gaussian case to the L2 case. It is based on isomorphic and almost isometric properties of the MOM versions of SDO and MAD. This analysis also covers cases where the mean does not exist but a location parameter does; in those cases we still recover the same subgaussian rates and the same price for adversarial contamination even though there is no first moment. These properties are also achieved by the classical SDO median and are therefore the first non-asymptotic statistical bounds on the Stahel–Donoho median, complementing the √n-consistency [58] and asymptotic normality [74] of the Stahel–Donoho estimators. We also show that the MOM version of MAD can be used to construct an estimator of the covariance matrix under only the existence of a second moment, or of a scatter matrix if a second moment does not exist.
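To make the construction concrete, the following is a minimal Python sketch of the general idea, not the estimator analysed in the paper. It splits the data into blocks, averages each block, and measures the outlyingness of a candidate point by its worst standardized deviation from the median of the projected block means, with the MAD of those projections as the scale. Several choices here are assumptions made for illustration: the function names (`block_means`, `random_directions`, `mom_sdo_estimate`) are hypothetical, the supremum over all directions is replaced by a finite set of random unit vectors, and the minimization is restricted to the block means themselves rather than carried out over the whole space.

```python
import math
import random
import statistics


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def block_means(points, n_blocks):
    """Split the points into n_blocks contiguous blocks and average each one."""
    size = len(points) // n_blocks
    dim = len(points[0])
    means = []
    for k in range(n_blocks):
        block = points[k * size:(k + 1) * size]
        means.append([sum(p[j] for p in block) / size for j in range(dim)])
    return means


def random_directions(dim, n_dirs, rng):
    """Random unit vectors: a finite stand-in (assumption) for all directions."""
    dirs = []
    for _ in range(n_dirs):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(dot(v, v))
        dirs.append([c / norm for c in v])
    return dirs


def mom_sdo_estimate(points, n_blocks, n_dirs=200, seed=0):
    """Return the block mean with the smallest MOM-style outlyingness."""
    rng = random.Random(seed)
    means = block_means(points, n_blocks)
    dirs = random_directions(len(points[0]), n_dirs, rng)

    # For each direction, store the median and the MAD of the projected block means.
    stats = []
    for v in dirs:
        proj = [dot(v, m) for m in means]
        med = statistics.median(proj)
        mad = statistics.median([abs(p - med) for p in proj])
        if mad > 0:
            stats.append((v, med, mad))

    def outlyingness(candidate):
        # Worst standardized deviation over the sampled directions.
        return max(
            (abs(dot(v, candidate) - med) / mad for v, med, mad in stats),
            default=0.0,
        )

    # Simplification (assumption): search only among the block means.
    return min(means, key=outlyingness)
```

As a usage note, the number of blocks is the main tuning parameter in this sketch: a corrupted point spoils at most one block, so taking more than twice as many blocks as suspected outliers keeps a majority of block means clean, and the medians and MADs above are then driven by those clean blocks.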

[Please read the research paper here]
