Data Science Research: Statistics

Main research directions


June 29 – July 2, 2021

Mathematical Statistics and Learning 2021

From June 29 to July 2, 2021, we will organize a conference on “Mathematical Statistics and Learning”. The meeting will bring together leading experts in mathematical statistics and machine learning to discuss high-dimensional and structured problems arising from the modelling and statistical analysis of data from large complex systems. The organizing committee is Gábor Lugosi (UPF & BGSE), Gergely Neu (UPF), Caroline Uhler (MIT), and Piotr Zwiernik (UPF & BGSE).

O. Papaspiliopoulos and N. Chopin. Springer, 2020

An Introduction to Sequential Monte Carlo

This book provides a general introduction to Sequential Monte Carlo (SMC) methods, also known as particle filters. These methods have become a staple for the sequential analysis of data in such diverse fields as signal processing, epidemiology, machine learning, population ecology, quantitative finance, and robotics.

P.L. Bartlett, P.M. Long, G. Lugosi, and A. Tsigler. PNAS, 2020.

Benign overfitting in linear regression

The phenomenon of benign overfitting is one of the key mysteries uncovered by deep learning methodology: deep neural networks seem to predict well, even with a perfect fit to noisy training data. Motivated by this phenomenon, Gábor Lugosi and his co-authors consider when a perfect fit to training data in linear regression is compatible with accurate prediction. Their analysis shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size.

S. Lauritzen, C. Uhler and P. Zwiernik

Total positivity in exponential families with application to binary variables

Annals of Statistics, to appear

  • P.L. Bartlett, P.M. Long, G. Lugosi, and A. Tsigler. Benign overfitting in linear regression. PNAS, to appear, 2020.

  • C. Bordenave, G. Lugosi, and N. Zhivotovskiy. Noise sensitivity of the top eigenvector of a Wigner matrix. Probability Theory and Related Fields, to appear, 2020.

  • G. Lugosi and S. Mendelson. Robust multivariate mean estimation: the optimality of trimmed mean. Annals of Statistics, to appear, 2020.

  • S. Lauritzen, C. Uhler, and P. Zwiernik. Total positivity in exponential families with application to binary variables. Annals of Statistics, to appear.

  • G. Lugosi and S. Mendelson. Regularization, sparse recovery, and median-of-means tournaments. Bernoulli, to appear.

  • G. Lugosi and S. Mendelson. Near-optimal mean estimators with respect to general norms. Probability Theory and Related Fields, to appear.

  • G. Lugosi and S. Mendelson. Risk minimization by median-of-means tournaments. Journal of the European Mathematical Society.

  • L. Beauchemin, M. Slifker, D. Rossell, and J. Font-Burgada. Characterizing MHC-I genotype predictive power for oncogenic mutation probability in cancer patients. Immunoinformatics, Methods and Protocols. Springer, in press.

  • Predicting election results with emerging parties by J.G. Montalvo, O. Papaspiliopoulos and T. Stumpf-Fetizon. European Journal of Political Economy.

  • Continuous mixtures with skewness and heavy tails by D. Rossell and M.F.J. Steel. In Handbook of Mixture Analysis, Chapter 10, CRC Press.

  • A. Corral, F. Udina and E. Arcaute, Truncated lognormal distributions and scaling in the size of naturally defined population clusters. Physical Review E, 2020, 101, No. 4.

  • On choosing mixture components via non-local priors by J. Fúquene, M.F.J. Steel, and D. Rossell. Journal of the Royal Statistical Society B, 2019, 81, 5, 809-837.

  • Maximum likelihood estimation in Gaussian models under total positivity by S. Lauritzen, C. Uhler, and P. Zwiernik. Annals of Statistics, 2019, Vol. 47, No. 4, 1835-1863.

  • Sub-Gaussian estimators of the mean of a random vector by G. Lugosi and S. Mendelson. Annals of Statistics, 2019, Vol. 47, No. 2, pp 783-794.

  • Auxiliary gradient‐based sampling algorithms by Titsias, Michalis K., and O. Papaspiliopoulos. Journal of the Royal Statistical Society: Series B, (Statistical Methodology) 80.4, 2018, pp 749-767.

  • Tractable Bayesian variable selection: beyond normality by D. Rossell and F.J. Rubio. Journal of the American Statistical Association, 2018, pp 1-17.

  • Nonlocal priors for high-dimensional estimation by D. Rossell and D. Telesca. Journal of the American Statistical Association, 2017, 112.517, pp 254-265.

  • Maximum likelihood estimation for linear Gaussian covariance models by P. Zwiernik, C. Uhler, and D. Richards. Journal of the Royal Statistical Society: Series B, 79(4), 2017, 1269–1292.

  • Total positivity in Markov structures by S. Fallat, S. Lauritzen, K. Sadeghi, C. Uhler, N. Wermuth, and P. Zwiernik. Annals of Statistics 2017, Vol. 45, No. 3, 1152-1184.

  • Set estimation from reflected Brownian motion by A. Cholaquidis, R. Fraiman, G. Lugosi, and B. Pateiro-López. Journal of the Royal Statistical Society: Series B, 2016, 78:1057–1078.

  • Sub-Gaussian mean estimators by L. Devroye, M. Lerasle, G. Lugosi, and R. Imbuzeiro Oliveira. Annals of Statistics, 2016, 44:2695-2725.

  • Almost optimal sparsification of random geometric graphs by N. Broutin, L. Devroye, and G. Lugosi, Annals of Applied Probability, 2016, 26:5, 3078-3109.

  • On probability laws of solutions of differential systems driven by fractional Brownian motion by F. Baudoin, E. Nualart, C. Ouyang, and S. Tindel, Annals of Probability, 2016, 44, pp 2554-2590.

  • Exact sampling of diffusions with a discontinuity in the drift by O. Papaspiliopoulos, G. Roberts, and K. Taylor, Advances in Applied Probability, 2016, 48(A), 249-259.

  • Exponential varieties by M. Michałek, B. Sturmfels, C. Uhler, and P. Zwiernik, Proceedings of the London Mathematical Society (3) 112 (2016), no. 1, 27–56.

  • Empirical risk minimization for heavy-tailed losses by C. Brownlees, E. Joly and G. Lugosi, Annals of Statistics, 2015, 43(6), 2507-2536.

  • Gavard R, Jones H, Palacio Lozano D, Thomas M, Rossell D, Spencer S, Barrow M (2020). KairosMS: A new solution for the processing of hyphenated ultrahigh resolution mass spectrometry data. Analytical Chemistry, 92.5 3775-86

  • Gavard R, Palacio Lozano D, Guzman A, Rossell D, Spencer S, Barrow M (2019). Rhapso: Automatic stitching of mass segments from Fourier transform ion cyclotron resonance mass spectra. Analytical Chemistry, 91:15130-37

  • M. Greenacre. Variable selection in compositional data analysis using pairwise logratios. Mathematical Geosciences, 2018, 1-34.

  • Marty R, Kaabinejadian S, van de Haar J, Rossell D, Ideker T, Hildebrand W, Engin HB, Font-Burgada J, Carter H. (2017) MHC-I genotype restricts the oncogenic mutational landscape. Cell, 171, 1272-1283

  • Font-Burgada J, Shalapour S, Ramaswamy S, Hsueh B, Rossell D, Umemura A, Taniguchi K, Nakagawa H, Valasek MA, Ye L, Kopp JL, Sander M, Carter H, Deisseroth K, Verma IM, Karin M. (2015) Hybrid Periportal Hepatocytes Regenerate the Injured Liver without Giving Rise to Cancer. Cell, 162(4):766-79.

  • Calon A, Lonardo E, Berenguer A, Espinet E, Hernando-Momblona X, Iglesias M, Sevillano M, Palomo-Ponce S, Tauriello DVF, Byrom D, Cortina C, Morral C, Barceló C, Tosi S, Riera A, Stephan-Otto Attolini C, Rossell D, Sancho E, Batlle E. (2015) Stromal gene expression defines poor prognosis subtypes in colorectal cancer. Nature Genetics, 47, 320-329. doi:10.1038/ng.3225

Christian Brownlees:

Annals of Financial Economics, Econometrics, Journal of Network Theory in Finance, Journal of Risk and Financial Management

Gábor Lugosi:

Annals of Applied Probability, Journal of Machine Learning Research, Probability Theory and Related Fields

Eulàlia Nualart:

Stochastic Processes and their Applications (Associate Editor)

Omiros Papaspiliopoulos:

Biometrika (Deputy Editor), SIAM/ASA Journal on Uncertainty Quantification

David Rossell:

Bayesian Analysis (Associate Editor)

Piotr Zwiernik:

Biometrika, Journal of Algebraic Statistics, Scandinavian Journal of Statistics

“Predicción, Inferencia y Computación en Modelos Estructurados de Alta Dimensión”

  • Reference: PGC2018-101643-B

  • Financing entity: Ministerio de Economía y Competitividad (MINECO)

  • Dates: 2019-2021

  • Principal investigators: Gábor Lugosi, Omiros Papaspiliopoulos

  • Amount: € 141,812

“Algorithms and Learning for AI”

  • Financing entity: Google

  • Dates: 2018-2020

  • Principal investigator: Gábor Lugosi

  • Amount: USD 150,000

“High-dimensional problems in structured probabilistic models”

  • Financing entity: Fundación BBVA

  • Dates: 2018-2020

  • Principal investigator: Gábor Lugosi

  • Amount: € 100,000

“Estimación de redes latentes”

  • Reference: MTM2015-67304-P

  • Financing entity: Ministerio de Economía y Competitividad (MINECO)

  • Dates: 2016-2018

  • Principal investigators: Gábor Lugosi, Omiros Papaspiliopoulos

  • Amount: € 52,998
