Seminars by Seungchul Baek (Univ Maryland, USA)
Seungchul Baek is an Associate Professor of Statistics at the University of Maryland, Baltimore County (UMBC). He is spending a week at DM-FCUP under a staff mobility program. Professor Baek will deliver two intermediate-level seminars and one advanced-level seminar:
- 8th JULY, 10h-11:45h --- 2 Seminars of INTERMEDIATE LEVEL
10h - 10:45h Random Partitioning and Variable Selection in High-Dimensional Classification
Abstract: I introduce two projects related to high-dimensional classification. The first project focuses on developing a classifier using random partitioning. Specifically, we split the original high-dimensional data ($p>n$) into multiple low-dimensional subsets, making sure the number of selected covariates is less than the sample size. Using these partitioned datasets, we apply linear discriminant analysis (LDA) to each subset and propose a method to aggregate the results. We provide theoretical justification for our approach by comparing its misclassification rates to those of LDA in high dimensions. The second project concerns variable selection in high-dimensional classification. By utilizing the recently proposed mirror statistic, we first identify significant variables and then develop a new classifier based on a modified version of the $\epsilon$-greedy algorithm.
10:45h - 11h Coffee-break
11h - 11:45h A Computationally Efficient Approach to Estimating Species Richness and Rarefaction Curve
Abstract: In ecological and educational studies, estimators of the total number of species and the rarefaction curve based on empirical samples are important tools. We propose a new method to estimate both the rarefaction curve and the number of species based on a ready-made numerical approach, such as quadratic optimization. The key idea in developing the proposed algorithm is nonparametric empirical Bayes estimation, incorporating an interpolated rarefaction curve via quadratic optimization with linear constraints based on the g-modeling of Efron (2014). Our proposed algorithm is easily implemented and shows better performance than existing methods in terms of computational speed and accuracy. Furthermore, we provide a model selection criterion for choosing tuning parameters in the estimation procedure and propose a confidence interval based on asymptotic theory rather than resampling. We present asymptotic results that validate the efficiency of our estimator theoretically. A broad range of numerical studies, including simulations and real data examples, is also conducted, and the resulting gains are compared with those of existing methods.
- 9th JULY, 14h-15h --- 1 Seminar of ADVANCED LEVEL
Title: Testing and Quantifying Site-Level Variability in Diagnostic Sensitivity of an Anchor Variable
Abstract: When the same disease is diagnosed across multiple clinical sites, do all sites perform equally well? Even with standardized criteria and identical cognitive tests, clinicians at different sites may interpret results differently — leading to inconsistent diagnoses that can affect patient outcomes and the validity of multi-site studies. This talk addresses this question in a setting with a binary anchor variable: a diagnostic indicator that definitively confirms disease when positive but leaves disease status uncertain when negative. We model site-specific diagnostic sensitivity with a random-effects structure and develop likelihood-based methods to estimate and test for meaningful variation across sites. Key methodological contributions include: establishing parameter identifiability using a validation subset; handling the intractable likelihood integral via a Laplace approximation and an EM algorithm; and constructing likelihood ratio and score tests that properly account for the boundary constraint on the variance component. We evaluate the approach through simulation and apply it to the Huntington disease cohort, where both tests provide strong evidence that diagnostic sensitivity differs meaningfully across sites. Site-specific effects are further explored using empirical Bayes estimation, revealing sites where diagnostic practices may warrant closer examination.