Title | : | Testing Convex Truncation |
Speaker | : | Anindya De (University of Pennsylvania) |
Details | : | Tue, 17 Dec, 2024 3:00 PM @ SSB 334 |
Abstract | : | We study the basic statistical problem of testing whether normally distributed n-dimensional data has been truncated, i.e., altered by retaining only points that lie in some unknown truncation set S. Our main algorithmic results are: (1) a computationally efficient O(n)-sample algorithm that can distinguish the standard normal distribution from the normal conditioned on an unknown and arbitrary convex set S; and (2) a different computationally efficient O(n)-sample algorithm that can distinguish the standard normal distribution from the normal conditioned on an unknown and arbitrary mixture of symmetric convex sets. These results stand in sharp contrast with known results for learning or testing convex bodies with respect to the normal distribution, or for learning convex-truncated normal distributions, where state-of-the-art algorithms require essentially n^{O(\sqrt{n})} samples. An easy argument shows that no finite number of samples suffices to distinguish the normal from an unknown and arbitrary mixture of general (not necessarily symmetric) convex sets, so no common generalization of results (1) and (2) above is possible. We also prove lower bounds on the sample complexity of distinguishing algorithms (computationally efficient or otherwise) for various classes of convex truncations; in some cases these lower bounds match our algorithms up to logarithmic or even constant factors. Based on joint work with Shivam Nadimpalli and Rocco Servedio.
Bio: Anindya De is an Associate Professor in the Department of Computer and Information Science at the University of Pennsylvania. He received his BTech from IIT Kanpur in 2008 and his PhD from Berkeley in 2013, and was a postdoc at IAS and DIMACS from 2013 to 2015. Before starting at Penn, he was on the faculty at Northwestern University. His research interests are in Boolean function analysis, probability theory, and high-dimensional geometry, and their applications in theoretical computer science. |