Title: Estimating Error in Natural Distribution Estimation

Conference: 2025 Allerton

Tags: Concentration, Distribution Estimation, Estimation, Good-Turing estimator, and Missing mass

Abstract: Given i.i.d. samples from an unknown discrete distribution, the goal of distribution estimation is to construct an accurate estimate of the underlying distribution. Natural distribution estimators assign a single probability estimate to all letters that occur with the same frequency, which is well justified for i.i.d. models. However, natural estimators can be significantly erroneous for low-frequency or missing (frequency 0) letters in large-alphabet scenarios. In this work, we introduce a statistic that captures the unavoidable error at a particular frequency of any natural distribution estimator. For this proposed error statistic, which depends on both the distribution and the samples, we provide an estimator that is non-linear in the prevalences (frequencies of frequencies). We show that the proposed estimator has low bias and is consistent, and that it can be used to ascertain whether the distribution restricted to letters of the same frequency is close to uniform. Our approach is validated through simulations on synthetic and natural language data.
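As a rough illustration of the terms used in the abstract, the sketch below computes the prevalences (frequencies of frequencies) of a sample and the classical Good-Turing estimate of the total probability of letters seen exactly k times, where k = 0 gives the missing mass. This is the standard Good-Turing formula, not the error estimator proposed in the paper; the function names `prevalences` and `good_turing_mass` and the toy sample are assumptions made purely for illustration.

```python
from collections import Counter


def prevalences(samples):
    """Map each frequency k to the number of distinct letters seen exactly k times."""
    letter_counts = Counter(samples)          # letter -> how often it occurs
    return Counter(letter_counts.values())    # frequency -> how many letters have it


def good_turing_mass(samples, k):
    """Classical Good-Turing estimate of the total probability of all letters
    seen exactly k times: (k + 1) * phi_{k+1} / n. With k = 0 this is the
    estimated missing mass."""
    n = len(samples)
    phi = prevalences(samples)
    return (k + 1) * phi.get(k + 1, 0) / n


# Hypothetical toy sample: a appears 5 times, b and r twice, c and d once.
samples = list("abracadabra")
phi = prevalences(samples)
missing_mass = good_turing_mass(samples, 0)   # phi_1 / n

print(dict(phi))
print(missing_mass)
```

A natural estimator would divide the mass estimated for frequency k equally among the letters seen k times, which is accurate only when the true probabilities of those letters are close to uniform.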
