Tags: Data-Centric AI, Measurement, Metrics and ML Data
Abstract:
Maintainable, high-quality, scalable ML datasets that can be built rapidly have been fundamental to multiple production AI applications we have worked on. How have we gone about building these datasets systematically? Our approach has included defining a set of operational metrics for ML data. Our framework organizes those metrics around five goals: time to launch, effect on model performance, properties of the data, data quality, and tracking of dataset changes and history. In each area, we have defined more detailed metrics and created operational processes to track them. Through disciplined tracking, we have seen improvements to ML datasets translate into improvements in ML performance across diverse examples.