Optimizing Training Data for Image Classifiers
Authors: Matthew Hagen, Ala Eddine Ayadi, Jiaqi Wang, Nikolaos Vasiloglou, Estelle Afshar. 2019.
In KDD 2019 Workshop on Data Collection, Curation, and Labeling for Mining and Learning (DCCL, KDD ’19).
In this paper, we propose a robust outlier-removal method to improve image classification performance. Increasing the size of the training data does not necessarily raise prediction accuracy, because some instances may be poor representatives of their respective classes. We run four separate experiments to evaluate the effectiveness of outlier removal for several classifiers. Embeddings are generated from a pre-trained neural network, a fine-tuned network, and a Siamese network. Outlier detection is then evaluated based on clustering quality and on the performance of three classifiers: a fully connected feed-forward network, K-Nearest Neighbors, and a gradient boosting model.
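As a rough illustration of the general idea of removing outliers in embedding space, the sketch below drops the samples that lie farthest from their class centroid. The abstract does not say which outlier detector the paper uses, so the centroid-distance rule, the function names `centroid` and `filter_outliers`, the `keep_fraction` parameter, and the toy data are all assumptions made for this example, not the authors' method.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def filter_outliers(embeddings, labels, keep_fraction=0.9):
    """Return sorted indices of samples kept after per-class outlier removal.

    Hypothetical rule (an assumption, not the paper's method): within each
    class, rank samples by Euclidean distance to the class centroid and keep
    the closest `keep_fraction` of them.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    kept = []
    for indices in by_class.values():
        center = centroid([embeddings[i] for i in indices])
        ranked = sorted(indices, key=lambda i: math.dist(embeddings[i], center))
        n_keep = max(1, math.ceil(keep_fraction * len(ranked)))
        kept.extend(ranked[:n_keep])
    return sorted(kept)


# Toy 2-D "embeddings": two tight classes plus one stray "cat" sample.
embeddings = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0],  # cat
    [9.0, 9.0], [9.1, 9.0], [9.0, 9.1], [9.1, 9.1],              # dog
]
labels = ["cat"] * 5 + ["dog"] * 4
kept = filter_outliers(embeddings, labels, keep_fraction=0.8)
```

In a real pipeline the embeddings would come from one of the networks named above, and the retained indices would select the training subset passed to a downstream classifier such as K-Nearest Neighbors.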