Data-Centric Optimization Approach for Small, Imbalanced Datasets
Journal: Journal of Information and Organizational Sciences (JIOS) (Vol. 47, No. 1)
Publication Date: 2023-06-30
Authors : Vladislav Tanov;
Page : 167-177
Keywords : imbalanced dataset; classification; data centric; optimization; machine learning;
Abstract
Data-centric machine learning is a recently explored concept in which attention shifts to data optimization methodologies and techniques for improving model performance, rather than to machine learning models and hyperparameter tuning. This paper proposes an effective data optimization methodology for small, imbalanced datasets that improves machine learning model performance.
The paper focuses on providing an effective solution when the number of observations is insufficient to construct a machine learning model with high values of the estimated performance measures. In such datasets, most observations are labeled as one class (the majority class) and the rest as the other, which is commonly the class of interest (the minority class). The proposed methodology does not depend on the applied classification models; rather, it relies on the properties of the data resampling approach to systematically enhance and optimize the training dataset. The paper reports numerical experiments applying the data-centric optimization methodology and compares them with results previously obtained by other authors.
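To make the idea of classifier-independent resampling concrete, here is a minimal sketch. The abstract does not say which resampling technique the paper uses, so this example assumes plain random oversampling of the minority class; the function name `random_oversample` and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many observations as the largest class.

    Assumption: random oversampling stands in for the paper's (unspecified)
    resampling approach.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Start from a copy of the original data, then append duplicates.
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy training set: 8 majority-class (0) and 2 minority-class (1) observations.
rows = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2

bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))
```

After resampling, both classes contain 8 observations, and the result can be passed to any classifier unchanged. Resampling of this kind is normally applied to the training split only, so that evaluation data keeps its original class distribution.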