Imbalanced data impedes the model performance of learning algorithms, like neural networks, mostly for rare cases. This is especially problematic for tasks focusing on these rare occurrences. For example, when estimating precipitation, extreme rainfall events are scarce but important considering their potential consequences. While there are numerous well studied solutions for classification settings, most of them cannot be applied to regression easily. Of the few solutions for regression tasks, barely any have explored cost-sensitive learning, which is known to have advantages over sampling-based methods in classification tasks. In this work, we propose a sample weighting approach for imbalanced regression datasets called DenseWeight and, based on our weighting scheme, a cost-sensitive learning approach for neural network regression with imbalanced data called DenseLoss. DenseWeight weights data points according to the rarity of their target values, estimated through kernel density estimation (KDE). DenseLoss adjusts each data point's influence on the loss according to DenseWeight, giving rare data points more influence on model training. We show on multiple differently distributed datasets that DenseLoss significantly improves model performance for rare data points through its density-based weighting scheme. Additionally, we compare DenseLoss to the state-of-the-art method SMOGN, finding that our method mostly yields better performance. Our approach also provides more control over model training, as a single hyperparameter lets us actively decide on the trade-off between focusing on common or rare cases, allowing the training of better models for rare data points.

Typical solutions to data imbalance require a notion of rarity or importance for a data point in order to know which data points to over- and undersample, or which data points to weight more strongly. There are many solutions to this problem for classification tasks, including resampling strategies (Chawla et al. 2008) and cost-sensitive learning approaches (Cui et al.). Resampling can have disadvantages in comparison to cost-sensitive methods, since the creation of new data points via oversampling of existing data points may lead to overfitting as well as additional noise, while undersampling removes information (Cui et al.). However, these classification methods cannot be applied easily to regression tasks because of the inherent differences between continuous and discrete, nominal target values. It is harder to define which values are rare for regression tasks than for classification tasks, since one cannot simply use class frequencies (Branco et al.). Only few works explore methods improving model performance for rare cases in regression settings, mostly proposing sampling-based approaches (Branco et al.).
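The core idea described above (KDE-based sample weights, then a loss that scales each sample's error term by its weight) can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the specific weighting form used here (one minus `alpha` times the min-max-normalized density, floored at a small `eps` and rescaled to mean one) and the helper names `dense_weight` and `dense_loss` are assumptions for demonstration purposes.

```python
import numpy as np
from scipy.stats import gaussian_kde

def dense_weight(y, alpha=1.0, eps=1e-6):
    """Illustrative density-based weights: rare target values get larger weights.

    NOTE: the exact formula is an assumption for illustration, not the
    paper's verbatim definition of DenseWeight.
    """
    y = np.asarray(y, dtype=float)
    dens = gaussian_kde(y)(y)                       # estimate p(y) at each target
    dens = (dens - dens.min()) / (dens.max() - dens.min())  # normalize to [0, 1]
    # Low density (rare targets) -> weight near 1; high density -> weight near 0.
    # alpha controls the common-vs-rare trade-off (alpha = 0 gives uniform weights).
    w = np.maximum(1.0 - alpha * dens, eps)
    return w / w.mean()                             # rescale so weights average to 1

def dense_loss(y_true, y_pred, weights):
    """Weighted MSE: each sample's squared error is scaled by its weight."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(weights * (y_true - y_pred) ** 2)
```

With `alpha = 0`, every weight equals 1 and `dense_loss` reduces to ordinary mean squared error, which is the sense in which a single hyperparameter controls the trade-off between common and rare cases.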