Abstract:
Federated Learning (FL) has emerged as a promising framework for collaborative model training across distributed devices without centralizing sensitive data. However, FL faces significant challenges when client data are not independent and identically distributed (non-IID), for example under skewed label distributions and varying data quantities across participating clients. Existing solutions still suffer from several limitations that lead to suboptimal model performance and slow convergence. In this paper, we propose a novel approach that combines genetic algorithms with an enhanced client selection strategy, utilizing client metadata rather than raw data. Our approach not only mitigates the impact of non-IID data by selecting clients with diverse and representative data distributions, but also enables continuous assessment after each training round without compromising model performance. We demonstrate the effectiveness of our approach through extensive experiments on the MNIST, CIFAR-10, and FeKDD datasets. Our results show a significant reduction in communication overhead and an improvement in overall FL performance compared to random client selection. This research provides practical insights and solutions for deploying FL in real-world scenarios with diverse data distributions.