Abstract:
Linear regression with shuffled labels is the problem of performing a linear regression fit on datasets whose labels are unknowingly shuffled with respect to their inputs. Such a problem relates to different applications such as genome sequence assembly, sampling and reconstruction of spatial fields, and communication networks. Existing methods are either applicable only to data with limited observation errors, work only for partially shuffled data, sensitive to initialization, and/or work only with small dimensions. This paper tackles this problem in its full generality using stochastic approximation, which is based on a first-order permutation-invariant constraint. We propose an optimal recursive algorithm that updates the estimate from the underdetermined function that is based on that permutation-invariant constraint. The proposed algorithm aims for per-iteration minimization of the mean square estimate error. Although our algorithm is sensitive to initialization errors, to the best of our knowledge, the resulting method is the first working solution for arbitrary large dimensions and arbitrary large observation errors while its computation throughput appears insignificant. Numerical simulations show that our method with shuffled datasets can outperform the ordinary least squares method without shuffling. We also consider a batch process to this problem where the datasets are independently available. The solution we propose is independent of initialization but requires that number of such datasets to be at least equal to the dimension of the unknown vector.
Citation:
Saab, S. S., & Saab, K. K. (2019, March). Shuffled Linear Regression with Erroneous Observations. In 2019 53rd Annual Conference on Information Sciences and Systems (CISS) (pp. 1-6). IEEE.