James Jianqiao Yu
Home Publications Services


Department of Computer Science

University of York

CSE/139, YO10 5GH, UK

jqyu(at)ieee.org CV Google Scholar
Resource-constrained Federated Edge Learning with Heterogeneous Data: Formulation and Analysis

Yi Liu’, Yuanshao Zhu’, and James J.Q. Yu*

IEEE Transactions on Network Science and Engineering, Volume 9, Issue 5, September 2022, Pages 3166--3178

Efficient collaboration between collaborative machine learning and wireless communication technology, forming a Federated Edge Learning (FEEL) has spawned a series of next-generation intelligent applications. FEEL faces heterogeneous data and non-IID (Independent and Identically Distributed) data, and the communication cost between edge devices (or clients) is much more expensive than their local computational overhead, which is not friendly to the resource-constrained FEEL. Thus, we propose a distributed approximate Newton-type algorithm with fast convergence speed to ameliorate the problem of FEEL resource (in terms of communication resources) constraints. Specifically, the proposed algorithm is improved based on L-BFGS and allows each client to approximate the high-cost Hessian matrix by computing the low-cost Fisher matrix in a distributed manner to find a “better” descent direction, thereby speeding up convergence. Second, we prove that the proposed algorithm has linear convergence in both strongly convex and non-convex cases and analyze its computational and communication complexity. Furthermore, we design a simple but elegant training scheme, namely Federated Ove-vs-All (FedOVA) to solve the heterogeneous statistical challenge brought by heterogeneous data. In this way, FedOVA first decomposes a multi-class classification problem into more straightforward binary classification problems and then combines their respective outputs using ensemble learning. Extensive case studies show that the performance of our method is better than the classic federated average algorithm and data sharing scheme on heterogeneous data and non-IID data. Notably, compared with distributed stochastic gradient descent, the proposed algorithm needs fewer communication iterations to achieve comparable performance.