余剑峤
James Jianqiao Yu

Professor, Doctoral Supervisor

School of Computer Science and Technology

Harbin Institute of Technology (Shenzhen)

Shenzhen University Town, Nanshan District, Shenzhen, Guangdong, China

jqyu(at)hit.edu.cn jqyu(at)ieee.org Google Scholar
Resource-constrained Federated Edge Learning with Heterogeneous Data: Formulation and Analysis

Authors
Yi Liu, Yuanshao Zhu, and James J.Q. Yu

Published in
IEEE Transactions on Network Science and Engineering, Volume 9, Issue 5, September 2022, Pages 3166–3178

Abstract
The efficient combination of collaborative machine learning and wireless communication technology, known as Federated Edge Learning (FEEL), has spawned a series of next-generation intelligent applications. However, FEEL faces heterogeneous and non-IID (non-Independent and Identically Distributed) data, and the communication cost between edge devices (or clients) far exceeds their local computational overhead, which burdens resource-constrained FEEL systems. Thus, we propose a distributed approximate Newton-type algorithm with fast convergence to alleviate FEEL's resource constraints, particularly on communication. Specifically, the proposed algorithm builds on L-BFGS and allows each client to approximate the high-cost Hessian matrix with the low-cost Fisher matrix, computed in a distributed manner, to find a "better" descent direction and thereby speed up convergence. We further prove that the proposed algorithm achieves linear convergence in both strongly convex and non-convex cases and analyze its computational and communication complexity. Furthermore, we design a simple yet elegant training scheme, namely Federated One-vs-All (FedOVA), to address the statistical challenge brought by heterogeneous data. FedOVA first decomposes a multi-class classification problem into simpler binary classification problems and then combines their respective outputs using ensemble learning. Extensive case studies show that our method outperforms the classic federated averaging algorithm and the data sharing scheme on heterogeneous and non-IID data. Notably, compared with distributed stochastic gradient descent, the proposed algorithm needs fewer communication iterations to achieve comparable performance.
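The sketch below is a rough, hypothetical illustration of the one-vs-all decomposition that FedOVA is built on: a multi-class problem is split into one binary classifier per class, and their outputs are combined at inference time. It is a standalone, single-machine toy (the federated training procedure and the paper's specific ensemble rule are omitted), and the use of scikit-learn's LogisticRegression is an assumption for illustration only, not the authors' implementation.

```python
# Hypothetical single-machine sketch of a one-vs-all decomposition
# (not the paper's FedOVA training scheme; federated aspects omitted).
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_ova(X, y, num_classes):
    """Train one binary classifier per class (class k vs. the rest)."""
    models = []
    for k in range(num_classes):
        binary_labels = (y == k).astype(int)   # 1 for class k, 0 otherwise
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X, binary_labels)
        models.append(clf)
    return models

def predict_ova(models, X):
    """Combine the binary outputs: pick the class whose classifier is most confident."""
    scores = np.column_stack([m.predict_proba(X)[:, 1] for m in models])
    return scores.argmax(axis=1)

# Toy usage with random data (3 classes, 20 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = rng.integers(0, 3, size=300)
models = train_ova(X, y, num_classes=3)
preds = predict_ova(models, X)
print("training accuracy:", (preds == y).mean())
```

In a federated setting, each binary learner could be trained collaboratively across clients before the ensemble step; the paper's analysis covers how this interacts with heterogeneous client data.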