Convergence Analysis of a Momentum Algorithm with Adaptive Step Size for Non Convex Optimization

Abstract : Although ADAM is a very popular algorithm for optimizing the weights of neural networks, it has recently been shown that it can diverge even in simple convex optimization examples. Several variants of ADAM have been proposed to circumvent this convergence issue. In this work, we study the ADAM algorithm for smooth nonconvex optimization under a boundedness assumption on the adaptive learning rate. The bound on the adaptive step size depends on the Lipschitz constant of the gradient of the objective function and provides theoretically safe adaptive step sizes. Under this boundedness assumption, we establish a novel first-order convergence rate in both the deterministic and stochastic settings. Furthermore, we establish convergence rates for the sequence of function values using the Kurdyka-Łojasiewicz property.
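As a rough illustration of the setting described in the abstract, the following is a minimal sketch of a deterministic ADAM-style iteration in which the per-coordinate adaptive step size is clipped from above by a quantity inversely proportional to the Lipschitz constant of the gradient. The function name `clipped_adam`, the parameter `cap_factor`, the cap `cap_factor / lipschitz`, and the quadratic test objective are all assumptions made for this example; they are not taken from the paper, whose exact bound and update rule may differ. Bias correction is omitted for brevity.

```python
import math


def clipped_adam(grad, x0, lipschitz, steps=500, alpha=0.1,
                 beta1=0.9, beta2=0.999, eps=1e-8, cap_factor=1.0):
    """ADAM-style iteration with a capped per-coordinate step size.

    The cap `cap_factor / lipschitz` is a hypothetical choice made for
    illustration; it is not the bound derived in the paper.
    """
    x = list(x0)
    m = [0.0] * len(x)  # momentum (exponential average of gradients)
    v = [0.0] * len(x)  # exponential average of squared gradients
    cap = cap_factor / lipschitz
    for _ in range(steps):
        g = grad(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] * g[i]
            # Adaptive step size, clipped so that it never exceeds the cap.
            step = min(alpha / (math.sqrt(v[i]) + eps), cap)
            x[i] -= step * m[i]
    return x


# Toy smooth objective f(x) = 0.5 * (x_0^2 + 10 * x_1^2).
# Its gradient is Lipschitz continuous with constant L = 10.
def grad_quadratic(x):
    return [x[0], 10.0 * x[1]]


x_final = clipped_adam(grad_quadratic, [1.0, -1.0], lipschitz=10.0)
grad_norm = math.sqrt(sum(g * g for g in grad_quadratic(x_final)))
print(x_final, grad_norm)
```

The gradient norm at the final iterate serves as a simple first-order stationarity measure for this toy run. A stochastic variant could be sketched by replacing `grad` with a noisy gradient oracle.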
Document type : Preprints, Working Papers, ...

Cited literature : 36 references

https://hal.telecom-paristech.fr/hal-02366337
Contributor : Anas Barakat
Submitted on : Friday, November 15, 2019 - 6:47:34 PM
Last modification on : Wednesday, November 20, 2019 - 1:04:36 AM

Files

rate_convergence_adam.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02366337, version 1
  • ARXIV : 1911.07596

Citation

Anas Barakat, Pascal Bianchi. Convergence Analysis of a Momentum Algorithm with Adaptive Step Size for Non Convex Optimization. 2019. ⟨hal-02366337⟩
