If there exists $\bar k \in \mathbb{N}$ such that $H(z_{\bar k}) = H(\bar z)$, then $H(z_{\bar k + 1}) = H(\bar z)$ and, by the first point of Lemma B.1, $x_{\bar k + 1} = x_{\bar k}$. Hence $(x_k)_{k \in \mathbb{N}}$ is stationary and, for all $k \geq \bar k$, $H(z_k) = H(\bar z)$, so the results of the theorem hold in this case (note that $\bar z \in \operatorname{crit} H$ by Lemma 5.1). Therefore, we can now assume that $H(\bar z) < H(z_k)$ for all $k > 0$, since $(H(z_k))_{k \in \mathbb{N}}$ is nonincreasing and Equation (20) holds.
From Lemma 5.1, we get $d(z_k, \omega(z_0)) \to 0$ as $k \to +\infty$. Hence, for all $\varepsilon > 0$, there exists $k_1 \in \mathbb{N}$ such that $d(z_k, \omega(z_0)) < \varepsilon$ for all $k > k_1$. Moreover, $\omega(z_0)$ is a nonempty compact set and $H$ is finite and constant on it. Therefore, we can apply the uniformization Lemma 5.2 with $\Omega = \omega(z_0)$. Since $H(z_k) < H(\bar z) + \eta$ for all $k > k_0$, both conditions of that lemma hold for any $k > l := \max(k_0, k_1)$.
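For context, the uniformization result invoked above is usually stated in the following form. The exact constants and numbering of the paper's Lemma 5.2 may differ; this is the standard Bolte–Sabach–Teboulle version, given here only as a plausible reading of the fragment, and it matches the roles of $\varepsilon$ (the distance condition secured via $k_1$) and $\eta$ (the function-value condition secured via $k_0$) in the argument:

```latex
% Uniformized Kurdyka-Lojasiewicz property (standard form).
% Here H is proper and lower semicontinuous, and Omega is compact.
\begin{lemma}[Uniformized KL property]
Let $\Omega$ be a compact set and let $H$ be proper, lower
semicontinuous, constant on $\Omega$, and satisfying the KL property
at every point of $\Omega$. Then there exist $\varepsilon > 0$,
$\eta > 0$, and a continuous concave function
$\varphi \colon [0, \eta) \to \mathbb{R}_{+}$ with $\varphi(0) = 0$,
$\varphi \in C^{1}\bigl((0, \eta)\bigr)$, and $\varphi' > 0$ on
$(0, \eta)$, such that for every $\bar z \in \Omega$ and every $z$
satisfying $d(z, \Omega) < \varepsilon$ and
$H(\bar z) < H(z) < H(\bar z) + \eta$, one has
\[
  \varphi'\bigl(H(z) - H(\bar z)\bigr)\,
  \operatorname{dist}\bigl(0, \partial H(z)\bigr) \geq 1 .
\]
\end{lemma}
```

With $\Omega = \omega(z_0)$, the two hypotheses of the lemma are exactly the conditions established for $k > k_1$ and $k > k_0$ above.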