Z. Allen-Zhu and Y. Yuan, Improved SVRG for non-strongly-convex or sum-of-non-convex objectives, International Conference on Machine Learning, pp.1080-1089, 2016.

M. Assran, N. Loizou, N. Ballas, and M. Rabbat, Stochastic gradient push for distributed deep learning, 2018.

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

L. Bottou, F. E. Curtis, and J. Nocedal, Optimization methods for large-scale machine learning, SIAM Review, vol.60, issue.2, pp.223-311, 2018.

V. Cevher and B. C. Vu, On the linear convergence of the stochastic gradient method with constant step-size, pp.1-9, 2017.

A. Chambolle, M. J. Ehrhardt, P. Richtárik, and C. Schönlieb, Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications, SIAM Journal on Optimization, vol.28, issue.4, pp.2783-2808, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01569426

C. Chang and C. Lin, LIBSVM: a library for support vector machines, ACM Transactions on Intelligent Systems and Technology, vol.2, issue.3, p.27, 2011.

J. Chee and P. Toulis, Convergence diagnostics for stochastic gradient descent with constant learning rate, Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, vol.84, 2018.

D. Csiba and P. Richtárik, Primal method for ERM with flexible mini-batching schemes and non-convex losses, 2015.

D. Csiba and P. Richtárik, Importance sampling for minibatches, Journal of Machine Learning Research, vol.19, issue.27, pp.1-21, 2018.

D. Garber and E. Hazan, Fast and simple PCA via convex optimization, 2015.

N. Gazagnadou, R. M. Gower, and J. Salmon, Optimal mini-batch and step sizes for SAGA, 36th International Conference on Machine Learning, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02005431

R. M. Gower, P. Richtárik, and F. Bach, Stochastic quasi-gradient methods: Variance reduction via Jacobian sketching, 2018.

P. Goyal, P. Dollár, R. B. Girshick, P. Noordhuis, L. Wesolowski et al., Accurate, large minibatch SGD: training ImageNet in 1 hour, 2017.

F. Hanzely and P. Richtárik, Accelerated coordinate descent with arbitrary sampling and best rates for minibatches, 2018.

M. Hardt, B. Recht, and Y. Singer, Train faster, generalize better: stability of stochastic gradient descent, 33rd International Conference on Machine Learning, 2016.

E. Hazan and S. Kale, Beyond the regret minimization barrier: optimal algorithms for stochastic strongly-convex optimization, The Journal of Machine Learning Research, vol.15, issue.1, pp.2489-2512, 2014.

S. Horváth and P. Richtárik, Nonconvex variance reduced optimization with arbitrary sampling, 2018.

H. Karimi, J. Nutini, and M. Schmidt, Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp.795-811, 2016.

X. Lian, C. Zhang, H. Zhang, C. Hsieh, W. Zhang et al., Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent, Advances in Neural Information Processing Systems, pp.5330-5340, 2017.

N. Loizou and P. Richtárik, Momentum and stochastic momentum for stochastic gradient, proximal point and subspace descent methods, 2017.

N. Loizou and P. Richtárik, Convergence analysis of inexact randomized iterative methods, 2019.

S. Ma, R. Bassily, and M. Belkin, The power of interpolation: Understanding the effectiveness of SGD in modern overparametrized learning, Proceedings of the 35th International Conference on Machine Learning, vol.80, pp.3331-3340, 2018.

E. Moulines and F. R. Bach, Non-asymptotic analysis of stochastic approximation algorithms for machine learning, Advances in Neural Information Processing Systems, pp.451-459, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00608041

I. Necoara, Y. Nesterov, and F. Glineur, Linear convergence of first order methods for non-strongly convex optimization, Mathematical Programming, pp.1-39, 2018.

D. Needell and R. Ward, Batched stochastic gradient descent with weighted sampling, Springer Proceedings in Mathematics & Statistics, pp.279-306, 2017.

D. Needell, N. Srebro, and R. Ward, Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm, Mathematical Programming, Series A, vol.155, issue.1, pp.549-573, 2016.

A. Nemirovski and D. B. Yudin, On Cezari's convergence of the steepest descent method for approximating saddle point of convex-concave functions, Soviet Mathematics Doklady, vol.19, 1978.

A. Nemirovski and D. B. Yudin, Problem complexity and method efficiency in optimization, 1983.

A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust stochastic approximation approach to stochastic programming, SIAM Journal on Optimization, vol.19, issue.4, pp.1574-1609, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00976649

Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, vol.87, 2013.

L. Nguyen, P. H. Nguyen, M. van Dijk, P. Richtárik, K. Scheinberg et al., SGD and Hogwild! Convergence without the bounded gradients assumption, Proceedings of the 35th International Conference on Machine Learning, vol.80, pp.3750-3758, 2018.

Z. Qu and P. Richtárik, Coordinate descent with arbitrary sampling I: Algorithms and complexity, Optimization Methods and Software, vol.31, issue.5, pp.829-857, 2016.

Z. Qu and P. Richtárik, Coordinate descent with arbitrary sampling II: Expected separable overapproximation, Optimization Methods and Software, vol.31, issue.5, pp.858-884, 2016.

Z. Qu, P. Richtárik, and T. Zhang, Quartz: Randomized dual coordinate ascent with arbitrary sampling, Advances in Neural Information Processing Systems, pp.865-873, 2015.

A. Rakhlin, O. Shamir, and K. Sridharan, Making gradient descent optimal for strongly convex stochastic optimization, 29th International Conference on Machine Learning, vol.12, pp.1571-1578, 2012.

B. Recht, C. Re, S. Wright, and F. Niu, Hogwild!: A lock-free approach to parallelizing stochastic gradient descent, Advances in Neural Information Processing Systems, pp.693-701, 2011.

P. Richtárik and M. Takáč, On optimal probabilities in stochastic coordinate descent methods, Optimization Letters, vol.10, issue.6, pp.1233-1243, 2016.

P. Richtárik and M. Takáč, Parallel coordinate descent methods for big data optimization, Mathematical Programming, vol.156, issue.1-2, pp.433-484, 2016.

P. Richtárik and M. Takáč, Stochastic reformulations of linear systems: algorithms and convergence theory, 2017.

H. Robbins and S. Monro, A stochastic approximation method, The Annals of Mathematical Statistics, vol.22, issue.3, pp.400-407, 1951.

M. Schmidt and N. Le Roux, Fast convergence of stochastic gradient descent under a strong growth condition, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00855113

S. Shalev-Shwartz, SDCA without duality, regularization, and individual convexity, International Conference on Machine Learning, pp.747-754, 2016.

S. Shalev-Shwartz, Y. Singer, and N. Srebro, Pegasos: primal estimated subgradient solver for SVM, 24th International Conference on Machine Learning, pp.807-814, 2007.

O. Shamir and T. Zhang, Stochastic gradient descent for nonsmooth optimization: Convergence results and optimal averaging schemes, Proceedings of the 30th International Conference on Machine Learning, pp.71-79, 2013.

S. Vaswani, F. Bach, and M. Schmidt, Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron, 2018.