
Recht-Ré Noncommutative Arithmetic-Geometric Mean Conjecture is False
Stochastic optimization algorithms have become indispensable in modern m...

Random Shuffling Beats SGD Only After Many Epochs on Ill-Conditioned Problems
Recently, there has been much interest in studying the convergence rates...

On Tight Convergence Rates of Without-replacement SGD
For solving finite-sum optimization problems, SGD without replacement sa...

Random Shuffling Beats SGD after Finite Epochs
A long-standing problem in the theory of stochastic gradient descent (SG...

Understanding the Impact of Model Incoherence on Convergence of Incremental SGD with Random Reshuffle
Although SGD with random reshuffle has been widely used in machine learn...

The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks
In this paper, we conjecture that if the permutation invariance of neura...

Exponential inequality for chaos based on sampling without replacement
We are interested in the behavior of particular functionals, in a framew...
Can Single-Shuffle SGD be Better than Reshuffling SGD and GD?
We propose matrix norm inequalities that extend the Recht-Ré (2012) conjecture on a noncommutative AM-GM inequality by supplementing it with another inequality that accounts for single-shuffle, a widely used without-replacement sampling scheme that shuffles only once at the beginning and is overlooked in the Recht-Ré conjecture. Instead of general positive semidefinite matrices, we restrict our attention to positive definite matrices with small enough condition numbers, which are more relevant to matrices that arise in the analysis of SGD. For such matrices, we conjecture that the means of matrix products corresponding to with- and without-replacement variants of SGD satisfy a series of spectral norm inequalities that can be summarized as: "single-shuffle SGD converges faster than random-reshuffle SGD, which is in turn faster than with-replacement SGD." We present theorems that support our conjecture by proving several special cases.
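The ordering summarized above can be probed numerically on small instances. The sketch below is one plausible reading of the three matrix means, not the paper's exact statement: for n matrices and K epochs, single-shuffle is taken as the average over permutations of the K-th power of the ordered product, random-reshuffle as the K-th power of the averaged ordered product, and with-replacement as the (nK)-th power of the arithmetic mean. The function names, the `random_well_conditioned` helper, and its `center` and `spread` parameters are all hypothetical choices made for illustration.

```python
import itertools
import numpy as np


def spectral_norm(matrix):
    """Largest singular value of a 2-D array."""
    return float(np.linalg.norm(matrix, 2))


def epoch_product(mats, order):
    """Product of the matrices in visiting order (first visited acts first)."""
    product = np.eye(mats[0].shape[0])
    for index in order:
        product = mats[index] @ product
    return product


def compare_sampling_schemes(mats, epochs):
    """Spectral norms of the expected products under three sampling schemes.

    Assumed formalization (not taken verbatim from the paper):
      single-shuffle    : mean over permutations of (ordered product)^epochs
      random-reshuffle  : (mean over permutations of ordered product)^epochs
      with-replacement  : (arithmetic mean of the matrices)^(n * epochs)
    """
    n = len(mats)
    products = [epoch_product(mats, p) for p in itertools.permutations(range(n))]
    mean_product = sum(products) / len(products)
    single = sum(np.linalg.matrix_power(p, epochs) for p in products) / len(products)
    reshuffle = np.linalg.matrix_power(mean_product, epochs)
    with_replacement = np.linalg.matrix_power(sum(mats) / n, n * epochs)
    return {
        "single_shuffle": spectral_norm(single),
        "random_reshuffle": spectral_norm(reshuffle),
        "with_replacement": spectral_norm(with_replacement),
    }


def random_well_conditioned(n, dim, center=0.8, spread=0.1, seed=0):
    """Hypothetical generator: symmetric matrices with eigenvalues in
    [center - spread, center + spread], hence positive definite when
    center > spread and with a small condition number."""
    rng = np.random.default_rng(seed)
    mats = []
    for _ in range(n):
        g = rng.standard_normal((dim, dim))
        sym = (g + g.T) / 2
        sym /= spectral_norm(sym)  # rescale so eigenvalues lie in [-1, 1]
        mats.append(center * np.eye(dim) + spread * sym)
    return mats


if __name__ == "__main__":
    norms = compare_sampling_schemes(random_well_conditioned(n=3, dim=4), epochs=2)
    for scheme, value in norms.items():
        print(f"{scheme:>17}: {value:.6f}")
```

Enumerating all n! permutations keeps the expectations exact but limits the sketch to small n. A numerical check of this kind can only fail to find counterexamples on the instances tried; it does not establish the conjectured ordering.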