Commit efe2ffd

Minor grammar in README (#505)
1 parent 910b414 commit efe2ffd

File tree

1 file changed: +23 / -23 lines


README.rst

Lines changed: 23 additions & 23 deletions
@@ -46,7 +46,7 @@ https://pytorch-optimizer.rtfd.io
 
 Citation
 --------
-Please cite original authors of optimization algorithms. If you like this
+Please cite the original authors of the optimization algorithms. If you like this
 package::
 
 @software{Novik_torchoptimizers,
@@ -57,7 +57,7 @@ package::
 version = {1.0.1}
 }
 
-Or use github feature: "cite this repository" button.
+Or use the github feature: "cite this repository" button.
 
 
 Supported Optimizers
@@ -155,29 +155,29 @@ Supported Optimizers
 
 Visualizations
 --------------
-Visualizations help us to see how different algorithms deal with simple
+Visualizations help us see how different algorithms deal with simple
 situations like: saddle points, local minima, valleys etc, and may provide
-interesting insights into inner workings of algorithm. Rosenbrock_ and Rastrigin_
-benchmark_ functions was selected, because:
+interesting insights into the inner workings of an algorithm. Rosenbrock_ and Rastrigin_
+benchmark_ functions were selected because:
 
 * Rosenbrock_ (also known as banana function), is non-convex function that has
-  one global minima `(1.0. 1.0)`. The global minimum is inside a long,
-  narrow, parabolic shaped flat valley. To find the valley is trivial. To
-  converge to the global minima, however, is difficult. Optimization
-  algorithms might pay a lot of attention to one coordinate, and have
-  problems to follow valley which is relatively flat.
+  one global minimum `(1.0. 1.0)`. The global minimum is inside a long,
+  narrow, parabolic shaped flat valley. Finding the valley is trivial.
+  Converging to the global minimum, however, is difficult. Optimization
+  algorithms might pay a lot of attention to one coordinate, and struggle
+  following the valley which is relatively flat.
 
 .. image:: https://upload.wikimedia.org/wikipedia/commons/3/32/Rosenbrock_function.svg
 
-* Rastrigin_ function is a non-convex and has one global minima in `(0.0, 0.0)`.
+* Rastrigin_ is a non-convex function and has one global minimum in `(0.0, 0.0)`.
   Finding the minimum of this function is a fairly difficult problem due to
   its large search space and its large number of local minima.
 
 .. image:: https://upload.wikimedia.org/wikipedia/commons/8/8b/Rastrigin_function.png
 
-Each optimizer performs `501` optimization steps. Learning rate is best one found
-by hyper parameter search algorithm, rest of tuning parameters are default. It
-is very easy to extend script and tune other optimizer parameters.
+Each optimizer performs `501` optimization steps. Learning rate is the best one found
+by a hyper parameter search algorithm, the rest of the tuning parameters are default. It
+is very easy to extend the script and tune other optimizer parameters.
 
 
 .. code::
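
Note: the hunk above describes the benchmark setup (501 steps per optimizer, learning rate picked by a hyperparameter search). The following is a minimal sketch of such a loop on the Rosenbrock function using plain PyTorch; the starting point and learning rate are illustrative assumptions rather than the repository's tuned values, and the actual plotting script is not reproduced here.

.. code:: python

    import torch

    def rosenbrock(xy):
        # Classic Rosenbrock "banana" function; global minimum at (1.0, 1.0).
        x, y = xy
        return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

    xy = torch.tensor([-2.0, 2.0], requires_grad=True)  # starting point: an assumption
    optimizer = torch.optim.SGD([xy], lr=1e-4)           # illustrative lr, not the tuned value

    path = []
    for _ in range(501):  # 501 steps, as stated in the README text above
        optimizer.zero_grad()
        loss = rosenbrock(xy)
        loss.backward()
        optimizer.step()
        path.append(xy.detach().clone())  # trajectory that the visualizations plot
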
@@ -187,14 +187,14 @@ is very easy to extend script and tune other optimizer parameters.
 
 Warning
 -------
-Do not pick optimizer based on visualizations, optimization approaches
+Do not pick an optimizer based on visualizations, optimization approaches
 have unique properties and may be tailored for different purposes or may
-require explicit learning rate schedule etc. Best way to find out, is to try one
-on your particular problem and see if it improves scores.
+require explicit learning rate schedule etc. The best way to find out is to try
+one on your particular problem and see if it improves scores.
 
-If you do not know which optimizer to use start with built in SGD/Adam, once
-training logic is ready and baseline scores are established, swap optimizer and
-see if there is any improvement.
+If you do not know which optimizer to use, start with the built in SGD/Adam. Once
+the training logic is ready and baseline scores are established, swap the optimizer
+and see if there is any improvement.
 
 
 A2GradExp
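
Note: a minimal sketch of the workflow the warning recommends, using a placeholder model; Yogi is named purely for illustration, and any optimizer from the package could be swapped in the same way.

.. code:: python

    import torch
    import torch.nn as nn
    import torch_optimizer as optim

    model = nn.Linear(10, 1)  # placeholder model

    # Establish baseline scores with a built-in optimizer first.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Then swap in an optimizer from this package and compare
    # (Yogi is chosen here only as an example).
    optimizer = optim.Yogi(model.parameters(), lr=1e-3)
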
@@ -366,7 +366,7 @@ AdaBound
 
 AdaMod
 ------
-AdaMod method restricts the adaptive learning rates with adaptive and momental
+The AdaMod method restricts the adaptive learning rates with adaptive and momental
 upper bounds. The dynamic learning rate bounds are based on the exponential
 moving averages of the adaptive learning rates themselves, which smooth out
 unexpected large learning rates and stabilize the training of deep neural networks.
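
Note: a hedged usage sketch for the AdaMod description above; the ``beta3`` keyword and its value are assumptions, so check the package documentation for the exact signature.

.. code:: python

    import torch.nn as nn
    import torch_optimizer as optim

    model = nn.Linear(10, 1)  # placeholder model
    # beta3 is assumed to control the exponential moving average that
    # smooths the learning rate bound described above.
    optimizer = optim.AdaMod(model.parameters(), lr=1e-3, beta3=0.999)
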
@@ -455,9 +455,9 @@ Adahessian
 
 AdamP
 ------
-AdamP propose a simple and effective solution: at each iteration of Adam optimizer
+AdamP propose a simple and effective solution: at each iteration of the Adam optimizer
 applied on scale-invariant weights (e.g., Conv weights preceding a BN layer), AdamP
-remove the radial component (i.e., parallel to the weight vector) from the update vector.
+removes the radial component (i.e., parallel to the weight vector) from the update vector.
 Intuitively, this operation prevents the unnecessary update along the radial direction
 that only increases the weight norm without contributing to the loss minimization.
 
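Note: a hedged sketch pairing AdamP with the scale-invariant weights it targets (a Conv layer followed by BatchNorm); the ``delta`` and ``wd_ratio`` keywords are assumptions based on the upstream AdamP reference implementation.

.. code:: python

    import torch.nn as nn
    import torch_optimizer as optim

    # Conv weights preceding a BN layer are scale-invariant, the case AdamP targets.
    model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())
    optimizer = optim.AdamP(model.parameters(), lr=1e-3, delta=0.1, wd_ratio=0.1)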
