I would encourage you to check out the "No Free Lunch" paper[1]. tl;dr: the correct approach depends on the problem; there is no approach that is superior to all others. Provably, for any given optimization algorithm there are infinitely many problems on which RANDOM GUESSING outperforms it.
(Note: I am not claiming support for the methodology in the paper, I just find the application of search/optimization algos lovely)
That being said, back before I understood how to program gradient descent I would often use gradient-free methods such as simulated annealing (or Evolutionary Algorithms) to optimize NNs! (never claimed it was state of the art; I just wanted to build and optimize some small networks by hand, and my calculus was pretty weak at the time)
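For anyone curious what that looks like in practice, here's a minimal sketch of annealing the weights of a tiny 2-2-1 network on XOR, using only numpy. Everything here (the task, the function names, the linear cooling schedule) is an illustrative assumption on my part, not a reconstruction of any particular codebase:

    # Minimal sketch: simulated annealing instead of gradient descent
    # for a tiny 2-2-1 network on XOR. All names and hyperparameters
    # here are illustrative assumptions, not anyone's production code.
    import numpy as np

    rng = np.random.default_rng(0)

    # XOR inputs/targets: a classic tiny task for a 2-2-1 network.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0.0, 1.0, 1.0, 0.0])

    def unpack(w):
        # Split the flat parameter vector into layer weights/biases.
        W1 = w[0:4].reshape(2, 2); b1 = w[4:6]
        W2 = w[6:8];               b2 = w[8]
        return W1, b1, W2, b2

    def forward(w, X):
        W1, b1, W2, b2 = unpack(w)
        h = np.tanh(X @ W1 + b1)                   # hidden layer
        return 1 / (1 + np.exp(-(h @ W2 + b2)))    # sigmoid output

    def loss(w):
        return np.mean((forward(w, X) - y) ** 2)   # mean squared error

    def anneal(steps=20000, t0=1.0, sigma=0.3):
        w = rng.normal(size=9)                     # 9 parameters total
        cur = loss(w)
        best_w, best = w, cur
        for k in range(steps):
            t = t0 * (1 - k / steps)               # linear cooling schedule
            cand = w + rng.normal(scale=sigma, size=9)  # random neighbor
            cand_l = loss(cand)
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / t), which shrinks as t cools.
            if cand_l < cur or (t > 0 and rng.random() < np.exp((cur - cand_l) / t)):
                w, cur = cand, cand_l
            if cur < best:
                best_w, best = w, cur
        return best_w, best

    w, l = anneal()
    print("final loss:", l)
    print("predictions:", forward(w, X).round(2))

The accept-worse-moves-with-probability-exp(-delta/t) rule is what separates annealing from plain hill climbing: early on (hot t) the search can escape bad basins, and as t cools it settles into whatever optimum it has found.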
[1]: https://en.m.wikipedia.org/wiki/No_free_lunch_in_search_and_...