|Title||Reinforcement learning in continuous state- and action-space|
Reinforcement learning in the continuous state-space poses the problem of the inability to store the values of all state-action pairs in a lookup table, due to both storage limitations and the inability to visit all states sufficiently often to learn the correct values.
In this thesis we investigate methods to select the optimal action when artificial neural networks are used to approximate the value function, through the application of numerical optimization techniques. Although it has been stated in the literature that
The major contributions of this thesis include the investigation of the applicability of action selection by numerical optimization methods, including gradient-ascent along with other derivative-based and derivative-free numerical optimization methods,and the proposal of two novel algorithms which are based on the application of two alternative action selection methods: NM-SARSA  and NelderMead-SARSA.
We empirically compare the proposed methods to state-of-the-art methods from the literature on three continuous state- and action-space control benchmark problems