Reinforcement learning in continuous state- and action-space : WestminsterResearch

Title	Reinforcement learning in continuous state- and action-space
Type	PhD thesis
Authors	Nichols, B.D.
Abstract	Reinforcement learning in continuous state-space poses a problem: the values of all state-action pairs cannot be stored in a lookup table, both because of storage limitations and because not every state can be visited often enough to learn the correct values. This can be overcome by storing the value function with a function approximation technique that has generalisation capability, such as an artificial neural network. With such an approximation the optimal action can be selected by comparing the values of each possible action; however, when the action-space is continuous this is not possible. In this thesis we investigate methods to select the optimal action when artificial neural networks are used to approximate the value function, through the application of numerical optimization techniques. Although it has been stated in the literature that gradient-ascent methods can be applied to action selection [47], it has also been stated that solving this problem would be infeasible, and it is therefore claimed that a second artificial neural network is necessary to approximate the policy function [21, 55]. The major contributions of this thesis include an investigation of the applicability of action selection by numerical optimization methods, including gradient ascent along with other derivative-based and derivative-free numerical optimization methods, and the proposal of two novel algorithms based on two alternative action selection methods: NM-SARSA [40] and NelderMead-SARSA. We empirically compare the proposed methods with state-of-the-art methods on three continuous state- and action-space control benchmark problems from the literature: minimum-time full swing-up of the Acrobot; the Cart-Pole balancing problem; and a double-pole variant. We also present novel results from applying an existing direct policy search method, genetic programming, to the Acrobot benchmark problem [12, 14].
Year	2014
File	Barry_NICHOLS_2014.pdf
Publisher	University of Westminster
Digital Object Identifier (DOI)	https://doi.org/10.34737/967w8
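The following is a minimal sketch of the core idea in the abstract, and it is not the thesis's implementation. Instead of comparing the values of a finite set of actions, it searches a continuous action range for the action that maximises an approximated Q(s, a). The sketch makes several assumptions. It uses a scalar state and a scalar action bounded in [lo, hi], and it uses projected gradient ascent with a central-difference gradient, restarted from a few starting actions. The names `select_action`, `numeric_grad` and `q_stub`, the step size and the number of iterations are all hypothetical choices. `q_stub` is only a stand-in for the forward pass of a trained neural network.

```python
from typing import Callable, Iterable

QFunction = Callable[[float, float], float]  # Q(s, a) for a scalar state and action


def numeric_grad(q: QFunction, s: float, a: float, eps: float = 1e-5) -> float:
    """Central-difference estimate of dQ/da; needs only forward evaluations of Q."""
    return (q(s, a + eps) - q(s, a - eps)) / (2.0 * eps)


def select_action(q: QFunction, s: float, lo: float, hi: float,
                  starts: Iterable[float] = (-0.5, 0.0, 0.5),
                  lr: float = 0.1, steps: int = 200) -> float:
    """Approximate argmax over a in [lo, hi] of Q(s, a) by projected gradient
    ascent from several starting actions, returning the best action found."""
    best_a, best_q = lo, float("-inf")
    for a0 in starts:
        a = min(max(a0, lo), hi)
        for _ in range(steps):
            a += lr * numeric_grad(q, s, a)  # step uphill in the action
            a = min(max(a, lo), hi)          # project back onto the action bounds
        value = q(s, a)
        if value > best_q:                   # keep the best of the restarts
            best_a, best_q = a, value
    return best_a


def q_stub(s: float, a: float) -> float:
    """Hypothetical stand-in for a trained network's Q(s, a); its peak in a is at 0.5 * s."""
    return -(a - 0.5 * s) ** 2 + 0.1 * s


if __name__ == "__main__":
    print(round(select_action(q_stub, 1.0, -1.0, 1.0), 4))  # 0.5 (interior optimum)
    print(select_action(q_stub, 4.0, -1.0, 1.0))             # 1.0 (optimum clipped to the bound)
```

Because Q is only evaluated here and never differentiated analytically, the same loop could in principle wrap any approximator. A derivative-free method such as Nelder-Mead, which the abstract also names, would replace the gradient step entirely. A second-order method such as Newton's Method would use curvature information in place of the fixed step size.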

Related outputs

Genetic programming for the minimum time swing up and balance control acrobot problem
Dracopoulos, D. and Nichols, B.D. 2017. Genetic programming for the minimum time swing up and balance control acrobot problem. Expert Systems. 34 (5), p. e12115. https://doi.org/10.1111/exsy.12115

Application of Newton's Method to action selection in continuous state- and action-space reinforcement learning
Nichols, B.D. and Dracopoulos, D. 2014. Application of Newton's Method to action selection in continuous state- and action-space reinforcement learning. in: ESANN 2014 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges (Belgium), 23-25 April 2014 D facto.

Genetic programming as a solver to challenging reinforcement learning problems
Dracopoulos, D., Effraimidis, D. and Nichols, B.D. 2013. Genetic programming as a solver to challenging reinforcement learning problems. in: Clary, T.S. (ed.) Horizons in computer science research Hauppauge, NY Nova Science Publishers.

Swing up and balance control of the acrobot solved by genetic programming
Dracopoulos, D. and Nichols, B.D. 2012. Swing up and balance control of the acrobot solved by genetic programming. in: Bramer, M. and Petridis, M. (ed.) Research and Development in Intelligent Systems XXIX: Incorporating Applications and Innovations in Intelligent Systems XX Proceedings of AI-2012, The 32nd SGAI International Conference on Innovative Techniques & Applications of Artificial Intelligence London Springer. pp. 229-242

Permalink - https://westminsterresearch.westminster.ac.uk/item/967w8/reinforcement-learning-in-continuous-state-and-action-space
