Computation Approaches for Continuous Reinforcement Learning Problems

Effraimidis, D. 2016. Computation Approaches for Continuous Reinforcement Learning Problems. PhD thesis, University of Westminster, Computer Science. https://doi.org/10.34737/q0y82

Title: Computation Approaches for Continuous Reinforcement Learning Problems
Type: PhD thesis
Author: Effraimidis, D.

Abstract

Optimisation theory is at the heart of any control process, where we seek to control the behaviour of a system through a set of actions. Linear control problems have been extensively studied, and optimal control laws have been identified. But the world around us is highly non-linear and unpredictable. For these dynamic systems, which do not possess the nice mathematical properties of their linear counterparts, classical control theory breaks down and other methods have to be employed. Yet nature thrives by optimising non-linear and over-complicated systems. Evolutionary Computing (EC) methods exploit nature's way by imitating the evolutionary process, and so avoid solving the control problem analytically.

Reinforcement Learning (RL), on the other hand, regards the optimal control problem as a sequential one. At every discrete time step an action is applied. The transition of the system to a new state is accompanied by a single numerical value, the "reward", which designates the quality of the control action. Even though the feedback information is limited to a single real number, the introduction of the Temporal Difference method made it possible to obtain accurate predictions of the value functions. This paved the way for optimising complex structures, such as Neural Networks, which are used to approximate the value functions.

In this thesis we investigate the solution of continuous Reinforcement Learning control problems by EC methodologies. The reward accumulated throughout an episode of such a problem suffices as information to formulate the required measure, fitness, with which to optimise a population of candidate solutions. In particular, we explore the limits of applicability of a specific branch of EC, that of Genetic Programming (GP). In the GP case the evolving population is composed of individuals that translate directly into mathematical functions, which can serve as a control law.

The major contribution of this thesis is the proposed unification of these disparate Artificial Intelligence paradigms. The information provided by the system is exploited on a step-by-step basis by the RL part of the proposed scheme and on an episodic basis by GP. This makes it possible to augment the function set of the GP scheme with adaptable Neural Networks. In the quest to achieve stable behaviour of the RL part of the system, a modification of the Actor-Critic algorithm has been implemented.

Finally, we successfully apply the GP method to multi-action control problems, extending the spectrum of problems that this method has been shown to solve. We also investigate the capability of GP on problems from the food industry. These types of problems also exhibit non-linearity, and there is no definite model describing their behaviour.

Year: 2016
File: Effraimidis_Dimitros _thesis.pdf
Publisher: University of Westminster
DOI: https://doi.org/10.34737/q0y82
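The abstract's idea of using the reward accumulated over an episode as the fitness of a candidate control law can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the toy one-dimensional dynamics, the reward, the step size, and the three hand-written candidate laws (in GP these would be evolved expressions rather than fixed lambdas).

```python
import math
from typing import Callable, Dict

def episode_fitness(control_law: Callable[[float], float],
                    x0: float = 1.0, steps: int = 50, dt: float = 0.1) -> float:
    """Return the reward accumulated over one episode.

    Hypothetical toy system: x' = x + dt * (sin(x) + u), which has an
    unstable equilibrium at x = 0. The reward at each step is -x**2.
    """
    x = x0
    total_reward = 0.0
    for _ in range(steps):
        u = control_law(x)                 # action proposed by the candidate
        x = x + dt * (math.sin(x) + u)     # assumed non-linear dynamics
        total_reward += -(x * x)           # scalar reward for this step
    return total_reward

# Hand-written stand-ins for individuals of a GP population (assumed).
candidates: Dict[str, Callable[[float], float]] = {
    "do_nothing": lambda x: 0.0,
    "proportional": lambda x: -2.0 * x,
    "strong_gain": lambda x: -5.0 * x,
}

# Fitness is the episodic accumulated reward; higher is better.
fitness = {name: episode_fitness(law) for name, law in candidates.items()}
best = max(fitness, key=fitness.get)
print(fitness, best)
```

In an evolutionary loop, a value of this kind would rank individuals for selection; only the episodic total is used, not the per-step rewards that an RL method would exploit.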

Related outputs

Genetic programming as a solver to challenging reinforcement learning problems
Dracopoulos, D., Effraimidis, D. and Nichols, B.D. 2013. Genetic programming as a solver to challenging reinforcement learning problems. International Journal of Computer Research. 20 (3), pp. 351-379.

Genetic programming as a solver to challenging reinforcement learning problems
Dracopoulos, D., Effraimidis, D. and Nichols, B.D. 2013. Genetic programming as a solver to challenging reinforcement learning problems. in: Clary, T.S. (ed.) Horizons in computer science research. Hauppauge, NY: Nova Science Publishers.

Genetic programming for generalised helicopter hovering control
Dracopoulos, D. and Effraimidis, D. 2012. Genetic programming for generalised helicopter hovering control. in: Moraglio, A., Silva, S., Krawiec, K., Machado, P. and Cotta, C. (eds.) Genetic programming: proceedings of the 15th European conference, EUROGP 2012, Malaga, Spain. Springer.