|Title||Modelling and tracking objects with a topology preserving self-organising neural network|
Human gestures form an integral part of our everyday communication. We use
gestures not only to reinforce meaning, but also to describe the shape of objects,
to play games, and to communicate in noisy environments. Vision systems that
exploit gestures are often limited by inaccuracies inherent in handcrafted models.
These models are generated from a collection of training examples, a process that requires
segmentation and alignment. Segmentation in gesture recognition typically involves manual intervention, a time-consuming process that is feasible only for a
limited set of gestures. Ideally, gesture models should be acquired automatically
via a learning scheme that enables the acquisition of detailed behavioural knowledge from topological and temporal observation alone.
The research described in this thesis is motivated by a desire to provide a framework for the unsupervised acquisition and tracking of gesture models. In any
learning framework, the initialisation of the shapes is crucial. Hence, it would
be beneficial to have a robust model, not prone to noise, that can automatically establish correspondences across the set of shapes. In the first part of this thesis, we develop a framework
for building statistical 2D shape models by extracting, labelling and corresponding
landmark points using only topological relations derived from competitive Hebbian learning. The method is based on the assumption that correspondences can
be addressed as an unsupervised classification problem where landmark points
are the cluster centres (nodes) in a high-dimensional vector space. The approach
is novel in that the network can be used in cases where the topological structure of
the input pattern is not known a priori, so no topology of fixed dimensionality is imposed on the network.
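
As a rough illustration of the competitive Hebbian learning rule mentioned above, the following minimal sketch connects, for each input sample, the two nodes closest to it. It is not the thesis implementation: the function name `competitive_hebbian_edges`, the fixed node positions, and the toy samples are all hypothetical, and node adaptation, edge ageing, and node insertion are omitted.

```python
import math


def competitive_hebbian_edges(nodes, samples):
    """Return the set of edges induced by competitive Hebbian learning.

    For every sample, the nearest and second-nearest nodes are linked.
    Node positions are kept fixed here (an assumption of this sketch).
    """
    edges = set()
    for x in samples:
        # Rank node indices by Euclidean distance to the sample.
        ranked = sorted(range(len(nodes)), key=lambda i: math.dist(nodes[i], x))
        first, second = ranked[0], ranked[1]
        # Store each edge as an ordered pair so (a, b) and (b, a) coincide.
        edges.add((min(first, second), max(first, second)))
    return edges


# Toy data: three nodes on a line and three samples lying between them.
nodes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
samples = [(0.4, 0.1), (1.6, -0.1), (0.9, 0.2)]
edges = competitive_hebbian_edges(nodes, samples)
print(sorted(edges))
```

Because edges are created only between nodes that are jointly closest to some input, the resulting graph follows the shape of the data rather than a lattice of fixed dimensionality.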
In the second part, we propose an approach that minimises user intervention
in the adaptation process, which otherwise requires the number of nodes
needed to represent an object to be specified a priori, by utilising an automatic criterion for maximum
node growth. Furthermore, this model is used to represent motion in image sequences by initialising a suitable segmentation that separates the object of interest
from the background. The segmentation system assumes input images from ordinary cameras and webcams, tolerates some variation in illumination,
handles low to moderately cluttered backgrounds (but not extremely cluttered ones),
and requires the objects to be at close range to the camera.
In the final part, we extend the framework for the automatic modelling and
unsupervised tracking of 2D hand gestures in a sequence of k frames. The aim
is to use the tracked frames as training examples in order to build the model and
maintain correspondences. To do so, we add an active step to the Growing Neural Gas (GNG) network. The resulting network, which we call Active Growing Neural Gas (A-GNG),
takes into consideration not only the geometrical position of the nodes, but also the
underlying local feature structure of the image and the distance vector between
successive images. The quality of our model is measured by calculating
the topographic product, a topology preservation
measure that quantifies neighbourhood preservation.
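
For reference, the topographic product is commonly written as follows, after Bauer and Pawelzik. The notation is an assumption of this sketch and may differ from the one used in the thesis: $N$ is the number of nodes, $w_j$ is the reference vector of node $j$, $d^V$ is the distance in the input space, $d^A$ is the distance in the network (for a graph such as GNG, this could be taken as the path length between nodes), and $n_l^V(j)$ and $n_l^A(j)$ denote the $l$-th nearest neighbour of node $j$ in the input space and in the network, respectively.

```latex
\begin{align}
Q_1(j,l) &= \frac{d^V\!\left(w_j,\, w_{n_l^A(j)}\right)}{d^V\!\left(w_j,\, w_{n_l^V(j)}\right)}, &
Q_2(j,l) &= \frac{d^A\!\left(j,\, n_l^A(j)\right)}{d^A\!\left(j,\, n_l^V(j)\right)}, \\
P_3(j,k) &= \left[\prod_{l=1}^{k} Q_1(j,l)\, Q_2(j,l)\right]^{\frac{1}{2k}}, &
P &= \frac{1}{N(N-1)} \sum_{j=1}^{N} \sum_{k=1}^{N-1} \log P_3(j,k).
\end{align}
```

Under this definition, a value of $P$ close to zero indicates that neighbourhood relations are well preserved, while clearly negative or positive values suggest that the dimensionality of the network is too low or too high for the data, respectively.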
In our system we have applied specific restrictions on the velocity and the appearance of the gestures to simplify the motion analysis in the gesture representation. The proposed framework has been validated on applications
related to sign language. The work has great potential in Virtual Reality (VR) applications, where the learning and representation of gestures become natural
without the need for expensive wearable cabled sensors.