A Modular Deep Learning Framework for Scene Understanding in Augmented Reality Applications : WestminsterResearch

Publication dates
Title	A Modular Deep Learning Framework for Scene Understanding in Augmented Reality Applications
Authors	Li, V.
	Villarini, B.
	Nebel, JC.
	Argyriou, V.
Type	Conference paper
Abstract	Taking natural images and videos as input, augmented reality (AR) applications aim to enhance the real world with superimposed digital content, enabling interaction between the user and the environment. One important step in this process is automatic scene analysis and understanding, which should be performed both in real time and with a good level of object recognition accuracy. In this work, an end-to-end framework that combines a Super Resolution network with a detection and recognition deep network is proposed to increase performance and lower processing time. This novel approach has been evaluated on two different datasets: the popular COCO dataset, whose real images are used for benchmarking many different computer vision tasks, and a generated dataset of synthetic images recreating a variety of environmental, lighting and acquisition conditions. The evaluation focuses on small objects, which are more challenging to detect and recognise correctly. The results show that, in most of the selected conditions, the Average Precision of the proposed end-to-end approach is higher for smaller and low-resolution objects.
Keywords	Augmented Reality
	Object Detection
	Scene Analysis
	Scene Understanding
	Object Recognition
	Deep Learning
	Super-Resolution
	Feature Extraction
Year	2023
Conference	The IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology
Publisher	IEEE
Accepted author manuscript	IEEE_IAICTPaper_SceneUnderstanding.pdf
File Access Level	Open (open metadata and files)
Published	13 Jul 2023
Journal	IAICT 2023 Conference Proceedings
Digital Object Identifier (DOI)	https://doi.org/10.1109/IAICT59002.2023.10205667

Related outputs

Enhanced CATBraTS for Brain Tumour Semantic Segmentation
El Badaoui, R., Bonmati Coll, E., Psarrou, A., Asaturyan, H. and Villarini, B. 2025. Enhanced CATBraTS for Brain Tumour Semantic Segmentation. Journal of Imaging. 11 (1), p. 8. https://doi.org/10.3390/jimaging11010008

Evaluation of Environmental Conditions on Object Detection Using Oriented Bounding Boxes for AR Applications
Li, Vladislav, Villarini, Barbara, Nebel, Jean-Christophe, Lagkas, Thomas, Sarigiannidis, Panagiotis and Argyriou, Vasileios 2023. Evaluation of Environmental Conditions on Object Detection Using Oriented Bounding Boxes for AR Applications. 2023 19th International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT). Pafos, Cyprus 19 - 21 Jun 2023 IEEE. https://doi.org/10.1109/dcoss-iot58021.2023.00058

Detection of Physical Adversarial Attacks on Traffic Signs for Autonomous Vehicles
Villarini, B., Radoglou-Grammatikis, P., Lagkas, T., Sarigiannidis, P. and Argyriou, V. 2023. Detection of Physical Adversarial Attacks on Traffic Signs for Autonomous Vehicles. 2023 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT). Bali, Indonesia 13 - 15 May 2023 IEEE. https://doi.org/10.1109/IAICT59002.2023.10205591

An AI-Assisted Skincare Routine Recommendation System in XR
Rajegowda, M.g., Spyridis, Y., Villarini, B. and Argyriou, V. 2023. An AI-Assisted Skincare Routine Recommendation System in XR. 2023 7th International Conference on Artificial Intelligence and Virtual Reality (AIVR2023). Kumamoto, Japan 23 May - 21 Jun 2023 Springer.

3D CATBraTS: Channel Attention Transformer for Brain Tumour Semantic Segmentation
El Badaoui, R., Bonmati Coll, E., Psarrou, A. and Villarini, B. 2023. 3D CATBraTS: Channel Attention Transformer for Brain Tumour Semantic Segmentation. 36th IEEE International Symposium on Computer-Based Medical Systems (IEEE CBMS2023). L'Aquila, Italy 24 May - 22 Jun 2023 IEEE. https://doi.org/10.1109/cbms58004.2023.00267

Intraclass Clustering-Based CNN Approach for Detection of Malignant Melanoma
Bandy, A.D., Spyridis, Y., Villarini, B. and Argyriou, V. 2023. Intraclass Clustering-Based CNN Approach for Detection of Malignant Melanoma. Sensors. 23 (2), p. 926. https://doi.org/10.3390/s23020926

AI Driven IoT Web-Based Application for Automatic Segmentation and Reconstruction of Abdominal Organs from Medical Images
Villarini, B. and Asaturyan, H. 2022. AI Driven IoT Web-Based Application for Automatic Segmentation and Reconstruction of Abdominal Organs from Medical Images. International Conference on Distributed Computing in Sensor Systems (DCOSS). Los Angeles, California 30 May - 01 Jul 2022 IEEE. https://doi.org/10.1109/DCOSS54816.2022.00045

Improving Automatic Renal Segmentation in Clinically Normal and Abnormal Paediatric DCE-MRI via Contrast Maximisation and Convolutional Networks for Computing Markers of Kidney Function.
Asaturyan, H., Villarini, Barbara, Sarao, Karen, Chow, Jeanne S, Afacan, Onur and Kurugol, Sila 2021. Improving Automatic Renal Segmentation in Clinically Normal and Abnormal Paediatric DCE-MRI via Contrast Maximisation and Convolutional Networks for Computing Markers of Kidney Function. Sensors. 21 (23) 7942. https://doi.org/10.3390/s21237942

3D Deep Learning for Anatomical Structure Segmentation in Multiple Imaging Modalities
Villarini, B., Asaturyan, H., Kurugol, S., Afacan, O., Bell, J.D. and Thomas, E.L. 2021. 3D Deep Learning for Anatomical Structure Segmentation in Multiple Imaging Modalities. O'Conner, L. (ed.) 34th IEEE CBMS International Symposium on Computer-Based Medical Systems. Online Event 07 - 09 Jun 2021 IEEE. https://doi.org/10.1109/CBMS52027.2021.00066

A Survey of Alzheimer’s Disease Early Diagnosis Methods for Cognitive Assessment
Fernández Montenegro, Juan Manuel, Villarini, B., Angelopoulou, A., Kapetanios, E., Garcia-Rodriguez, J. and Argyriou, Vasileios 2020. A Survey of Alzheimer’s Disease Early Diagnosis Methods for Cognitive Assessment. Sensors. 20 (24) e7292. https://doi.org/10.3390/s20247292

A Framework for Automatic Morphological Feature Extraction and Analysis of Abdominal Organs in MRI Volumes
Asaturyan, H., Thomas, E.L., Bell, J.D. and Villarini, B. 2019. A Framework for Automatic Morphological Feature Extraction and Analysis of Abdominal Organs in MRI Volumes. Journal of Medical Systems. 43 334. https://doi.org/10.1007/s10916-019-1474-3

Advancing Pancreas Segmentation in Multi-protocol MRI Volumes using Hausdorff-Sine Loss Function
Asaturyan, H., Thomas, E.L., Fitzpatrick, J., Bell, J.D. and Villarini, B. 2019. Advancing Pancreas Segmentation in Multi-protocol MRI Volumes using Hausdorff-Sine Loss Function. 10th International Workshop on Machine Learning in Medical Imaging (MLMI 2019) in conjunction with MICCAI 2019. Shenzhen, China 13 Oct 2019 Springer. https://doi.org/10.1007/978-3-030-32692-0_4

Morphological and multi-level geometrical descriptor analysis in CT and MRI volumes for automatic pancreas segmentation
Asaturyan, H., Gligorievski, A. and Villarini, B. 2019. Morphological and multi-level geometrical descriptor analysis in CT and MRI volumes for automatic pancreas segmentation. Computerized Medical Imaging and Graphics. 75, pp. 1-13. https://doi.org/10.1016/j.compmedimag.2019.04.004

The SmartTarget BIOPSY trial: A prospective, within-person randomised, blinded trial comparing the accuracy of visual-registration and MRI/ultrasound image-fusion targeted biopsies for prostate cancer risk stratification
Hamid, S., Donaldson, I.A., Hu, Y., Rodell, R., Villarini, B., Bonmati, E., Tranter, P., Punwani, S., Side, H.S., Willis, S., van der Meulen, J., Hawkes, D., Mccarran, N., Potyka, I., Williams, N.W., Brew-Graves, C., Freeman, A., Moore, C.M., Barratt, D., Emberton, M. and Ahmed, H.U. 2019. The SmartTarget BIOPSY trial: A prospective, within-person randomised, blinded trial comparing the accuracy of visual-registration and MRI/ultrasound image-fusion targeted biopsies for prostate cancer risk stratification. European Urology. 75 (5), pp. 733-740. https://doi.org/10.1016/j.eururo.2018.08.007

Hierarchical Framework for Automatic Pancreas Segmentation in MRI Using Continuous Max-flow and Min-Cuts Approach
Asaturyan, H. and Villarini, B. 2018. Hierarchical Framework for Automatic Pancreas Segmentation in MRI Using Continuous Max-flow and Min-Cuts Approach. ICIAR 2018 International Conference Image Analysis and Recognition. Póvoa de Varzim, Portugal 27 - 29 Jun 2018 Springer. https://doi.org/10.1007/978-3-319-93000-8_64

Technical Note: Error metrics for estimating the accuracy of needle/instrument placement during transperineal MR/US-guided prostate interventions
Bonmati, E., Hu, Y., Villarini, B., Rodell, R., Martin, P., Han, L., Donaldson, I., Ahmed, H.U., Moore, C.M., Emberton, M. and Barratt, D.C. 2018. Technical Note: Error metrics for estimating the accuracy of needle/instrument placement during transperineal MR/US-guided prostate interventions. Medical Physics. 45 (4), pp. 1408-1414. https://doi.org/10.1002/mp.12814

MP33-20 The SmartTarget Biopsy Trial: a Prospective Paired Blinded Trial with Randomisation to Compare Visual-Estimation and Image-Fusion Targeted Prostate Biopsies
Donaldson, I., Hamid, S., Barratt, D., Hu, Y., Rodell, R., Villarini, B., Bonmati, E., Martin, P., Hawkes, D., Mccarran, N., Potyka, I., Williams, N., Brew-Graves, C., Moore, C., Emberton, M. and Ahmed, H. 2017. MP33-20 The SmartTarget Biopsy Trial: a Prospective Paired Blinded Trial with Randomisation to Compare Visual-Estimation and Image-Fusion Targeted Prostate Biopsies. The Journal of Urology. 197 (4), p. e425. https://doi.org/10.1016/j.juro.2017.02.1016

A Framework for Morphological Feature Extraction of Organs from MR Images for Detection and Classification of Abnormalities
Villarini, B., Asaturyan, H., Thomas, E.L., Mould, R. and Bell, J.D. 2017. A Framework for Morphological Feature Extraction of Organs from MR Images for Detection and Classification of Abnormalities. Proceedings of the 30th IEEE International Symposium on Computer-Based Medical Systems (CBMS’17). Thessaloniki, Greece 22 - 24 Jun 2017 IEEE. https://doi.org/10.1109/CBMS.2017.49

Cognitive behaviour analysis based on facial information using depth sensors
Montenegro, J.F., Villarini, B., Gkelias, A. and Argyriou, V. 2016. Cognitive behaviour analysis based on facial information using depth sensors. Wannous, H., Pala, P., Daoudi, M. and Flórez-Revuelta, F. (ed.) ICPR Workshop on Understanding Human Activities through 3D Sensors (UHA3DS 2016). Cancun, Mexico 04 - 08 Dec 2016 Springer. https://doi.org/10.1007/978-3-319-91863-1

Photometric Stereo for 3D Face Reconstruction Using Non Linear Illumination Models
Villarini, B., Gkelias, A. and Argyriou, V. 2016. Photometric Stereo for 3D Face Reconstruction Using Non Linear Illumination Models. ICPR Workshop on Multimodal Pattern Recognition of Social Signals in Human-Computer Interaction. Cancun, Mexico 04 Dec 2016 - 08 Jun 2017 Springer. https://doi.org/10.1007/978-3-319-59259-6_12

Validation of the needle targeting accuracy of a MRI/TRUS- image-guided system for transperineal prostate cancer biopsy
Bonmati, E., Hu, Y., Rodell, R., Villarini, B., Martin, P., Han, L., Donaldson, I., Ahmed, H.U., Moore, C.M., Emberton, M. and Barratt, D.C. 2015. Validation of the needle targeting accuracy of a MRI/TRUS- image-guided system for transperineal prostate cancer biopsy. CARS-Computer Assisted Radiology and Surgery, 29th International Congress and Exhibition. Barcelona, Spain 24 Jun 2015 Springer. https://doi.org/10.1007/s11548-015-1213-2

Image, video and 3D data registration: medical, satellite and video processing applications with quality metrics
Argyriou, V., Del Rincon, J.M., Villarini, B. and Roche, A. 2015. Image, video and 3D data registration: medical, satellite and video processing applications with quality metrics. Oxford Wiley.

A sparse representation method for determining the optimal illumination directions in Photometric Stereo
Argyriou, V., Zafeiriou, S., Villarini, B. and Petrou, M. 2013. A sparse representation method for determining the optimal illumination directions in Photometric Stereo. Signal Processing. 93 (11), pp. 3027-3038. https://doi.org/10.1016/j.sigpro.2013.04.026

An optimal method for searching UEP profiles in wireless JPEG 2000 video transmission
Baruffa, G., Frescura, F., Micanti, P. and Villarini, B. 2012. An optimal method for searching UEP profiles in wireless JPEG 2000 video transmission. ICIP - International Conference on Image Processing. Orlando, FL 30 Sep 2012 IEEE. https://doi.org/10.1109/ICIP.2012.6467192

A reduced-reference perceptual image and video quality metric based on edge preservation
Martini, M.G., Villarini, B. and Fiorucci, F. 2012. A reduced-reference perceptual image and video quality metric based on edge preservation. EURASIP Journal on Advances in Signal Processing. 2012 (66) 66. https://doi.org/10.1186/1687-6180-2012-66

Image quality assessment based on edge preservation
Martini, M.G., Hewage, C. and Villarini, B. 2012. Image quality assessment based on edge preservation. Signal Processing: Image Communication. 27 (8), pp. 875-882. https://doi.org/10.1016/j.image.2012.01.012

Reduced-Reference Image Quality Assessment Based on Edge Preservation
Martini, M.G., Villarini, B. and Fiorucci, F. 2011. Reduced-Reference Image Quality Assessment Based on Edge Preservation. 7th International ICST Mobile Multimedia Communications Conference. Cagliari, Italy 05 Sep 2011 Springer. https://doi.org/10.1007/978-3-642-30419-4_3

A reprogrammable computing platform for JPEG 2000 and H.264 SHD video coding
Baruffa, G., Fiorucci, F., Frescura, F., Micanti, P., Verducci, L. and Villarini, B. 2010. A reprogrammable computing platform for JPEG 2000 and H.264 SHD video coding. 8th IEEE Workshop on Embedded Systems for Real-Time Multimedia (ESTIMedia). Scottsdale, AZ 28 Oct 2010 IEEE. https://doi.org/10.1109/ESTMED.2010.5666990

Permalink - https://westminsterresearch.westminster.ac.uk/item/w347y/a-modular-deep-learning-framework-for-scene-understanding-in-augmented-reality-applications
