An Evaluation Framework for Automated Audio Description

Pacurar, Cristian (2025). An Evaluation Framework for Automated Audio Description. MPhil thesis, University of Westminster, Humanities. https://doi.org/10.34737/wzvw9

Title	An Evaluation Framework for Automated Audio Description
Type	MPhil thesis
Authors	Pacurar, Cristian
Abstract	The United Nations Convention on the Rights of Persons with Disabilities (CRPD, 2006) stipulates that all individuals have the right to access information and communicate through means of their choice. This underscores the fundamental right to be informed. However, information is not always accessible to people with disabilities, particularly those with visual impairments. With recent advances in AI models, such as GPT-4 with Vision by OpenAI and the Pegasus-1 model by Twelvelabs, automating the audio description process is becoming increasingly feasible. The initial goal of this research was to create a system capable of generating audio description tracks automatically, thereby increasing accessibility for blind or partially sighted persons. The premise of the research was that the human audio description process could be split into smaller steps, each of which could be automated and then chained together. Currently, there is no established framework for assessing the efficacy of algorithms in automating the audio description process. Although various algorithms can replace human audio describers in certain steps, there are no key performance indicators (KPIs) for analysing and comparing them. Furthermore, there is no standardised method for evaluating and comparing multiple algorithms performing specific audio description tasks, which hinders objective decision-making. To address this gap, the first step was to analyse the stages of the human audio description process and break them down into self-contained actions suitable for automation. This led to the conceptualisation of an automated audio description system designed to replicate the entire human process. To demonstrate the practicality of the evaluation framework, a partially automated system was developed, focusing on automating the creation of the audio description script. This system serves as a proof of concept for the usability and effectiveness of the proposed framework. Because there are no globally accepted guidelines, multiple guidelines were compared and synthesised to create a unified set of KPIs for use in the evaluation framework.
Year	2025
File	Full Thesis Final.pdf
File Access Level	Open (open metadata and files)
Project	An Evaluation Framework for Automated Audio Description
Publisher	University of Westminster
Publication dates
Published	07 Oct 2024
Digital Object Identifier (DOI)	https://doi.org/10.34737/wzvw9

