Ngram and Bayesian classification of documents for topic and authorship

Clement, R. and Sharp, D. 2003. Ngram and Bayesian classification of documents for topic and authorship. Literary and Linguistic Computing. 18 (4), pp. 423-447.

Title: Ngram and Bayesian classification of documents for topic and authorship
Authors: Clement, R. and Sharp, D.
Abstract

Large, real-world data sets have been investigated in the context of authorship attribution of real-world documents. Ngram measures can be used to accurately assign authorship for long documents such as novels. A number of 5 (authors) x 5 (movies) arrays of movie reviews were acquired from the Internet Movie Database. Both ngram and naive Bayes classifiers were used to classify along both the authorship and topic (movie) axes. Both approaches yielded similar results, and authorship was detected as accurately as, or more accurately than, topic. Part-of-speech tagging and function-word lists were used to investigate the influence of structure on classification tasks, using documents with meaning removed but grammatical structure intact.

Journal: Literary and Linguistic Computing
Journal citation: 18 (4), pp. 423-447
ISSN: 0268-1145
Year: Nov 2003
Digital Object Identifier (DOI): doi:10.1093/llc/18.4.423
Publication dates
Published: Nov 2003

Related outputs

SMS communication and announcement classification in managed learning environments
Clement, R., Baldwin, M., Vassell, C. and Amin, N. 2005. SMS communication and announcement classification in managed learning environments. First International Workshop on Web Personalisation, Recommender Systems and Intelligent User Interfaces, part of the Second International Conference on E-Business and Telecommunications Networks. Reading, UK 03-04 Oct 2005

Visualising speciation in models of cichlid fish
Clement, R. 2003. Visualising speciation in models of cichlid fish. in: 17th European Simulation Multiconference 9-11 June 2003 Friedrich-Alexander-Universität Erlangen-Nürnberg. pp. 344-348

Plausible roles for social learning in the speciation and evolution of cichlid fish
Clement, R. 2003. Plausible roles for social learning in the speciation and evolution of cichlid fish. in: Dautenhahn, K. and Nehaniv, C. (ed.) Proceedings of the AISB'03: 2nd International Symposium on Imitation Animals and Artefacts, April 7-11, Aberystwyth Massachusetts, USA MIT Press.

Plausible roles for social and individual learning in the speciation and evolution of cichlid fish
Clement, R. 2003. Plausible roles for social and individual learning in the speciation and evolution of cichlid fish. Interdisciplinary Journal of Artificial Intelligence and the Simulation of Behaviour. 1 (4), pp. 319-334.

Multi-agent simulations of evolution and speciation of cichlid fish
Clement, R. 2003. Multi-agent simulations of evolution and speciation of cichlid fish. in: Verbraeck, A. and Hlupic, V. (ed.) ESS'2003, Proceedings 15th European Simulation Symposium 2003: Simulation in Industry Germany SCS European Publishing House.

Permalink - https://westminsterresearch.westminster.ac.uk/item/93512/ngram-and-bayesian-classification-of-documents-for-topic-and-authorship

