Ngram and bayesian classification of documents for topic and authorship : WestminsterResearch

Publication dates
Title	Ngram and bayesian classification of documents for topic and authorship
Authors	Clement, R. and Sharp, D.
Abstract	Large, real-world data sets have been investigated in the context of authorship attribution of real-world documents. Ngram measures can be used to accurately assign authorship for long documents such as novels. A number of 5 (authors) x 5 (movies) arrays of movie reviews were acquired from the Internet Movie Database. Both ngram and naive Bayes classifiers were used to classify along both the authorship and topic (movie) axes. The two approaches yielded similar results, and authorship was detected as accurately as topic, or more accurately. Part-of-speech tagging and function-word lists were used to investigate the influence of structure on classification, using documents with meaning removed but grammatical structure intact.
Journal	Literary and Linguistic Computing
Journal citation	18 (4), pp. 423-447
ISSN	0268-1145
Year	Nov 2003
Digital Object Identifier (DOI)	https://doi.org/10.1093/llc/18.4.423
Published	Nov 2003

Related outputs

SMS communication and announcement classification in managed learning environments
Clement, R., Baldwin, M., Vassell, C. and Amin, N. 2005. SMS communication and announcement classification in managed learning environments. First International Workshop on Web Personalisation, Recommender Systems and Intelligent User Interfaces, part of the Second International Conference on E-Business and Telecommunications Networks. Reading, UK 03-04 Oct 2005

Visualising speciation in models of cichlid fish
Clement, R. 2003. Visualising speciation in models of cichlid fish. in: 17th European Simulation Multiconference 9-11 June 2003 Friedrich-Alexander-Universität Erlangen-Nürnberg. pp. 344-348

Plausible roles for social learning in the speciation and evolution of cichlid fish
Clement, R. 2003. Plausible roles for social learning in the speciation and evolution of cichlid fish. in: Dautenhahn, K. and Nehaniv, C. (ed.) Proceedings of the AISB'03: 2nd International Symposium on Imitation Animals and Artefacts, April 7-11, Aberystwyth Massachusetts, USA MIT Press.

Plausible roles for social and individual learning in the speciation and evolution of cichlid fish
Clement, R. 2003. Plausible roles for social and individual learning in the speciation and evolution of cichlid fish. Interdisciplinary Journal of Artificial Intelligence and the Simulation of Behaviour. 1 (4), pp. 319-334.

Multi-agent simulations of evolution and speciation of cichlid fish
Clement, R. 2003. Multi-agent simulations of evolution and speciation of cichlid fish. in: Verbraeck, A. and Hlupic, V. (ed.) ESS'2003, Proceedings 15th European Simulation Symposium 2003: Simulation in Industry Germany SCS European Publishing House.

Permalink - https://westminsterresearch.westminster.ac.uk/item/93512/ngram-and-bayesian-classification-of-documents-for-topic-and-authorship
