Tension in big data using machine learning: Analysis and applications

Huamao Wang, Yumei Yao and Said Salhi. 2020. Tension in big data using machine learning: Analysis and applications. Technological Forecasting and Social Change. 158, 120175. https://doi.org/10.1016/j.techfore.2020.120175

Title: Tension in big data using machine learning: Analysis and applications
Type: Journal article
Authors: Huamao Wang, Yumei Yao and Said Salhi
Abstract

The availability of machine learning techniques in popular programming languages and the exponentially expanding big data from social media, news, surveys, and markets provide exciting challenges and invaluable opportunities for organizations and individuals to explore implicit information for decision making. Nevertheless, users of machine learning often find that these sophisticated techniques create considerable tension, caused by the choice of an appropriate training data set size, among other factors. In this paper, we provide a systematic way of resolving such tensions by examining practical examples of predicting the popularity and sentiment of posts on Twitter and Facebook, blogs on Mashable, news on Google and Yahoo, the US house survey, and Bitcoin prices. The results show that, for big data, using around 20% of the full sample often yields better prediction accuracy than using the full sample. This conclusion is consistent across a series of experiments. The managerial implication is that using more data is not necessarily best, and users need to be cautious about this sensitivity, because the simplistic approach may easily lead to inferior solutions with potentially detrimental consequences.

Article number: 120175
Journal: Technological Forecasting and Social Change
Journal citation: 158
ISSN: 0040-1625
Year: 2020
Publisher: Elsevier
Digital Object Identifier (DOI): https://doi.org/10.1016/j.techfore.2020.120175
Web address (URL): http://dx.doi.org/10.1016/j.techfore.2020.120175
Publication dates
Published: Sep 2020
Published online: 30 Jun 2020

Related outputs

Dynamics and performance of decentralized portfolios with size-induced fund flows
Wang, H., Yang, J. and Yao, Y. 2019. Dynamics and performance of decentralized portfolios with size-induced fund flows. Quantitative Finance. 19 (6), pp. 885-898. https://doi.org/10.1080/14697688.2018.1550262

Permalink - https://westminsterresearch.westminster.ac.uk/item/w3yqv/tension-in-big-data-using-machine-learning-analysis-and-applications


