Tension in big data using machine learning: Analysis and applications

Huamao Wang, Yumei Yao and Said Salhi 2020. Tension in big data using machine learning: Analysis and applications. Technological Forecasting and Social Change. 158 120175. https://doi.org/10.1016/j.techfore.2020.120175

Title	Tension in big data using machine learning: Analysis and applications
Type	Journal article
Authors	Huamao Wang, Yumei Yao and Said Salhi
Abstract	The availability of machine learning techniques in popular programming languages and the exponential growth of big data from social media, news, surveys, and markets provide exciting challenges and invaluable opportunities for organizations and individuals to explore implicit information for decision making. Nevertheless, users of machine learning often find that these sophisticated techniques incur a high level of tension, caused by the selection of an appropriate training data set size, among other factors. In this paper, we provide a systematic way of resolving such tension by examining practical examples of predicting the popularity and sentiment of posts on Twitter and Facebook, blogs on Mashable, news on Google and Yahoo, the US house survey, and Bitcoin prices. Interesting results show that, for big data, using around 20% of the full sample often leads to better prediction accuracy than using the full sample. This conclusion is consistent across a series of experiments. The managerial implication is that more data is not necessarily better: users need to be cautious about this sensitivity, as the simplistic approach may easily lead to inferior solutions with potentially detrimental consequences.
Article number	120175
Journal	Technological Forecasting and Social Change
Journal citation	158
ISSN	0040-1625
Year	2020
Publisher	Elsevier
Digital Object Identifier (DOI)	https://doi.org/10.1016/j.techfore.2020.120175
Web address (URL)	http://dx.doi.org/10.1016/j.techfore.2020.120175
Publication dates
Published	Sep 2020
Published online	30 Jun 2020
