Abstract

High-accuracy forecasts are essential to financial risk management, where machine learning algorithms are frequently employed. We derive a new theoretical bound on the sample complexity of PAC learning in the presence of noise, one that does not require the size of the hypothesis set, |H|, to be specified. First, applying this bound, we demonstrate that, for realistic financial applications (where |H| is typically infinite), big data is necessary, contrary to prior theoretical conclusions. Second, we show that noise (a non-trivial component of big data) has a dominant impact on the data size required for PAC learning; consequently, contrary to current big data trends, we argue that high-quality data is more important than large volumes of data. Third, we demonstrate that the level of algorithmic sophistication (specifically, the Vapnik-Chervonenkis dimension) must be traded off against data requirements to ensure optimal algorithmic performance. Finally, our new theorem can be applied to a wider range of machine learning algorithms, as it does not impose finiteness requirements on |H|. This paper will be of interest to researchers and industry specialists working on machine learning in financial applications.
---
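For context, a brief sketch of the classical PAC baselines the abstract contrasts against (these are standard textbook results, not the paper's new theorem): for a finite hypothesis set $H$ in the noise-free (realizable) setting, $m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$ samples suffice, while for infinite $H$ the sample complexity is expressed through the VC dimension $d$, and moving to the noisy (agnostic) setting degrades the dependence on the accuracy $\epsilon$ from $1/\epsilon$ to $1/\epsilon^2$:

$$
m = O\!\left(\frac{d\ln(1/\epsilon) + \ln(1/\delta)}{\epsilon}\right) \ \text{(realizable)}, \qquad
m = O\!\left(\frac{d + \ln(1/\delta)}{\epsilon^{2}}\right) \ \text{(agnostic, i.e. noisy)}.
$$

These baselines illustrate two of the abstract's claims: noise inflates the required data size (the $1/\epsilon^{2}$ term), and the VC dimension $d$ enters the bound directly, so algorithmic sophistication trades off against data requirements.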