*Remember that when an interactive chart is cited in the post, clicking on it will show the source code. To visualize it properly, download the file as HTML and open it in your browser.*
In part I it was explained that a first look at the f1 scores, comparing features defined in a classic 3D space against a non-euclidean one, gave clues that the target variable (price changes) moves in a non-euclidean hyperspace rather than the classic Cartesian one. This was based on the f1 scores being far higher under the non-euclidean assumption than under the Cartesian one, i.e., a far better fit under such conditions.
This would explain why most models that try to predict price changes in the stock exchange fail badly: they use classic Cartesian features. Since not all readers are familiar with what a non-euclidean hyperspace means, a brief explanation of the basis of this theory follows. First, let's imagine we have no idea of this concept and define n features in a classic space as shown below:
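As a minimal sketch (the actual features from part I are not reproduced in this post, so the data below is purely illustrative), n features in a classic Cartesian space can be represented as a plain numpy matrix:

```python
import numpy as np

# Minimal sketch with hypothetical data: each row is one observation
# (e.g., a trading day) and each column one feature in flat Cartesian space.
# The real features and class labels from part I are not shown here.
rng = np.random.default_rng(42)
n_samples, n_features = 500, 5
X = rng.normal(size=(n_samples, n_features))  # features in flat (Euclidean) space
y = rng.integers(0, 3, size=n_samples)        # illustrative price-change classes
```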
Under this assumption, when running the model explained in part I and using scikit-learn's grid search to find the best parameters, the confusion matrix and f1 scores obtained from the classification report are as follows:
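A hedged sketch of how such a report can be produced with scikit-learn's GridSearchCV; the estimator and parameter grid from part I are not detailed in this post, so a RandomForestClassifier stands in as a placeholder:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grid search over a placeholder parameter grid, scoring on weighted f1
# to match the metric discussed in the post.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [3, 5, None]},
    scoring="f1_weighted",
    cv=5,
)
grid.fit(X_train, y_train)

y_pred = grid.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```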
F1 scores range from 0.16 up to 0.73 (class 0), with accuracy and weighted f1 score both around 0.5. Results are not promising, so let's see how they change when assuming a non-euclidean reality. In such a space, two initially parallel lines will either converge or diverge depending on whether the surface has a positive or negative curvature, denoted +k and -k in most of the literature. In other areas such as physics, the same notion is used to describe the shape of the universe:
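For reference (this is standard differential geometry, not something specific to this post), a sphere of radius $R$ has constant positive Gaussian curvature, while a hyperbolic plane has constant negative curvature:

$$
K = \frac{1}{R^2} > 0 \;\; (\text{spherical, } +k), \qquad K = -\frac{1}{R^2} < 0 \;\; (\text{hyperbolic, } -k)
$$

The flat Cartesian case corresponds to $K = 0$, sitting between the two.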
If a positive curvature is assumed, the features are translated to a new, sphere-like reality as shown below:
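The exact spherical translation used in the post is not spelled out here; a minimal sketch of one common choice, projecting every feature vector onto the unit hypersphere, could look like this:

```python
# Minimal sketch of a +k translation: project each Cartesian feature vector
# onto the unit hypersphere, so all points live on a positively curved surface.
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_sphere = X / np.where(norms == 0, 1.0, norms)  # every row now has unit length
```

The same grid search as above can then be refit on X_sphere in place of X.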
The f1 scores and classification report under such conditions are:
On the other hand, if a negative curvature is assumed, the translation would be as follows:
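Again, the post's exact -k transform is not shown; a hedged sketch using the hyperboloid model of hyperbolic space (one standard way to impose negative curvature) could be:

```python
# Minimal sketch of a -k translation via the hyperboloid model: each point is
# mapped to (cosh r, sinh(r) * u), where r is its radius and u its direction.
r = np.linalg.norm(X, axis=1, keepdims=True)
u = X / np.where(r == 0, 1.0, r)                  # unit direction of each row
X_hyper = np.hstack([np.cosh(r), np.sinh(r) * u])
# every row satisfies x0**2 - (x1**2 + ... + xn**2) = 1, i.e. it lies on the
# upper sheet of a hyperboloid, a surface of constant negative curvature
```

Note the transform adds one coordinate: n Cartesian features become n+1 hyperboloid coordinates.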
And the f1 scores and classification report under such conditions are:
As seen, the best fit occurs for the features on a +k surface, with f1 scores from 0.4 to 0.9 and with an accuracy of 0.64 and a weighted f1 score of 0.62.
For details on the Python program used, see the post here.