# Data Is Neither 2D nor 3D but 4D... or Higher

In __prior posts__, data on several **price changes** for securities in Europe, the USA and Latin America was analyzed. According to that analysis, when one considers the *variable* as *non-Euclidean*, __Sklearn__ libraries such as __SVM__ or __Decision Trees__ give outputs that are *more reliable*, with higher *f1 scores* and *recall*. Therefore, the higher dimensionality of the data makes it incomprehensible, as our 3D reality limits us.

However, what one can do is get a **projection** of it on a *surface* or *volume* that can be analyzed and understood. For practicality, as the Sklearn library is the most widely used, a surface is chosen, since the __Decision Boundary Display__ is best suited in terms of straightforwardness. Consequently, as the variable has *ten* features, five classes and five probabilities of occurrence of each class (more details __here__), two have to be chosen to graph.

So, which features should be graphed? The choice can be *suggested*, but there is no actual procedure for it, which is why, after testing several hundred pairs, the conclusion was the obvious one: the features that represent the **probability** of **ending up** in the two **most important classes**. A reader without access to the source code may wonder about, or struggle to visualize, what has been described so far, so:

The matrix **'X'**, with *'n' instances* and 10 features, comes from the *'distances'* and *'freq values'* matrices. The former is calculated with a novel method that can be checked in detail in the link at the end of the second paragraph, while the *latter* comes from the *historical frequency distribution* of each security. As there are *'m'* *securities*, it is replicated 's' times so that its size equals 'n' (in the future, frequency tables that change with each projection may be tested).

Additionally, as the data is **not balanced**:

it should be fixed; otherwise, the results won't be **consistent**. This is achieved with another novel method that can be checked in the source code. The final outcome is as below:

And after finding the best parameters through __Grid Search__:
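For readers without the source code, a parameter search of this kind can be sketched roughly as follows. This is only an illustration under assumptions: the data is synthetic (generated with `make_classification` to mimic 10 features and 5 classes), the estimator is an `SVC`, and the parameter grid and the `f1_macro` scoring are hypothetical choices, since the post does not list the values it actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the post's data (assumption): 10 features, 5 classes.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=6, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid: the values searched in the post are not known.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1], "kernel": ["rbf"]}

# Macro-averaged F1 weights every class equally, which matters when
# some classes are rarer than others.
search = GridSearchCV(SVC(), param_grid, scoring="f1_macro", cv=3)
search.fit(X_train, y_train)

best_params = search.best_params_
# score() reuses the scoring passed above, so this is macro F1 on the test split.
test_f1_macro = search.score(X_test, y_test)
print(best_params, round(test_f1_macro, 3))
```

The same pattern applies to a `DecisionTreeClassifier` by swapping the estimator and the grid keys (for example `max_depth` and `min_samples_leaf`).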

The boundaries displayed for Euclidean, Manhattan, +k, and -k are:

Which method is **best suited** is relative. To explain the process, the current post considers that *two* features are crucial: classes *'0'* and *'3'*. The former because it represents the *biggest weekly loss*, and the latter because it is historically the *most frequent* one; thus, if a prediction model can't foresee it accurately, it should be discarded.

Following this criterion, the Euclidean one is discarded due to a poor class '3' f1 score, as are Manhattan and -k; consequently, +k is best suited. One strong argument supporting the claim that this data is not Euclidean is the large area corresponding to class '0' (purple). Finally, it is difficult to deduce more from a 2D projection.
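The selection rule described above can be expressed as a per-class F1 check. The following is a minimal sketch under assumptions: the threshold `MIN_F1`, the helper name `passes_crucial_classes`, and the toy label vectors are all hypothetical, and checking both classes '0' and '3' is one reading of the criterion (the comparison in the post rests on the class '3' score).

```python
from sklearn.metrics import f1_score

CRUCIAL_CLASSES = [0, 3]  # biggest weekly loss, and most frequent class
MIN_F1 = 0.6              # hypothetical threshold, not taken from the post


def passes_crucial_classes(y_true, y_pred, classes=CRUCIAL_CLASSES, min_f1=MIN_F1):
    """Return (ok, per_class_f1) for the classes considered crucial."""
    # average=None yields one F1 value per requested label, in order.
    scores = f1_score(y_true, y_pred, labels=classes, average=None, zero_division=0)
    per_class = dict(zip(classes, scores))
    return all(s >= min_f1 for s in scores), per_class


# Toy labels (made up) to show the rule in action.
y_true = [0, 0, 1, 2, 3, 3, 3, 4]
y_good = [0, 0, 1, 2, 3, 3, 3, 4]  # every instance predicted correctly
y_bad = [0, 0, 1, 2, 0, 1, 3, 4]   # two of the three class-3 instances missed

ok_good, scores_good = passes_crucial_classes(y_true, y_good)
ok_bad, scores_bad = passes_crucial_classes(y_true, y_bad)
print(ok_good, scores_good)
print(ok_bad, scores_bad)
```

In the second case the model keeps a class '0' F1 of 0.8 but its class '3' F1 drops to 0.5, so it would be discarded under this rule even though most of its predictions are right.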