3/26/2023

Synergy of serra release date

PCA does not actually “choose” to use Euclidean distance. In a beautiful bit of math, it arises naturally from PCA’s goal of maximizing the variance explained in the data.

UMAP preserves nonlinear relationships, which it calls “manifolds”, between similar data points. It is incredibly efficient and can run directly on the original data matrix (for the Commander Map, ~1.2M x 22k). The other popular method, t-SNE, cannot handle such large data and requires you to first reduce the dimensionality to 10-20 dimensions with another algorithm like PCA. For some niche data-science drama, the creators of UMAP claim that their algorithm is better than t-SNE at preserving global structure. In reality, recent work has shown that the algorithms are essentially equivalent and equally poor at explaining data in an interpretable way.

Real nonlinear methods do not work like this. The two most popular methods, UMAP and t-SNE, work by establishing connections between similar points in higher-dimensional space (“local structure”) and then trying to maintain those connections in a lower-dimensional representation. This means that the location of “islands” in the maps is essentially arbitrary. Different iterations of nonlinear algorithms, for example, will place the Degenerate Micro Cube cluster in different positions. Sometimes it’s next to the MTGO Vintage Cube’s cluster, and sometimes it’s halfway across the map.

To explain a bit more: PCA identifies new dimensions that are uncorrelated (or “orthogonal”). Somewhat confusingly, this does not mean independent. Two variables can be uncorrelated but still dependent if their relationship is nonlinear. As a result, PCA is terrible at capturing complicated relationships in a dataset. As it turns out, the Commander and Cube ecosystems are full of complicated relationships.

Using only 0’s and 1’s for weights also makes the math much easier.
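The point that two variables can be uncorrelated yet still dependent is easy to check numerically. The following is a minimal sketch in plain Python, using made-up data rather than anything from the Commander or Cube datasets: `y` is completely determined by `x` (it is `x` squared), yet the Pearson correlation between them is essentially zero. The helper `pearson` and the sample values are illustrative assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: x is symmetric around zero, and y depends entirely on x.
xs = [i / 100 for i in range(-100, 101)]
ys = [x * x for x in xs]

# Nonlinear dependence: the correlation is approximately zero.
r_nonlinear = pearson(xs, ys)

# Linear dependence, for contrast: the correlation is approximately one.
r_linear = pearson(xs, [2 * x + 1 for x in xs])

print(f"nonlinear: {r_nonlinear:.6f}, linear: {r_linear:.6f}")
```

A method that only looks at correlations, as PCA does, would treat `x` and `y` here as unrelated even though knowing `x` tells you `y` exactly.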
In the Commander dataset, one could alternatively assign more weight to commanders or less weight to basic lands. A commander is not always critical to the deck’s function, while the presence of basics can reflect a deck’s budget or theme. Unsurprisingly, commanders influence deck similarity a great deal anyway by influencing which maindeck cards are played: Gishath decks, for example, almost always play Regisaur Alpha.

In preparing the data, I removed duplicate lists from each dataset. For cubes, I filtered out lists that had fewer than 50 unique cards, as these cubes tended to be unfinished (my sincerest apologies to the Hidden Gibbons Cube). No such restriction is possible for Commander, as there are legitimate decklists with a Commander and 99 basic lands. For reasons that I’ll explain later, duplicate cards are ignored (see footnote 12), though for the Commander dataset we receive deduplicated data from the start. Ignoring duplicates doesn’t treat cards like Rat Colony correctly, but the map ends up grouping decks containing cards like these together anyway.

Astute readers might wonder: how can two dimensions fully explain presence or absence of three colors? Does that not require three dimensions? Only two dimensions are needed because for these lists, the presence of blue and green automatically implies the absence of white. So the “third” dimension gives us no additional information. If there were a fourth list with green, white, and blue cards, a third dimension would be needed to explain all the information.

The term “information” can mean many things. In this context, I am referring to the variance in the data, or essentially the differences between points. Intuitively, describing a point’s position along that line allows you to distinguish it from other points.

Dimensionality reduction works much better when correlations exist because the effective dimensionality of the data is lower.
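To make the idea of effective dimensionality concrete, here is a rough sketch in plain Python. It counts how many dimensions are needed to explain all the variance in a small 0/1 matrix by computing the rank of the data after subtracting each column’s mean. The three lists and their color columns (white, blue, green) are invented for illustration and are not the lists from the article; the function `centered_rank` is likewise a hypothetical helper, not part of any library.

```python
from fractions import Fraction

def centered_rank(rows):
    """Rank of the data after subtracting each column's mean.

    This equals the number of dimensions needed to explain all the variance.
    Exact fractions are used so that no floating-point tolerance is needed.
    """
    n = len(rows)
    n_cols = len(rows[0])
    means = [Fraction(sum(r[j] for r in rows), n) for j in range(n_cols)]
    m = [[Fraction(r[j]) - means[j] for j in range(n_cols)] for r in rows]
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry here.
        pivot = next((i for i in range(rank, n) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for i in range(rank + 1, n):
            factor = m[i][col] / m[rank][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

# Hypothetical lists; columns are presence (1) or absence (0) of W, U, G.
three_lists = [
    [1, 0, 0],  # white only
    [0, 1, 1],  # blue and green
    [0, 1, 0],  # blue only
]
four_lists = three_lists + [[1, 1, 1]]  # adds a green-white-blue list

print(centered_rank(three_lists), centered_rank(four_lists))
```

In this toy example, white never appears alongside blue or green in the first three lists, so two dimensions capture every difference between them. Adding a list that contains all three colors breaks that pattern, and a third dimension becomes necessary.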
You can guess someone’s “thickness” by knowing their height and width because these metrics are correlated.

We can visualize three dimensions, so why not use three? The answer is that humans are terrible at interpreting three-dimensional plots.