Meowmeow Posted April 20, 2020 I have 40 columns of data with 10,000 rows. I want to reduce the dimensionality to 2 or 3 dimensions, such that the clustering is based not on Euclidean distance but on the probability distribution of the cluster. How can I do this? R or Python is OK.
Scada Posted April 20, 2020 Do you want me to ask my roommate and give you his backup number?
ring_master Posted April 20, 2020 2 minutes ago, Meowmeow said: "I have 40 columns of data with 10,000 rows. I want to reduce the dimensionality to 2 or 3 dimensions, such that the clustering is based not on Euclidean distance but on the probability distribution of the cluster. How can I do this? R or Python is OK." PCA (Principal Component Analysis)
Meowmeow Posted April 20, 2020 Just now, ring_master said: "PCA (Principal Component Analysis)" That's Euclidean distance, no?
Meowmeow Posted April 20, 2020 1 minute ago, Meowmeow said: "That's Euclidean distance, no?" Never mind, that's not even Euclidean distance; it's based on covariance. I need the probability distribution of the cluster.
ring_master Posted April 20, 2020 1 minute ago, Meowmeow said: "That's Euclidean distance, no?" Yes. Are all the columns numerical, or are there categorical columns too?
Meowmeow Posted April 20, 2020 Just now, ring_master said: "Yes. Are all the columns numerical, or are there categorical columns too?" All are just numbers.
kathanayaka Posted April 20, 2020 2 minutes ago, Meowmeow said: "That's Euclidean distance, no?" Try PCA with n_components=2, but whether that is enough depends on how much of the variance in the data those components explain (the R2). Euclidean distance is what KNN uses.
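A minimal sketch of the suggestion above, assuming scikit-learn is available. The data here is a synthetic stand-in for the 10,000 x 40 table (the real data is not shown in the thread), and the standardization step is an added assumption, not something the posters mention. It shows how to check how much variance two components actually explain.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 10,000 x 40 numeric table (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 40))

# Assumption: standardize first so no single column dominates the variance.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Fraction of total variance captured by each component, and by both together.
explained = pca.explained_variance_ratio_
total_explained = float(explained.sum())
print(X_2d.shape, explained, total_explained)
```

If `total_explained` is small, two components are probably discarding most of the structure, and more components (or a different method) may be worth trying.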
ring_master Posted April 20, 2020 Just now, Meowmeow said: "Never mind, that's not even Euclidean distance; it's based on covariance. I need the probability distribution of the cluster." Clustering and dimensionality reduction are two different things. Do you want to reduce the 40 columns to 2 or 3 columns, or do you want to assign each of the 10K rows to one of a number of clusters that you specify?
Meowmeow Posted April 20, 2020 Just now, kathanayaka said: "Try PCA with n_components=2, but whether that is enough depends on how much of the variance in the data those components explain (the R2). Euclidean distance is what KNN uses." PCA is only for the variance in the data. t-SNE is for Euclidean distance between clusters. I need something for the probability distribution of the cluster.
kathanayaka Posted April 20, 2020 Your wording is not correct. "I want to reduce the dimensionality to 2 or 3 dimensions": does this mean you want to divide the dataset into 2 or 3 clusters?
Meowmeow Posted April 20, 2020 1 minute ago, ring_master said: "Clustering and dimensionality reduction are two different things. Do you want to reduce the 40 columns to 2 or 3 columns, or do you want to assign each of the 10K rows to one of a number of clusters that you specify?" I am doing clustering, with the distance between clusters denoting the probability distribution of finding the cluster.
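One possible reading of this requirement is soft clustering, where each row gets a probability of belonging to each cluster. The sketch below uses a Gaussian mixture model for that. Nobody in the thread proposes this technique; it is swapped in here as an assumption about what "probability distribution of the cluster" could mean, and the data, the three components, and the diagonal covariance are all illustrative choices.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in: three well-separated groups in 40 dimensions (assumption).
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 40))
X = np.vstack([c + rng.normal(size=(200, 40)) for c in centers])

# Assumption: 3 components with diagonal covariances, chosen only for illustration.
gmm = GaussianMixture(n_components=3, covariance_type="diag", random_state=0)
gmm.fit(X)

# Each row of probs holds the membership probabilities for one data row.
probs = gmm.predict_proba(X)
hard_labels = probs.argmax(axis=1)
print(probs.shape, probs[:3].round(3))
```

Unlike a hard cluster label, `probs` keeps the uncertainty for rows that sit between groups.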
Meowmeow Posted April 20, 2020 1 minute ago, kathanayaka said: "Your wording is not correct. 'I want to reduce the dimensionality to 2 or 3 dimensions': does this mean you want to divide the dataset into 2 or 3 clusters?" The number of clusters should depend on how different the data is (ideally, the 40 columns would become 7-8 clusters). At least, that's my solution to the problem based on Euclidean distance.
ring_master Posted April 20, 2020 Just now, Meowmeow said: "I am doing clustering, with the distance between clusters denoting the probability distribution of finding the cluster." Use K-means, but K-means assigns members to clusters based on Euclidean distance to the cluster centers.
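The point about K-means can be checked directly. This is a rough sketch on synthetic data (an assumption, as is the choice of 3 clusters): it fits scikit-learn's `KMeans` and confirms that each row's label is simply the index of its nearest centroid by Euclidean distance.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in: three well-separated groups in 40 dimensions (assumption).
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 40))
X = np.vstack([c + rng.normal(size=(200, 40)) for c in centers])

# Assumption: 3 clusters, chosen only to match the synthetic data.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Euclidean distance from every row to every fitted centroid: shape (600, 3).
dists = np.linalg.norm(X[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2)

# The nearest centroid for each row should reproduce the K-means labels.
nearest = dists.argmin(axis=1)
labels_match = bool(np.array_equal(nearest, kmeans.labels_))
print(labels_match)
```

So K-means gives a hard assignment driven purely by distance to centroids; it does not return a probability per cluster.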