K-Means clustering - What to do if a cluster has 0 elements?
Clash Royale CLAN TAG#URR8PPP
$begingroup$
I'm writing code for k-means clustering. I have around 100000 vectors of size 128x1 (SIFT descriptors). I'm trying different initialization methods such as Forgy and Random Partition.
What if suppose, no vectors are classified to a cluster (in one of the iterations)? How to calculate centroid for that cluster?
clustering k-means
$endgroup$
add a comment |
$begingroup$
I'm writing code for k-means clustering. I have around 100000 vectors of size 128x1 (SIFT descriptors). I'm trying different initialization methods such as Forgy and Random Partition.
What if suppose, no vectors are classified to a cluster (in one of the iterations)? How to calculate centroid for that cluster?
clustering k-means
$endgroup$
add a comment |
$begingroup$
I'm writing code for k-means clustering. I have around 100000 vectors of size 128x1 (SIFT descriptors). I'm trying different initialization methods such as Forgy and Random Partition.
What if suppose, no vectors are classified to a cluster (in one of the iterations)? How to calculate centroid for that cluster?
clustering k-means
$endgroup$
I'm writing code for k-means clustering. I have around 100000 vectors of size 128x1 (SIFT descriptors). I'm trying different initialization methods such as Forgy and Random Partition.
What if suppose, no vectors are classified to a cluster (in one of the iterations)? How to calculate centroid for that cluster?
clustering k-means
clustering k-means
asked Feb 1 at 8:17
Nagabhushan S NNagabhushan S N
1326
1326
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
$begingroup$
This is mostly an issue with really bad initialization (random vector generation as well as random labeling are stupid, don't use it - choose k points wth sampling, or k-means++) and with data where k-means doesn't work well at all. So if this happens, you know the results won't be good!
Either way, the standard and straightforward solution is simple: use the previous mean if a cluster becomes empty. It could be assigned points later again. And if it doesn't, well, then the cluster is empty. No surprises here, no infinite loops, convergence issues, etc.
$endgroup$
add a comment |
$begingroup$
If you get an empty cluster, it has no center of mass. You can simply ignore this cluster (set k=k-1
for next iteration), or repeat the k-means run from a new initialization. You can also choose to place a random data point into that cluster and carry on with the algorithm if you must have this specific number of K clusters.
If it keeps happening, there is a good chance that your K parameter is too high.
You should iterate over a few different values for k and pick the best.
Here is how an empty cluster might happen:
- Consider the following example with 7 data points. The different shapes of the points are just to show that there are actually 2 natural clusters (but we do not know that before running the algorithm). In this case we choose
k = 3
.
- Three random cluster centers are initialized. At the end of first iteration points 3, 1, 2, and 7 will be in one cluster. 4 and 5 will be in another cluster. And 6 will be in the last cluster. Note here that the distance between 3 and 4 is larger than the distance between 4 and 5 and so 4 is assigned to the cluster represented by 5. Before we begin the second iteration we update the cluster centers and the following picture shows the centers and the clusters at the end of first step.
- Now, the cluster center for the red cluster moved closer to point 4 due to 1, 2, and 7. And the cluster center for the blue cluster moved away from 5 due to point 4. In the next iteration point 4 will decide that it is closer to the red cluster and point 5 will decide that it is closer to the green cluster. This will cause blue cluster to be empty as shown below.
The images were taken from the following link: K-Means Empty Cluster Example
$endgroup$
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
);
);
, "mathjax-editing");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "557"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f44897%2fk-means-clustering-what-to-do-if-a-cluster-has-0-elements%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
This is mostly an issue with really bad initialization (random vector generation as well as random labeling are stupid, don't use it - choose k points wth sampling, or k-means++) and with data where k-means doesn't work well at all. So if this happens, you know the results won't be good!
Either way, the standard and straightforward solution is simple: use the previous mean if a cluster becomes empty. It could be assigned points later again. And if it doesn't, well, then the cluster is empty. No surprises here, no infinite loops, convergence issues, etc.
$endgroup$
add a comment |
$begingroup$
This is mostly an issue with really bad initialization (random vector generation as well as random labeling are stupid, don't use it - choose k points wth sampling, or k-means++) and with data where k-means doesn't work well at all. So if this happens, you know the results won't be good!
Either way, the standard and straightforward solution is simple: use the previous mean if a cluster becomes empty. It could be assigned points later again. And if it doesn't, well, then the cluster is empty. No surprises here, no infinite loops, convergence issues, etc.
$endgroup$
add a comment |
$begingroup$
This is mostly an issue with really bad initialization (random vector generation as well as random labeling are stupid, don't use it - choose k points wth sampling, or k-means++) and with data where k-means doesn't work well at all. So if this happens, you know the results won't be good!
Either way, the standard and straightforward solution is simple: use the previous mean if a cluster becomes empty. It could be assigned points later again. And if it doesn't, well, then the cluster is empty. No surprises here, no infinite loops, convergence issues, etc.
$endgroup$
This is mostly an issue with really bad initialization (random vector generation as well as random labeling are stupid, don't use it - choose k points wth sampling, or k-means++) and with data where k-means doesn't work well at all. So if this happens, you know the results won't be good!
Either way, the standard and straightforward solution is simple: use the previous mean if a cluster becomes empty. It could be assigned points later again. And if it doesn't, well, then the cluster is empty. No surprises here, no infinite loops, convergence issues, etc.
answered Feb 1 at 9:46
Anony-MousseAnony-Mousse
4,801624
4,801624
add a comment |
add a comment |
$begingroup$
If you get an empty cluster, it has no center of mass. You can simply ignore this cluster (set k=k-1
for next iteration), or repeat the k-means run from a new initialization. You can also choose to place a random data point into that cluster and carry on with the algorithm if you must have this specific number of K clusters.
If it keeps happening, there is a good chance that your K parameter is too high.
You should iterate over a few different values for k and pick the best.
Here is how an empty cluster might happen:
- Consider the following example with 7 data points. The different shapes of the points are just to show that there are actually 2 natural clusters (but we do not know that before running the algorithm). In this case we choose
k = 3
.
- Three random cluster centers are initialized. At the end of first iteration points 3, 1, 2, and 7 will be in one cluster. 4 and 5 will be in another cluster. And 6 will be in the last cluster. Note here that the distance between 3 and 4 is larger than the distance between 4 and 5 and so 4 is assigned to the cluster represented by 5. Before we begin the second iteration we update the cluster centers and the following picture shows the centers and the clusters at the end of first step.
- Now, the cluster center for the red cluster moved closer to point 4 due to 1, 2, and 7. And the cluster center for the blue cluster moved away from 5 due to point 4. In the next iteration point 4 will decide that it is closer to the red cluster and point 5 will decide that it is closer to the green cluster. This will cause blue cluster to be empty as shown below.
The images were taken from the following link: K-Means Empty Cluster Example
$endgroup$
add a comment |
$begingroup$
If you get an empty cluster, it has no center of mass. You can simply ignore this cluster (set k=k-1
for next iteration), or repeat the k-means run from a new initialization. You can also choose to place a random data point into that cluster and carry on with the algorithm if you must have this specific number of K clusters.
If it keeps happening, there is a good chance that your K parameter is too high.
You should iterate over a few different values for k and pick the best.
Here is how an empty cluster might happen:
- Consider the following example with 7 data points. The different shapes of the points are just to show that there are actually 2 natural clusters (but we do not know that before running the algorithm). In this case we choose
k = 3
.
- Three random cluster centers are initialized. At the end of first iteration points 3, 1, 2, and 7 will be in one cluster. 4 and 5 will be in another cluster. And 6 will be in the last cluster. Note here that the distance between 3 and 4 is larger than the distance between 4 and 5 and so 4 is assigned to the cluster represented by 5. Before we begin the second iteration we update the cluster centers and the following picture shows the centers and the clusters at the end of first step.
- Now, the cluster center for the red cluster moved closer to point 4 due to 1, 2, and 7. And the cluster center for the blue cluster moved away from 5 due to point 4. In the next iteration point 4 will decide that it is closer to the red cluster and point 5 will decide that it is closer to the green cluster. This will cause blue cluster to be empty as shown below.
The images were taken from the following link: K-Means Empty Cluster Example
$endgroup$
add a comment |
$begingroup$
If you get an empty cluster, it has no center of mass. You can simply ignore this cluster (set k=k-1
for next iteration), or repeat the k-means run from a new initialization. You can also choose to place a random data point into that cluster and carry on with the algorithm if you must have this specific number of K clusters.
If it keeps happening, there is a good chance that your K parameter is too high.
You should iterate over a few different values for k and pick the best.
Here is how an empty cluster might happen:
- Consider the following example with 7 data points. The different shapes of the points are just to show that there are actually 2 natural clusters (but we do not know that before running the algorithm). In this case we choose
k = 3
.
- Three random cluster centers are initialized. At the end of first iteration points 3, 1, 2, and 7 will be in one cluster. 4 and 5 will be in another cluster. And 6 will be in the last cluster. Note here that the distance between 3 and 4 is larger than the distance between 4 and 5 and so 4 is assigned to the cluster represented by 5. Before we begin the second iteration we update the cluster centers and the following picture shows the centers and the clusters at the end of first step.
- Now, the cluster center for the red cluster moved closer to point 4 due to 1, 2, and 7. And the cluster center for the blue cluster moved away from 5 due to point 4. In the next iteration point 4 will decide that it is closer to the red cluster and point 5 will decide that it is closer to the green cluster. This will cause blue cluster to be empty as shown below.
The images were taken from the following link: K-Means Empty Cluster Example
$endgroup$
If you get an empty cluster, it has no center of mass. You can simply ignore this cluster (set k=k-1
for next iteration), or repeat the k-means run from a new initialization. You can also choose to place a random data point into that cluster and carry on with the algorithm if you must have this specific number of K clusters.
If it keeps happening, there is a good chance that your K parameter is too high.
You should iterate over a few different values for k and pick the best.
Here is how an empty cluster might happen:
- Consider the following example with 7 data points. The different shapes of the points are just to show that there are actually 2 natural clusters (but we do not know that before running the algorithm). In this case we choose
k = 3
.
- Three random cluster centers are initialized. At the end of first iteration points 3, 1, 2, and 7 will be in one cluster. 4 and 5 will be in another cluster. And 6 will be in the last cluster. Note here that the distance between 3 and 4 is larger than the distance between 4 and 5 and so 4 is assigned to the cluster represented by 5. Before we begin the second iteration we update the cluster centers and the following picture shows the centers and the clusters at the end of first step.
- Now, the cluster center for the red cluster moved closer to point 4 due to 1, 2, and 7. And the cluster center for the blue cluster moved away from 5 due to point 4. In the next iteration point 4 will decide that it is closer to the red cluster and point 5 will decide that it is closer to the green cluster. This will cause blue cluster to be empty as shown below.
The images were taken from the following link: K-Means Empty Cluster Example
answered Feb 1 at 8:35
Mark.FMark.F
841318
841318
add a comment |
add a comment |
Thanks for contributing an answer to Data Science Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f44897%2fk-means-clustering-what-to-do-if-a-cluster-has-0-elements%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown