K-Means clustering - What to do if a cluster has 0 elements?

I'm writing code for k-means clustering. I have around 100000 vectors of size 128x1 (SIFT descriptors). I'm trying different initialization methods such as Forgy and Random Partition.
What if suppose, no vectors are classified to a cluster (in one of the iterations)? How to calculate centroid for that cluster?

asked Feb 1 at 8:17

Nagabhushan S N

1326

add a comment |

asked Feb 1 at 8:17

Nagabhushan S N

1326

add a comment |

asked Feb 1 at 8:17

Nagabhushan S N

1326

clustering k-means

asked Feb 1 at 8:17

Nagabhushan S N

1326

asked Feb 1 at 8:17

Nagabhushan S N

1326

asked Feb 1 at 8:17

Nagabhushan S N

1326

asked Feb 1 at 8:17

Nagabhushan S N

1326

asked Feb 1 at 8:17

Nagabhushan S N

1326

add a comment |

2 Answers
2

active

oldest

votes

This is mostly an issue with really bad initialization (random vector generation as well as random labeling are stupid, don't use it - choose k points wth sampling, or k-means++) and with data where k-means doesn't work well at all. So if this happens, you know the results won't be good!

Either way, the standard and straightforward solution is simple: use the previous mean if a cluster becomes empty. It could be assigned points later again. And if it doesn't, well, then the cluster is empty. No surprises here, no infinite loops, convergence issues, etc.

answered Feb 1 at 9:46

Anony-Mousse

4,801624

add a comment |

If you get an empty cluster, it has no center of mass. You can simply ignore this cluster (set k=k-1 for next iteration), or repeat the k-means run from a new initialization. You can also choose to place a random data point into that cluster and carry on with the algorithm if you must have this specific number of K clusters.

If it keeps happening, there is a good chance that your K parameter is too high.
You should iterate over a few different values for k and pick the best.

Here is how an empty cluster might happen:

Consider the following example with 7 data points. The different shapes of the points are just to show that there are actually 2 natural clusters (but we do not know that before running the algorithm). In this case we choose k = 3.

enter image description here

Three random cluster centers are initialized. At the end of first iteration points 3, 1, 2, and 7 will be in one cluster. 4 and 5 will be in another cluster. And 6 will be in the last cluster. Note here that the distance between 3 and 4 is larger than the distance between 4 and 5 and so 4 is assigned to the cluster represented by 5. Before we begin the second iteration we update the cluster centers and the following picture shows the centers and the clusters at the end of first step.

enter image description here

Now, the cluster center for the red cluster moved closer to point 4 due to 1, 2, and 7. And the cluster center for the blue cluster moved away from 5 due to point 4. In the next iteration point 4 will decide that it is closer to the red cluster and point 5 will decide that it is closer to the green cluster. This will cause blue cluster to be empty as shown below.

enter image description here

The images were taken from the following link: K-Means Empty Cluster Example

answered Feb 1 at 8:35

Mark.F

841318

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
);
);
, "mathjax-editing");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "557"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f44897%2fk-means-clustering-what-to-do-if-a-cluster-has-0-elements%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

answered Feb 1 at 9:46

Anony-Mousse

4,801624

add a comment |

answered Feb 1 at 9:46

Anony-Mousse

4,801624

add a comment |

answered Feb 1 at 9:46

Anony-Mousse

4,801624

answered Feb 1 at 9:46

Anony-Mousse

4,801624

answered Feb 1 at 9:46

Anony-Mousse

4,801624

answered Feb 1 at 9:46

Anony-Mousse

4,801624

answered Feb 1 at 9:46

Anony-Mousse

4,801624

add a comment |

If it keeps happening, there is a good chance that your K parameter is too high.
You should iterate over a few different values for k and pick the best.

Here is how an empty cluster might happen:

Consider the following example with 7 data points. The different shapes of the points are just to show that there are actually 2 natural clusters (but we do not know that before running the algorithm). In this case we choose k = 3.

enter image description here

Three random cluster centers are initialized. At the end of first iteration points 3, 1, 2, and 7 will be in one cluster. 4 and 5 will be in another cluster. And 6 will be in the last cluster. Note here that the distance between 3 and 4 is larger than the distance between 4 and 5 and so 4 is assigned to the cluster represented by 5. Before we begin the second iteration we update the cluster centers and the following picture shows the centers and the clusters at the end of first step.

enter image description here

Now, the cluster center for the red cluster moved closer to point 4 due to 1, 2, and 7. And the cluster center for the blue cluster moved away from 5 due to point 4. In the next iteration point 4 will decide that it is closer to the red cluster and point 5 will decide that it is closer to the green cluster. This will cause blue cluster to be empty as shown below.

enter image description here

The images were taken from the following link: K-Means Empty Cluster Example

answered Feb 1 at 8:35

Mark.F

841318

add a comment |

If it keeps happening, there is a good chance that your K parameter is too high.
You should iterate over a few different values for k and pick the best.

Here is how an empty cluster might happen:

Consider the following example with 7 data points. The different shapes of the points are just to show that there are actually 2 natural clusters (but we do not know that before running the algorithm). In this case we choose k = 3.

enter image description here

Three random cluster centers are initialized. At the end of first iteration points 3, 1, 2, and 7 will be in one cluster. 4 and 5 will be in another cluster. And 6 will be in the last cluster. Note here that the distance between 3 and 4 is larger than the distance between 4 and 5 and so 4 is assigned to the cluster represented by 5. Before we begin the second iteration we update the cluster centers and the following picture shows the centers and the clusters at the end of first step.

enter image description here

Now, the cluster center for the red cluster moved closer to point 4 due to 1, 2, and 7. And the cluster center for the blue cluster moved away from 5 due to point 4. In the next iteration point 4 will decide that it is closer to the red cluster and point 5 will decide that it is closer to the green cluster. This will cause blue cluster to be empty as shown below.

enter image description here

The images were taken from the following link: K-Means Empty Cluster Example

answered Feb 1 at 8:35

Mark.F

841318

add a comment |

If it keeps happening, there is a good chance that your K parameter is too high.
You should iterate over a few different values for k and pick the best.

Here is how an empty cluster might happen:

Consider the following example with 7 data points. The different shapes of the points are just to show that there are actually 2 natural clusters (but we do not know that before running the algorithm). In this case we choose k = 3.

enter image description here

Three random cluster centers are initialized. At the end of first iteration points 3, 1, 2, and 7 will be in one cluster. 4 and 5 will be in another cluster. And 6 will be in the last cluster. Note here that the distance between 3 and 4 is larger than the distance between 4 and 5 and so 4 is assigned to the cluster represented by 5. Before we begin the second iteration we update the cluster centers and the following picture shows the centers and the clusters at the end of first step.

enter image description here

Now, the cluster center for the red cluster moved closer to point 4 due to 1, 2, and 7. And the cluster center for the blue cluster moved away from 5 due to point 4. In the next iteration point 4 will decide that it is closer to the red cluster and point 5 will decide that it is closer to the green cluster. This will cause blue cluster to be empty as shown below.

enter image description here

The images were taken from the following link: K-Means Empty Cluster Example

answered Feb 1 at 8:35

Mark.F

841318

If it keeps happening, there is a good chance that your K parameter is too high.
You should iterate over a few different values for k and pick the best.

Here is how an empty cluster might happen:

Consider the following example with 7 data points. The different shapes of the points are just to show that there are actually 2 natural clusters (but we do not know that before running the algorithm). In this case we choose k = 3.

enter image description here

Three random cluster centers are initialized. At the end of first iteration points 3, 1, 2, and 7 will be in one cluster. 4 and 5 will be in another cluster. And 6 will be in the last cluster. Note here that the distance between 3 and 4 is larger than the distance between 4 and 5 and so 4 is assigned to the cluster represented by 5. Before we begin the second iteration we update the cluster centers and the following picture shows the centers and the clusters at the end of first step.

enter image description here

Now, the cluster center for the red cluster moved closer to point 4 due to 1, 2, and 7. And the cluster center for the blue cluster moved away from 5 due to point 4. In the next iteration point 4 will decide that it is closer to the red cluster and point 5 will decide that it is closer to the green cluster. This will cause blue cluster to be empty as shown below.

enter image description here

The images were taken from the following link: K-Means Empty Cluster Example

answered Feb 1 at 8:35

Mark.F

841318

answered Feb 1 at 8:35

Mark.F

841318

answered Feb 1 at 8:35

Mark.F

841318

answered Feb 1 at 8:35

Mark.F

841318

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Data Science Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

搜尋此網誌

mjhjmtu