K-Means clustering - What to do if a cluster has 0 elements?

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP












1












$begingroup$


I'm writing code for k-means clustering. I have around 100000 vectors of size 128x1 (SIFT descriptors). I'm trying different initialization methods such as Forgy and Random Partition.
What if suppose, no vectors are classified to a cluster (in one of the iterations)? How to calculate centroid for that cluster?










share|improve this question









$endgroup$
















    1












    $begingroup$


    I'm writing code for k-means clustering. I have around 100000 vectors of size 128x1 (SIFT descriptors). I'm trying different initialization methods such as Forgy and Random Partition.
    What if suppose, no vectors are classified to a cluster (in one of the iterations)? How to calculate centroid for that cluster?










    share|improve this question









    $endgroup$














      1












      1








      1





      $begingroup$


      I'm writing code for k-means clustering. I have around 100000 vectors of size 128x1 (SIFT descriptors). I'm trying different initialization methods such as Forgy and Random Partition.
      What if suppose, no vectors are classified to a cluster (in one of the iterations)? How to calculate centroid for that cluster?










      share|improve this question









      $endgroup$




      I'm writing code for k-means clustering. I have around 100000 vectors of size 128x1 (SIFT descriptors). I'm trying different initialization methods such as Forgy and Random Partition.
      What if suppose, no vectors are classified to a cluster (in one of the iterations)? How to calculate centroid for that cluster?







      clustering k-means






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Feb 1 at 8:17









      Nagabhushan S NNagabhushan S N

      1326




      1326




















          2 Answers
          2






          active

          oldest

          votes


















          1












          $begingroup$

          This is mostly an issue with really bad initialization (random vector generation as well as random labeling are stupid, don't use it - choose k points wth sampling, or k-means++) and with data where k-means doesn't work well at all. So if this happens, you know the results won't be good!



          Either way, the standard and straightforward solution is simple: use the previous mean if a cluster becomes empty. It could be assigned points later again. And if it doesn't, well, then the cluster is empty. No surprises here, no infinite loops, convergence issues, etc.






          share|improve this answer









          $endgroup$




















            2












            $begingroup$

            If you get an empty cluster, it has no center of mass. You can simply ignore this cluster (set k=k-1 for next iteration), or repeat the k-means run from a new initialization. You can also choose to place a random data point into that cluster and carry on with the algorithm if you must have this specific number of K clusters.



            If it keeps happening, there is a good chance that your K parameter is too high.
            You should iterate over a few different values for k and pick the best.



            Here is how an empty cluster might happen:



            1. Consider the following example with 7 data points. The different shapes of the points are just to show that there are actually 2 natural clusters (but we do not know that before running the algorithm). In this case we choose k = 3.

            enter image description here



            1. Three random cluster centers are initialized. At the end of first iteration points 3, 1, 2, and 7 will be in one cluster. 4 and 5 will be in another cluster. And 6 will be in the last cluster. Note here that the distance between 3 and 4 is larger than the distance between 4 and 5 and so 4 is assigned to the cluster represented by 5. Before we begin the second iteration we update the cluster centers and the following picture shows the centers and the clusters at the end of first step.

            enter image description here



            1. Now, the cluster center for the red cluster moved closer to point 4 due to 1, 2, and 7. And the cluster center for the blue cluster moved away from 5 due to point 4. In the next iteration point 4 will decide that it is closer to the red cluster and point 5 will decide that it is closer to the green cluster. This will cause blue cluster to be empty as shown below.

            enter image description here



            The images were taken from the following link: K-Means Empty Cluster Example






            share|improve this answer









            $endgroup$












              Your Answer





              StackExchange.ifUsing("editor", function ()
              return StackExchange.using("mathjaxEditing", function ()
              StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
              StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
              );
              );
              , "mathjax-editing");

              StackExchange.ready(function()
              var channelOptions =
              tags: "".split(" "),
              id: "557"
              ;
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function()
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled)
              StackExchange.using("snippets", function()
              createEditor();
              );

              else
              createEditor();

              );

              function createEditor()
              StackExchange.prepareEditor(
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: false,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: null,
              bindNavPrevention: true,
              postfix: "",
              imageUploader:
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              ,
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              );



              );













              draft saved

              draft discarded


















              StackExchange.ready(
              function ()
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f44897%2fk-means-clustering-what-to-do-if-a-cluster-has-0-elements%23new-answer', 'question_page');

              );

              Post as a guest















              Required, but never shown

























              2 Answers
              2






              active

              oldest

              votes








              2 Answers
              2






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              1












              $begingroup$

              This is mostly an issue with really bad initialization (random vector generation as well as random labeling are stupid, don't use it - choose k points wth sampling, or k-means++) and with data where k-means doesn't work well at all. So if this happens, you know the results won't be good!



              Either way, the standard and straightforward solution is simple: use the previous mean if a cluster becomes empty. It could be assigned points later again. And if it doesn't, well, then the cluster is empty. No surprises here, no infinite loops, convergence issues, etc.






              share|improve this answer









              $endgroup$

















                1












                $begingroup$

                This is mostly an issue with really bad initialization (random vector generation as well as random labeling are stupid, don't use it - choose k points wth sampling, or k-means++) and with data where k-means doesn't work well at all. So if this happens, you know the results won't be good!



                Either way, the standard and straightforward solution is simple: use the previous mean if a cluster becomes empty. It could be assigned points later again. And if it doesn't, well, then the cluster is empty. No surprises here, no infinite loops, convergence issues, etc.






                share|improve this answer









                $endgroup$















                  1












                  1








                  1





                  $begingroup$

                  This is mostly an issue with really bad initialization (random vector generation as well as random labeling are stupid, don't use it - choose k points wth sampling, or k-means++) and with data where k-means doesn't work well at all. So if this happens, you know the results won't be good!



                  Either way, the standard and straightforward solution is simple: use the previous mean if a cluster becomes empty. It could be assigned points later again. And if it doesn't, well, then the cluster is empty. No surprises here, no infinite loops, convergence issues, etc.






                  share|improve this answer









                  $endgroup$



                  This is mostly an issue with really bad initialization (random vector generation as well as random labeling are stupid, don't use it - choose k points wth sampling, or k-means++) and with data where k-means doesn't work well at all. So if this happens, you know the results won't be good!



                  Either way, the standard and straightforward solution is simple: use the previous mean if a cluster becomes empty. It could be assigned points later again. And if it doesn't, well, then the cluster is empty. No surprises here, no infinite loops, convergence issues, etc.







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Feb 1 at 9:46









                  Anony-MousseAnony-Mousse

                  4,801624




                  4,801624





















                      2












                      $begingroup$

                      If you get an empty cluster, it has no center of mass. You can simply ignore this cluster (set k=k-1 for next iteration), or repeat the k-means run from a new initialization. You can also choose to place a random data point into that cluster and carry on with the algorithm if you must have this specific number of K clusters.



                      If it keeps happening, there is a good chance that your K parameter is too high.
                      You should iterate over a few different values for k and pick the best.



                      Here is how an empty cluster might happen:



                      1. Consider the following example with 7 data points. The different shapes of the points are just to show that there are actually 2 natural clusters (but we do not know that before running the algorithm). In this case we choose k = 3.

                      enter image description here



                      1. Three random cluster centers are initialized. At the end of first iteration points 3, 1, 2, and 7 will be in one cluster. 4 and 5 will be in another cluster. And 6 will be in the last cluster. Note here that the distance between 3 and 4 is larger than the distance between 4 and 5 and so 4 is assigned to the cluster represented by 5. Before we begin the second iteration we update the cluster centers and the following picture shows the centers and the clusters at the end of first step.

                      enter image description here



                      1. Now, the cluster center for the red cluster moved closer to point 4 due to 1, 2, and 7. And the cluster center for the blue cluster moved away from 5 due to point 4. In the next iteration point 4 will decide that it is closer to the red cluster and point 5 will decide that it is closer to the green cluster. This will cause blue cluster to be empty as shown below.

                      enter image description here



                      The images were taken from the following link: K-Means Empty Cluster Example






                      share|improve this answer









                      $endgroup$

















                        2












                        $begingroup$

                        If you get an empty cluster, it has no center of mass. You can simply ignore this cluster (set k=k-1 for next iteration), or repeat the k-means run from a new initialization. You can also choose to place a random data point into that cluster and carry on with the algorithm if you must have this specific number of K clusters.



                        If it keeps happening, there is a good chance that your K parameter is too high.
                        You should iterate over a few different values for k and pick the best.



                        Here is how an empty cluster might happen:



                        1. Consider the following example with 7 data points. The different shapes of the points are just to show that there are actually 2 natural clusters (but we do not know that before running the algorithm). In this case we choose k = 3.

                        enter image description here



                        1. Three random cluster centers are initialized. At the end of first iteration points 3, 1, 2, and 7 will be in one cluster. 4 and 5 will be in another cluster. And 6 will be in the last cluster. Note here that the distance between 3 and 4 is larger than the distance between 4 and 5 and so 4 is assigned to the cluster represented by 5. Before we begin the second iteration we update the cluster centers and the following picture shows the centers and the clusters at the end of first step.

                        enter image description here



                        1. Now, the cluster center for the red cluster moved closer to point 4 due to 1, 2, and 7. And the cluster center for the blue cluster moved away from 5 due to point 4. In the next iteration point 4 will decide that it is closer to the red cluster and point 5 will decide that it is closer to the green cluster. This will cause blue cluster to be empty as shown below.

                        enter image description here



                        The images were taken from the following link: K-Means Empty Cluster Example






                        share|improve this answer









                        $endgroup$















                          2












                          2








                          2





                          $begingroup$

                          If you get an empty cluster, it has no center of mass. You can simply ignore this cluster (set k=k-1 for next iteration), or repeat the k-means run from a new initialization. You can also choose to place a random data point into that cluster and carry on with the algorithm if you must have this specific number of K clusters.



                          If it keeps happening, there is a good chance that your K parameter is too high.
                          You should iterate over a few different values for k and pick the best.



                          Here is how an empty cluster might happen:



                          1. Consider the following example with 7 data points. The different shapes of the points are just to show that there are actually 2 natural clusters (but we do not know that before running the algorithm). In this case we choose k = 3.

                          enter image description here



                          1. Three random cluster centers are initialized. At the end of first iteration points 3, 1, 2, and 7 will be in one cluster. 4 and 5 will be in another cluster. And 6 will be in the last cluster. Note here that the distance between 3 and 4 is larger than the distance between 4 and 5 and so 4 is assigned to the cluster represented by 5. Before we begin the second iteration we update the cluster centers and the following picture shows the centers and the clusters at the end of first step.

                          enter image description here



                          1. Now, the cluster center for the red cluster moved closer to point 4 due to 1, 2, and 7. And the cluster center for the blue cluster moved away from 5 due to point 4. In the next iteration point 4 will decide that it is closer to the red cluster and point 5 will decide that it is closer to the green cluster. This will cause blue cluster to be empty as shown below.

                          enter image description here



                          The images were taken from the following link: K-Means Empty Cluster Example






                          share|improve this answer









                          $endgroup$



                          If you get an empty cluster, it has no center of mass. You can simply ignore this cluster (set k=k-1 for next iteration), or repeat the k-means run from a new initialization. You can also choose to place a random data point into that cluster and carry on with the algorithm if you must have this specific number of K clusters.



                          If it keeps happening, there is a good chance that your K parameter is too high.
                          You should iterate over a few different values for k and pick the best.



                          Here is how an empty cluster might happen:



                          1. Consider the following example with 7 data points. The different shapes of the points are just to show that there are actually 2 natural clusters (but we do not know that before running the algorithm). In this case we choose k = 3.

                          enter image description here



                          1. Three random cluster centers are initialized. At the end of first iteration points 3, 1, 2, and 7 will be in one cluster. 4 and 5 will be in another cluster. And 6 will be in the last cluster. Note here that the distance between 3 and 4 is larger than the distance between 4 and 5 and so 4 is assigned to the cluster represented by 5. Before we begin the second iteration we update the cluster centers and the following picture shows the centers and the clusters at the end of first step.

                          enter image description here



                          1. Now, the cluster center for the red cluster moved closer to point 4 due to 1, 2, and 7. And the cluster center for the blue cluster moved away from 5 due to point 4. In the next iteration point 4 will decide that it is closer to the red cluster and point 5 will decide that it is closer to the green cluster. This will cause blue cluster to be empty as shown below.

                          enter image description here



                          The images were taken from the following link: K-Means Empty Cluster Example







                          share|improve this answer












                          share|improve this answer



                          share|improve this answer










                          answered Feb 1 at 8:35









                          Mark.FMark.F

                          841318




                          841318



























                              draft saved

                              draft discarded
















































                              Thanks for contributing an answer to Data Science Stack Exchange!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid


                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.

                              Use MathJax to format equations. MathJax reference.


                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function ()
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f44897%2fk-means-clustering-what-to-do-if-a-cluster-has-0-elements%23new-answer', 'question_page');

                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown






                              Popular posts from this blog

                              How to check contact read email or not when send email to Individual?

                              Displaying single band from multi-band raster using QGIS

                              How many registers does an x86_64 CPU actually have?