Why is PCA sensitive to outliers?

There are many posts on this SE that discuss robust approaches to Principal Component Analysis (PCA), but I cannot find a single good explanation of why PCA is sensitive to outliers in the first place.

machine-learning pca outliers

edited Nov 26 at 2:07
asked Nov 26 at 1:59
Psi

  • Because the L2-norm contribution is very high for outliers. When minimizing the L2 norm (which is what PCA does), those points pull harder on the fit than points closer to the middle do. (A small numeric sketch follows these comments.)
    – mathreadler
    Nov 26 at 11:48

  • This answer tells you everything you need. Just picture an outlier and read attentively.
    – Stephan Kolassa
    Nov 26 at 20:15
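
To make the comment concrete, here is a rough numeric sketch (the residual sizes are invented purely for illustration): under a squared norm, a single large residual outweighs many small ones.

    import numpy as np

    # Invented residuals: 100 typical points with residual 0.5 each,
    # versus a single outlier with residual 20.
    typical = np.full(100, 0.5)
    outlier = 20.0

    print(np.sum(typical ** 2))  # 25.0  -- combined contribution of 100 typical points
    print(outlier ** 2)          # 400.0 -- the lone outlier dominates the objective
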
1 Answer

One of the reasons is that PCA can be thought of as a low-rank decomposition of the data that minimizes the sum of squared $L_2$ norms of the residuals of the decomposition. That is, if $Y$ is your data ($m$ vectors of $n$ dimensions) and $X$ is the PCA basis ($k$ vectors of $n$ dimensions), then the decomposition strictly minimizes
$$\lVert Y - XA \rVert_F^2 = \sum_{j=1}^{m} \lVert Y_j - X A_{\cdot j} \rVert^2.$$
Here $A$ is the matrix of coefficients of the PCA decomposition, $Y_j$ and $A_{\cdot j}$ denote the $j$-th columns of $Y$ and $A$, and $\lVert \cdot \rVert_F$ is the Frobenius norm of the matrix.
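
As a sanity check on this objective, here is a minimal numpy sketch (the sizes $n$, $m$, $k$ are arbitrary): the squared Frobenius norm of the residual equals the sum of squared per-column residual norms, and the SVD-derived basis leaves a smaller residual than an arbitrary rank-$k$ basis.

    import numpy as np

    rng = np.random.default_rng(1)
    n, m, k = 5, 50, 2            # dimensions, points, PCA rank (arbitrary sizes)
    Y = rng.normal(size=(n, m))   # data matrix; assumed already centered

    # PCA basis X: top-k left singular vectors; coefficients A = X^T Y.
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    X = U[:, :k]
    A = X.T @ Y

    # Squared Frobenius norm of the residual = sum of squared column norms.
    R = Y - X @ A
    print(np.isclose(np.linalg.norm(R, "fro") ** 2,
                     sum(np.linalg.norm(R[:, j]) ** 2 for j in range(m))))  # True

    # Any other rank-k orthonormal basis leaves a larger residual (Eckart-Young).
    Q, _ = np.linalg.qr(rng.normal(size=(n, k)))
    print(np.linalg.norm(R, "fro") ** 2
          < np.linalg.norm(Y - Q @ (Q.T @ Y), "fro") ** 2)  # True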



Because PCA minimizes $L_2$ norms (i.e., quadratic norms), it has the same issue as least squares or fitting a Gaussian: it is sensitive to outliers. Because the deviations of the outliers are squared, they dominate the total norm and therefore drive the PCA components.
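
To see this effect on the components themselves, here is a minimal sketch with synthetic data (the point cloud and the outlier's coordinates are made up for illustration): a single extreme point is enough to rotate the first principal component away from the direction of the bulk of the data.

    import numpy as np

    rng = np.random.default_rng(0)

    # 200 points scattered along the x-axis; the "true" first PC is ~(1, 0).
    Y = rng.normal(size=(200, 2)) * np.array([5.0, 0.5])

    def first_pc(data):
        """First principal component: top right singular vector of centered data."""
        centered = data - data.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        return vt[0]

    print(first_pc(Y))                              # ~ +/-(1, 0); sign is arbitrary
    print(first_pc(np.vstack([Y, [30.0, 300.0]])))  # pulled toward the outlier
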
edited Nov 26 at 18:51
answered Nov 26 at 4:40
sega_sai
