What's so special about standard deviation?

60

Equivalently, about variance?



I realize it measures the spread of a distribution, but many other metrics could do the same (e.g., the average absolute deviation). What is its deeper significance? Does it have



  • a particular geometric interpretation (in the sense, e.g., that the mean is the balancing point of a distribution)?

  • any other intuitive interpretation that differentiates it from other possible measures of spread?

What's so special about it that makes it act as a normalizing factor in all sorts of situations (for example, converting covariance to correlation)?










statistics

asked Jan 12 at 20:39 by blue_note; edited Jan 13 at 21:08 by amWhy

  • 12




    Have you heard the term "moment?" The variance is the second moment about the mean. See HERE
    – Mark Viola, Jan 12 at 20:44







  • 3




    Possible duplicate of Intuition behind Variance formula
    – Michael Hoppe, Jan 13 at 16:27






  • 4




    The absolute value deviation is a perfectly valid measure of deviation. However, absolute values are very hard to work with analytically; squares are much easier. That's one answer: calculability.
    – Winther, Jan 14 at 10:23







  • 2




    Your question could have been "is standard deviation a natural concept?": the best proof that it wasn't evident is that it was "discovered" around 1900, which is very late for a "mathematical being", moreover one with a simple definition.
    – Jean Marie, Jan 14 at 13:08






  • 1




    I find myself wondering if part of the usefulness, at least in the past, is that the standard deviation can be calculated by a one-pass algorithm (keep running totals of the sum, sum of squares, and count, and then combine them at the end) while $E(|X - \mu_X|)$ would require two passes as far as I know.
    – Daniel Schepler, Jan 14 at 23:57
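
For concreteness, here is a minimal Python sketch of the one-pass bookkeeping the previous comment describes (the function name and data are illustrative; it computes the population standard deviation via the identity $\mathrm{Var}(X) = E[X^2] - E[X]^2$):

```python
import math

def one_pass_std(xs):
    """Standard deviation in a single pass over the data.

    Keeps running totals of the count, the sum, and the sum of squares,
    and combines them at the end. Note: this textbook formula can lose
    precision when the mean is large relative to the spread; Welford's
    online algorithm is the numerically stable one-pass variant.
    """
    n, s, s2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        s += x
        s2 += x * x
    mean = s / n
    return math.sqrt(s2 / n - mean * mean)

print(one_pass_std([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))  # 2.0
```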















9 Answers


















70

There's a very nice geometric interpretation.



Random variables with finite variance form a vector space. Covariance is a useful inner product on that space. Oh, wait, that's not quite right: constant variables are orthogonal to themselves in this product, so it's only positive semi-definite. So, let me be more precise: on the quotient space formed by the equivalence relation "differs by a constant", covariance is a true inner product. (If quotient spaces are an unfamiliar concept, just focus on the vector space of zero-mean variables; it gets you the same outcome in this context.)



Right, let's carry on. In the norm this inner product induces, standard deviation is a variable's length, while the correlation coefficient between two variables (their covariance divided by the product of their standard deviations) is the cosine of the "angle" between them. That the correlation coefficient lies in $[-1, 1]$ is then a restatement of the Cauchy-Schwarz inequality for this vector space.
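
For a quick numerical illustration of this picture (a minimal sketch with made-up data): after centering, the sample correlation is exactly the cosine of the angle between the two data vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.6 * x + 0.8 * rng.normal(size=1000)  # correlated with x by construction

# Pass to the zero-mean representatives (the quotient construction above).
xc, yc = x - x.mean(), y - y.mean()

# Cosine of the angle between the centered vectors ...
cos_angle = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))

# ... equals the Pearson correlation coefficient.
print(cos_angle, np.corrcoef(x, y)[0, 1])  # the two numbers coincide
```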






answered Jan 12 at 20:48 by J.G.








  • 10




    Interesting approach. Is it a personal interpretation or a standard one? If it's standard, are there any resources you can provide? I haven't seen it in any book...
    – blue_note, Jan 12 at 21:03






  • 6




    @blue_note You're most likely to encounter it in a discussion of regression, since regressing $Y$ against $X$ writes $Y$ as a multiple of $X$, plus a variable orthogonal to $X$ in this sense. In fact, the coefficients involved in such an expression square to the proportion of variance explained. This has a well-understood connection to probability in quantum mechanics. But really, any source that explains why there's a $^2$ in $R^2$ will at least hint at these ideas.
    – J.G., Jan 12 at 21:06







  • 4




    Can someone provide a concrete example or other similar dumbing down of this answer?
    – user1717828, Jan 13 at 4:43







  • 2




    There's a paragraph on Wikipedia about it, @blue_note
    – WorldSEnder, Jan 13 at 8:59







  • 10




    I need a 3Blue1Brown video on this.
    – RcnSc, Jan 14 at 13:31


















14

I take it as unproblematic that the standard deviation is important in the normal distribution, since the standard deviation (or variance) is one of its parameters (though it could doubtless be reparameterized in various ways). By the Central Limit Theorem, the normal distribution is in turn relevant for understanding just about any distribution: if $X$ is a random variable with mean $\mu$ and standard deviation $\sigma$, and $\overline{X}$ is the mean of $n$ independent copies of $X$, then for large $n$



$$\frac{\overline{X} - \mu}{\sigma/\sqrt{n}}$$



is approximately standard normal. No other measure of dispersion can so relate $X$ with the normal distribution. Said simply, the Central Limit Theorem in and of itself guarantees that the standard deviation plays a prominent role in statistics.
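
A small simulation sketch of this standardization, assuming (arbitrarily) that $X$ is exponential with rate 1, so $\mu = \sigma = 1$ and $X$ is decidedly non-normal:

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 500, 20000

# Draw `trials` samples of size n from Exponential(1): mean 1, std 1.
samples = rng.exponential(scale=1.0, size=(trials, n))

# Standardize each sample mean as in the displayed formula.
z = (samples.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))

print(z.mean(), z.std())          # close to 0 and 1
print(np.mean(np.abs(z) < 1.96))  # close to 0.95, as for a standard normal
```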






answered Jan 13 at 12:53 by John Coleman












  • Related question to this: The role of variance in Central Limit Theorem
    – Winther, Jan 14 at 10:45






  • 4




    No other measure of dispersion can so relate $X$ with the standard normal distribution, but that's only because the standard normal distribution is defined to have unit variance. If we defined it to have unit interquartile range instead, then for large $n$ we would say that $$\frac{\overline{X}-\mu}{\mathrm{IQR}/\sqrt{n}}$$ is approximately standard normal.
    – Misha Lavrov, Jan 14 at 18:28










  • @MishaLavrov Good point (of the sort that I was alluding to in my parenthetical about reparameterization), but if you regard $\sigma$ in the normal distribution as a good measure of dispersion, then the Central Limit Theorem gives you a reason to use it as a measure of dispersion in other distributions. I don't think the appeal to the CLT is decisive, but it should be part of the discussion about the importance of the standard deviation.
    – John Coleman, Jan 14 at 19:13



















4

An interesting feature of the standard deviation is its connection to the (root) mean square error. This measures how well a predictor does in predicting the values. The root mean square error of using the mean as a predictor is the standard deviation, and this is the least root mean square error that you can get with a constant predictor.



(This, of course, shifts the question to why the root mean squared error is interesting. I find it a bit more intuitive than the standard deviation, though: you can see it as the $L_2$ norm of the error vector, corrected for the number of points.)
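
A small numerical check of this claim (with made-up data): scanning over constant predictors, the root mean square error bottoms out at the mean, and the minimum value is the standard deviation.

```python
import numpy as np

data = np.array([2.0, 3.0, 5.0, 7.0, 11.0])  # made-up sample

def rmse(c):
    # Root mean square error of the constant predictor c.
    return np.sqrt(np.mean((data - c) ** 2))

grid = np.linspace(0.0, 15.0, 15001)
best = grid[np.argmin([rmse(c) for c in grid])]

print(best, data.mean())       # both about 5.6: the best constant is the mean
print(rmse(best), data.std())  # both about 3.2: the minimal RMSE is the std
```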






answered Jan 13 at 10:17 by Anton Golov








  • 1




    Good point. However, it indeed shifts the question. Although I can see that in a vector space, in a standard 2D plot of $(X, Y)$ pairs I can see what the variance is on, e.g., the horizontal axis.
    – blue_note, Jan 13 at 11:34


















2

When defining "standard deviation", we want some way to take a bunch of deviations from a mean and quantify how big they typically are using a single number in the same units as the deviations themselves. But any definition of "standard deviation" induces a corresponding definition of "mean" because we want our choice of "mean" to always minimize the value of our "standard deviation" (intuitively, we want to define "mean" to be the "middlemost" point as measured by "standard deviation"). Only by defining "standard deviation" in the usual way do we recover the arithmetic mean while still having a measure in the right units. (Without getting into details, the key point is that the quadratic becomes linear when we take the derivative to find its critical point.)



If we want to use some other mean, we can of course find a different "standard deviation" that will match that mean (the process is somewhat analogous to integration), but in practice it's just easier to transform the data so that the arithmetic mean is appropriate.
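
Spelling out the parenthetical point above: differentiating the sum of squared deviations with respect to the center $c$ turns the quadratic into a linear equation,

$$\frac{d}{dc}\sum_{i=1}^n (x_i - c)^2 = -2\sum_{i=1}^n (x_i - c) = 0 \quad\iff\quad c = \frac{1}{n}\sum_{i=1}^n x_i,$$

so the minimizer is exactly the arithmetic mean.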






answered Jan 13 at 2:22 by Qwerty












  • If all you want is minimization at the mean and the right units, why not sum/integrate the magnitude of the deviations?
    – mephistolotl, Jan 13 at 2:36


















2

Probably the most useful property of the variance is that it is additive: the variance of the sum of two independent random variables is the sum of the variances.



This does not hold for other measures of spread.
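
In symbols: $\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X,Y)$ by the bilinearity of covariance, and the covariance term vanishes when $X$ and $Y$ are independent. Standard deviations consequently add in quadrature, $\sigma_{X+Y} = \sqrt{\sigma_X^2 + \sigma_Y^2}$, like orthogonal lengths in the geometric picture above.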


























1

The normal distribution has maximum entropy among real distributions supported on $(-\infty, \infty)$ with specified standard deviation (equivalently, variance). (Reference.) Consequently, if the only thing you know about a real distribution supported on $\mathbb{R}$ is its mean and variance, the distribution that presumes the least prior information is the normal distribution.



I don't tend to think of the statement above as the important fact. It's more: normal distributions appear frequently and knowing the location parameter (mean) is reasonable. So what else do I have to know to make the least presumptive model be the normal distribution? The dispersion (variance).


























1

Consider Casella/Berger, Statistical Inference, Section 10.3.2:




  Theorem 10.3.2: Consider a point estimation problem for a real-valued parameter $\theta$. In each of the following two situations, if $\delta^\pi \in D$ then $\delta^\pi$ is the Bayes rule (also called the Bayes estimator).

  a. For squared error loss, $\delta^\pi(x) = E(\theta \mid x)$.

  b. For absolute error loss, $\delta^\pi(x) = \text{median of } \pi(\theta \mid x)$.




My interpretation of this is that using the standard deviation leads one in the direction of an estimator for the mean, whereas using the average absolute deviation leads one in the direction of an estimator for the median.
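
A quick numerical check of the two parts of the theorem (a sketch; the skewed sample standing in for a posterior is made up):

```python
import numpy as np

rng = np.random.default_rng(2)
theta = rng.exponential(size=10_000)  # skewed stand-in for a posterior sample

grid = np.linspace(0.0, 3.0, 301)
sq_loss = [np.mean((theta - d) ** 2) for d in grid]
abs_loss = [np.mean(np.abs(theta - d)) for d in grid]

# Squared error loss is minimized near the mean ...
print(grid[np.argmin(sq_loss)], theta.mean())       # both near 1.0
# ... while absolute error loss is minimized near the median.
print(grid[np.argmin(abs_loss)], np.median(theta))  # both near 0.69 (= ln 2)
```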


























1

The following is from An Introduction to Probability Theory and Its Applications, Vol. 1 by W. Feller.




From Section IX.4: Variance



• Some readers may be helped by the following interpretation in mechanics. Suppose that a unit mass is distributed on the $x$-axis so that the mass $f(x_j)$ is concentrated at $x_j$. Then the mean $\mu$ is the abscissa of the center of gravity, and the variance is the moment of inertia.


• Clearly different mass distributions may have the same center of gravity and the same moment of inertia, but it is well known that some important mechanical properties can be described in terms of these two quantities.



























1

If you draw a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$, then the mean and variance of the sample are sufficient statistics. This means that these two statistics contain all the information in the sample. The distribution of any other statistic (a function of the observed values in the sample), given the sample mean and variance, is independent of the true population mean and variance.



For the normal distribution the sample variance is the optimal estimator of the population variance. For example, the population variance could be estimated by a function of the mean deviation or by some function of the order statistics (the interquartile range or the range), but the distribution of that estimator would have a greater spread than that of the sample variance.



These facts are important because, by the central limit theorem, the distribution of many observed phenomena is approximately normal.


















            Your Answer





            StackExchange.ifUsing("editor", function ()
            return StackExchange.using("mathjaxEditing", function ()
            StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
            StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
            );
            );
            , "mathjax-editing");

            StackExchange.ready(function()
            var channelOptions =
            tags: "".split(" "),
            id: "69"
            ;
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function()
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled)
            StackExchange.using("snippets", function()
            createEditor();
            );

            else
            createEditor();

            );

            function createEditor()
            StackExchange.prepareEditor(
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader:
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            ,
            noCode: true, onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            );



            );













            draft saved

            draft discarded


















            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3071367%2fwhats-so-special-about-standard-deviation%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown

























            9 Answers
            9






            active

            oldest

            votes








            9 Answers
            9






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            70












            $begingroup$

            There's a very nice geometric interpretation.



            Random variables of finite mean form a vector space. Covariance is a useful inner product on that space. Oh, wait, that's not quite right: constant variables are orthogonal to themselves in this product, so it's only positive semi-definite. So, let me be more precise - on the quotient space formed by the equivalence relation "differs from by a constant", covariance is a true inner product. (If quotient spaces are an unfamiliar concept, just focus on the vector space of zero-mean variables; it gets you the same outcome in this context.)



            Right, let's carry on. In the norm this inner product induces, standard deviation is a variable's length, while the correlation coefficient between two variables (their covariance divided by the product of their standard deviations) is the cosine of the "angle" between them. That the correlation coefficient is in $[-1,,1]$ is then a restatement of the vector space's Cauchy-Schwarz inequality.






            share|cite|improve this answer









            $endgroup$








            • 10




              $begingroup$
              Interesting approach. Is it a personal interpretation or a standard one? If it's standard, are there any resources you can provide? I haven't seen it in any book...
              $endgroup$
              – blue_note
              Jan 12 at 21:03






            • 6




              $begingroup$
              @blue_note You're most likely to encounter it in a discussion of regression, since regressing $Y$ against $X$ writes $Y$ as a multiple of $X$, plus a variable orthogonal to $X$ in this sense. In fact, the coefficients involved in such an expression square to the proportion of variance explained. This has a well-understood connection to probability in quantum mechanics. But really, any source that explains why there's a $^2$ in $R^2$ will at least hint at these ideas.
              $endgroup$
              – J.G.
              Jan 12 at 21:06







            • 4




              $begingroup$
              Can someone provide a concrete example or other similar dumbing down of this answer?
              $endgroup$
              – user1717828
              Jan 13 at 4:43







            • 2




              $begingroup$
              A paragraph on wikipedia about it @blue_note
              $endgroup$
              – WorldSEnder
              Jan 13 at 8:59







            • 10




              $begingroup$
              I need a 3Blue1Brown video on this.
              $endgroup$
              – RcnSc
              Jan 14 at 13:31















            70












            $begingroup$

            There's a very nice geometric interpretation.



            Random variables of finite mean form a vector space. Covariance is a useful inner product on that space. Oh, wait, that's not quite right: constant variables are orthogonal to themselves in this product, so it's only positive semi-definite. So, let me be more precise - on the quotient space formed by the equivalence relation "differs from by a constant", covariance is a true inner product. (If quotient spaces are an unfamiliar concept, just focus on the vector space of zero-mean variables; it gets you the same outcome in this context.)



            Right, let's carry on. In the norm this inner product induces, standard deviation is a variable's length, while the correlation coefficient between two variables (their covariance divided by the product of their standard deviations) is the cosine of the "angle" between them. That the correlation coefficient is in $[-1,,1]$ is then a restatement of the vector space's Cauchy-Schwarz inequality.






            share|cite|improve this answer









            $endgroup$








            • 10




              $begingroup$
              Interesting approach. Is it a personal interpretation or a standard one? If it's standard, are there any resources you can provide? I haven't seen it in any book...
              $endgroup$
              – blue_note
              Jan 12 at 21:03






            • 6




              $begingroup$
              @blue_note You're most likely to encounter it in a discussion of regression, since regressing $Y$ against $X$ writes $Y$ as a multiple of $X$, plus a variable orthogonal to $X$ in this sense. In fact, the coefficients involved in such an expression square to the proportion of variance explained. This has a well-understood connection to probability in quantum mechanics. But really, any source that explains why there's a $^2$ in $R^2$ will at least hint at these ideas.
              $endgroup$
              – J.G.
              Jan 12 at 21:06







            • 4




              $begingroup$
              Can someone provide a concrete example or other similar dumbing down of this answer?
              $endgroup$
              – user1717828
              Jan 13 at 4:43







            • 2




              $begingroup$
              A paragraph on wikipedia about it @blue_note
              $endgroup$
              – WorldSEnder
              Jan 13 at 8:59







            • 10




              $begingroup$
              I need a 3Blue1Brown video on this.
              $endgroup$
              – RcnSc
              Jan 14 at 13:31













            70












            70








            70





            $begingroup$

            There's a very nice geometric interpretation.



            Random variables of finite mean form a vector space. Covariance is a useful inner product on that space. Oh, wait, that's not quite right: constant variables are orthogonal to themselves in this product, so it's only positive semi-definite. So, let me be more precise - on the quotient space formed by the equivalence relation "differs from by a constant", covariance is a true inner product. (If quotient spaces are an unfamiliar concept, just focus on the vector space of zero-mean variables; it gets you the same outcome in this context.)



            Right, let's carry on. In the norm this inner product induces, standard deviation is a variable's length, while the correlation coefficient between two variables (their covariance divided by the product of their standard deviations) is the cosine of the "angle" between them. That the correlation coefficient is in $[-1,,1]$ is then a restatement of the vector space's Cauchy-Schwarz inequality.






            share|cite|improve this answer









            $endgroup$



            There's a very nice geometric interpretation.



            Random variables of finite mean form a vector space. Covariance is a useful inner product on that space. Oh, wait, that's not quite right: constant variables are orthogonal to themselves in this product, so it's only positive semi-definite. So, let me be more precise - on the quotient space formed by the equivalence relation "differs from by a constant", covariance is a true inner product. (If quotient spaces are an unfamiliar concept, just focus on the vector space of zero-mean variables; it gets you the same outcome in this context.)



            Right, let's carry on. In the norm this inner product induces, standard deviation is a variable's length, while the correlation coefficient between two variables (their covariance divided by the product of their standard deviations) is the cosine of the "angle" between them. That the correlation coefficient is in $[-1,,1]$ is then a restatement of the vector space's Cauchy-Schwarz inequality.







            share|cite|improve this answer












            share|cite|improve this answer



            share|cite|improve this answer










            answered Jan 12 at 20:48









            J.G.J.G.

            25.2k22539




            25.2k22539







            • 10




              $begingroup$
              Interesting approach. Is it a personal interpretation or a standard one? If it's standard, are there any resources you can provide? I haven't seen it in any book...
              $endgroup$
              – blue_note
              Jan 12 at 21:03






            • 6




              $begingroup$
              @blue_note You're most likely to encounter it in a discussion of regression, since regressing $Y$ against $X$ writes $Y$ as a multiple of $X$, plus a variable orthogonal to $X$ in this sense. In fact, the coefficients involved in such an expression square to the proportion of variance explained. This has a well-understood connection to probability in quantum mechanics. But really, any source that explains why there's a $^2$ in $R^2$ will at least hint at these ideas.
              $endgroup$
              – J.G.
              Jan 12 at 21:06







            • 4




              $begingroup$
              Can someone provide a concrete example or other similar dumbing down of this answer?
              $endgroup$
              – user1717828
              Jan 13 at 4:43







            • 2




              $begingroup$
              A paragraph on wikipedia about it @blue_note
              $endgroup$
              – WorldSEnder
              Jan 13 at 8:59







            • 10




              $begingroup$
              I need a 3Blue1Brown video on this.
              $endgroup$
              – RcnSc
              Jan 14 at 13:31












            • 10




              $begingroup$
              Interesting approach. Is it a personal interpretation or a standard one? If it's standard, are there any resources you can provide? I haven't seen it in any book...
              $endgroup$
              – blue_note
              Jan 12 at 21:03






            • 6




              $begingroup$
              @blue_note You're most likely to encounter it in a discussion of regression, since regressing $Y$ against $X$ writes $Y$ as a multiple of $X$, plus a variable orthogonal to $X$ in this sense. In fact, the coefficients involved in such an expression square to the proportion of variance explained. This has a well-understood connection to probability in quantum mechanics. But really, any source that explains why there's a $^2$ in $R^2$ will at least hint at these ideas.
              $endgroup$
              – J.G.
              Jan 12 at 21:06







            • 4




              $begingroup$
              Can someone provide a concrete example or other similar dumbing down of this answer?
              $endgroup$
              – user1717828
              Jan 13 at 4:43







            • 2




              $begingroup$
              A paragraph on wikipedia about it @blue_note
              $endgroup$
              – WorldSEnder
              Jan 13 at 8:59







            • 10




              $begingroup$
              I need a 3Blue1Brown video on this.
              $endgroup$
              – RcnSc
              Jan 14 at 13:31







            10




            10




            $begingroup$
            Interesting approach. Is it a personal interpretation or a standard one? If it's standard, are there any resources you can provide? I haven't seen it in any book...
            $endgroup$
            – blue_note
            Jan 12 at 21:03




            $begingroup$
            Interesting approach. Is it a personal interpretation or a standard one? If it's standard, are there any resources you can provide? I haven't seen it in any book...
            $endgroup$
            – blue_note
            Jan 12 at 21:03




            6




            6




            $begingroup$
            @blue_note You're most likely to encounter it in a discussion of regression, since regressing $Y$ against $X$ writes $Y$ as a multiple of $X$, plus a variable orthogonal to $X$ in this sense. In fact, the coefficients involved in such an expression square to the proportion of variance explained. This has a well-understood connection to probability in quantum mechanics. But really, any source that explains why there's a $^2$ in $R^2$ will at least hint at these ideas.
            $endgroup$
            – J.G.
            Jan 12 at 21:06





            $begingroup$
            @blue_note You're most likely to encounter it in a discussion of regression, since regressing $Y$ against $X$ writes $Y$ as a multiple of $X$, plus a variable orthogonal to $X$ in this sense. In fact, the coefficients involved in such an expression square to the proportion of variance explained. This has a well-understood connection to probability in quantum mechanics. But really, any source that explains why there's a $^2$ in $R^2$ will at least hint at these ideas.
            $endgroup$
            – J.G.
            Jan 12 at 21:06





            4




            4




            $begingroup$
            Can someone provide a concrete example or other similar dumbing down of this answer?
            $endgroup$
            – user1717828
            Jan 13 at 4:43





            $begingroup$
            Can someone provide a concrete example or other similar dumbing down of this answer?
            $endgroup$
            – user1717828
            Jan 13 at 4:43





            2




            2




            $begingroup$
            A paragraph on wikipedia about it @blue_note
            $endgroup$
            – WorldSEnder
            Jan 13 at 8:59





            $begingroup$
            A paragraph on wikipedia about it @blue_note
            $endgroup$
            – WorldSEnder
            Jan 13 at 8:59





            10




            10




            $begingroup$
            I need a 3Blue1Brown video on this.
            $endgroup$
            – RcnSc
            Jan 14 at 13:31




            $begingroup$
            I need a 3Blue1Brown video on this.
            $endgroup$
            – RcnSc
            Jan 14 at 13:31











            14












            $begingroup$

            I take it as unproblematic that the standard deviation is important in the normal distribution since the standard deviation (or variance) is one of its parameters (though it could doubtless be reparameterized in various ways). By the Central Limit Theorem, the normal distribution is in turn relevant for understanding just about any distribution: If $X$ is a normal variable with mean $mu$ and standard deviation $sigma$, then for large $n$



            $$fracoverlineX - mufracsigmasqrtn$$



            is approximately standard normal. No other measure of dispersion can so relate $X$ with the normal distribution. Said simply, the Central Limit Theorem in and of itself guarantees that the standard deviation plays a prominent role in statistics.






            share|cite|improve this answer









            $endgroup$












            • $begingroup$
              Related question to this: The role of variance in Central Limit Theorem
              $endgroup$
              – Winther
              Jan 14 at 10:45






            • 4




              $begingroup$
              No other measure of dispersion can so relate $X$ with the standard normal distribution, but that's only because the standard normal distribution is defined to have unit variance. If we defined it to have unit interquartile range instead, then for large $n$ we would say that $$fracoverlineX-muIQR/sqrt n$$ is approximately standard normal.
              $endgroup$
              – Misha Lavrov
              Jan 14 at 18:28










            • $begingroup$
              @MishaLavrov Good point (of the sort that I was alluding to in my parenthetical about reparameterization) but if you regard $sigma$ in the normal distribution to be a good measure of dispersion then the Central Limit Theorem gives you a reason to use it as a measure of dispersion in other distributions. I don't think that appeal to CLT is decisive, but it should be part of the discussion about the importance of the standard deviation.
              $endgroup$
              – John Coleman
              Jan 14 at 19:13
















            14












            $begingroup$

            I take it as unproblematic that the standard deviation is important in the normal distribution since the standard deviation (or variance) is one of its parameters (though it could doubtless be reparameterized in various ways). By the Central Limit Theorem, the normal distribution is in turn relevant for understanding just about any distribution: If $X$ is a normal variable with mean $mu$ and standard deviation $sigma$, then for large $n$



            $$fracoverlineX - mufracsigmasqrtn$$



            is approximately standard normal. No other measure of dispersion can so relate $X$ with the normal distribution. Said simply, the Central Limit Theorem in and of itself guarantees that the standard deviation plays a prominent role in statistics.






            share|cite|improve this answer









            $endgroup$












            • $begingroup$
              Related question to this: The role of variance in Central Limit Theorem
              $endgroup$
              – Winther
              Jan 14 at 10:45






            • 4




              $begingroup$
              No other measure of dispersion can so relate $X$ with the standard normal distribution, but that's only because the standard normal distribution is defined to have unit variance. If we defined it to have unit interquartile range instead, then for large $n$ we would say that $$fracoverlineX-muIQR/sqrt n$$ is approximately standard normal.
              $endgroup$
              – Misha Lavrov
              Jan 14 at 18:28










            • $begingroup$
              @MishaLavrov Good point (of the sort that I was alluding to in my parenthetical about reparameterization) but if you regard $sigma$ in the normal distribution to be a good measure of dispersion then the Central Limit Theorem gives you a reason to use it as a measure of dispersion in other distributions. I don't think that appeal to CLT is decisive, but it should be part of the discussion about the importance of the standard deviation.
              $endgroup$
              – John Coleman
              Jan 14 at 19:13














            14












            14








            14





            $begingroup$

            I take it as unproblematic that the standard deviation is important in the normal distribution since the standard deviation (or variance) is one of its parameters (though it could doubtless be reparameterized in various ways). By the Central Limit Theorem, the normal distribution is in turn relevant for understanding just about any distribution: If $X$ is a normal variable with mean $mu$ and standard deviation $sigma$, then for large $n$



            $$fracoverlineX - mufracsigmasqrtn$$



            is approximately standard normal. No other measure of dispersion can so relate $X$ with the normal distribution. Said simply, the Central Limit Theorem in and of itself guarantees that the standard deviation plays a prominent role in statistics.






            share|cite|improve this answer









            $endgroup$



            I take it as unproblematic that the standard deviation is important in the normal distribution since the standard deviation (or variance) is one of its parameters (though it could doubtless be reparameterized in various ways). By the Central Limit Theorem, the normal distribution is in turn relevant for understanding just about any distribution: If $X$ is a normal variable with mean $mu$ and standard deviation $sigma$, then for large $n$



            $$fracoverlineX - mufracsigmasqrtn$$



            is approximately standard normal. No other measure of dispersion can so relate $X$ with the normal distribution. Said simply, the Central Limit Theorem in and of itself guarantees that the standard deviation plays a prominent role in statistics.







            share|cite|improve this answer












            share|cite|improve this answer



            share|cite|improve this answer










            answered Jan 13 at 12:53









            John ColemanJohn Coleman

            3,87811224




            3,87811224











            • $begingroup$
              Related question to this: The role of variance in Central Limit Theorem
              $endgroup$
              – Winther
              Jan 14 at 10:45






            • 4




              $begingroup$
              No other measure of dispersion can so relate $X$ with the standard normal distribution, but that's only because the standard normal distribution is defined to have unit variance. If we defined it to have unit interquartile range instead, then for large $n$ we would say that $$fracoverlineX-muIQR/sqrt n$$ is approximately standard normal.
              $endgroup$
              – Misha Lavrov
              Jan 14 at 18:28










            • $begingroup$
              @MishaLavrov Good point (of the sort that I was alluding to in my parenthetical about reparameterization) but if you regard $sigma$ in the normal distribution to be a good measure of dispersion then the Central Limit Theorem gives you a reason to use it as a measure of dispersion in other distributions. I don't think that appeal to CLT is decisive, but it should be part of the discussion about the importance of the standard deviation.
              $endgroup$
              – John Coleman
              Jan 14 at 19:13

















            • $begingroup$
              Related question to this: The role of variance in Central Limit Theorem
              $endgroup$
              – Winther
              Jan 14 at 10:45






            • 4




              $begingroup$
              No other measure of dispersion can so relate $X$ with the standard normal distribution, but that's only because the standard normal distribution is defined to have unit variance. If we defined it to have unit interquartile range instead, then for large $n$ we would say that $$fracoverlineX-muIQR/sqrt n$$ is approximately standard normal.
              $endgroup$
              – Misha Lavrov
              Jan 14 at 18:28










            • $begingroup$
              @MishaLavrov Good point (of the sort that I was alluding to in my parenthetical about reparameterization) but if you regard $sigma$ in the normal distribution to be a good measure of dispersion then the Central Limit Theorem gives you a reason to use it as a measure of dispersion in other distributions. I don't think that appeal to CLT is decisive, but it should be part of the discussion about the importance of the standard deviation.
              $endgroup$
              – John Coleman
              Jan 14 at 19:13
















            $begingroup$
            Related question to this: The role of variance in Central Limit Theorem
            $endgroup$
            – Winther
            Jan 14 at 10:45




            $begingroup$
            Related question to this: The role of variance in Central Limit Theorem
            $endgroup$
            – Winther
            Jan 14 at 10:45




            4




            4




            $begingroup$
            No other measure of dispersion can so relate $X$ with the standard normal distribution, but that's only because the standard normal distribution is defined to have unit variance. If we defined it to have unit interquartile range instead, then for large $n$ we would say that $$fracoverlineX-muIQR/sqrt n$$ is approximately standard normal.
            $endgroup$
            – Misha Lavrov
            Jan 14 at 18:28




            $begingroup$
            No other measure of dispersion can so relate $X$ with the standard normal distribution, but that's only because the standard normal distribution is defined to have unit variance. If we defined it to have unit interquartile range instead, then for large $n$ we would say that $$fracoverlineX-muIQR/sqrt n$$ is approximately standard normal.
            $endgroup$
            – Misha Lavrov
            Jan 14 at 18:28












            $begingroup$
            @MishaLavrov Good point (of the sort that I was alluding to in my parenthetical about reparameterization) but if you regard $sigma$ in the normal distribution to be a good measure of dispersion then the Central Limit Theorem gives you a reason to use it as a measure of dispersion in other distributions. I don't think that appeal to CLT is decisive, but it should be part of the discussion about the importance of the standard deviation.
            $endgroup$
            – John Coleman
            Jan 14 at 19:13





            $begingroup$
            @MishaLavrov Good point (of the sort that I was alluding to in my parenthetical about reparameterization) but if you regard $sigma$ in the normal distribution to be a good measure of dispersion then the Central Limit Theorem gives you a reason to use it as a measure of dispersion in other distributions. I don't think that appeal to CLT is decisive, but it should be part of the discussion about the importance of the standard deviation.
            $endgroup$
            – John Coleman
            Jan 14 at 19:13












            4












            $begingroup$

            An interesting feature of the standard deviation is its connection to the (root) mean square error. This measures how well a predictor does in predicting the values. The root mean square error of using the mean as a predictor is the standard deviation, and this is the least root mean square error that you can get with a constant predictor.



            (This, of course, shifts the question to why the root mean squared error is interesting. I find it a bit more intuitive than the standard deviation, though: you can see it as the $L_2$ norm of the error vector, corrected for the number of points.)






            share|cite|improve this answer









            $endgroup$








            • 1




              $begingroup$
              Good point. However, it indeed shifts the question. Although I can see that in a vector space, in a standard 2D plot of (X, Y) pairs I can see what the variance is on the eg. horizontal axis
              $endgroup$
              – blue_note
              Jan 13 at 11:34















            4












            $begingroup$

            An interesting feature of the standard deviation is its connection to the (root) mean square error. This measures how well a predictor does in predicting the values. The root mean square error of using the mean as a predictor is the standard deviation, and this is the least root mean square error that you can get with a constant predictor.



            (This, of course, shifts the question to why the root mean squared error is interesting. I find it a bit more intuitive than the standard deviation, though: you can see it as the $L_2$ norm of the error vector, corrected for the number of points.)






            share|cite|improve this answer









            $endgroup$








            • 1




              $begingroup$
              Good point. However, it indeed shifts the question. Although I can see that in a vector space, in a standard 2D plot of (X, Y) pairs I can see what the variance is on the eg. horizontal axis
              $endgroup$
              – blue_note
              Jan 13 at 11:34













            4












            4








            4





            $begingroup$

            An interesting feature of the standard deviation is its connection to the (root) mean square error. This measures how well a predictor does in predicting the values. The root mean square error of using the mean as a predictor is the standard deviation, and this is the least root mean square error that you can get with a constant predictor.



            (This, of course, shifts the question to why the root mean squared error is interesting. I find it a bit more intuitive than the standard deviation, though: you can see it as the $L_2$ norm of the error vector, corrected for the number of points.)






            share|cite|improve this answer









            $endgroup$



            An interesting feature of the standard deviation is its connection to the (root) mean square error. This measures how well a predictor does in predicting the values. The root mean square error of using the mean as a predictor is the standard deviation, and this is the least root mean square error that you can get with a constant predictor.



            (This, of course, shifts the question to why the root mean squared error is interesting. I find it a bit more intuitive than the standard deviation, though: you can see it as the $L_2$ norm of the error vector, corrected for the number of points.)







            share|cite|improve this answer












            share|cite|improve this answer



            share|cite|improve this answer










            answered Jan 13 at 10:17









            Anton GolovAnton Golov

            283111




            283111







            • 1




              $begingroup$
              Good point. However, it indeed shifts the question. Although I can see that in a vector space, in a standard 2D plot of (X, Y) pairs I can see what the variance is on the eg. horizontal axis
              $endgroup$
              – blue_note
              Jan 13 at 11:34












            • 1




              $begingroup$
              Good point. However, it indeed shifts the question. Although I can see that in a vector space, in a standard 2D plot of (X, Y) pairs I can see what the variance is on the eg. horizontal axis
              $endgroup$
              – blue_note
              Jan 13 at 11:34







            1




            1




            $begingroup$
            Good point. However, it indeed shifts the question. Although I can see that in a vector space, in a standard 2D plot of (X, Y) pairs I can see what the variance is on the eg. horizontal axis
            $endgroup$
            – blue_note
            Jan 13 at 11:34




            $begingroup$
            Good point. However, it indeed shifts the question. Although I can see that in a vector space, in a standard 2D plot of (X, Y) pairs I can see what the variance is on the eg. horizontal axis
            $endgroup$
            – blue_note
            Jan 13 at 11:34











            2












            $begingroup$

            When defining "standard deviation", we want some way to take a bunch of deviations from a mean and quantify how big they typically are using a single number in the same units as the deviations themselves. But any definition of "standard deviation" induces a corresponding definition of "mean" because we want our choice of "mean" to always minimize the value of our "standard deviation" (intuitively, we want to define "mean" to be the "middlemost" point as measured by "standard deviation"). Only by defining "standard deviation" in the usual way do we recover the arithmetic mean while still having a measure in the right units. (Without getting into details, the key point is that the quadratic becomes linear when we take the derivative to find its critical point.)



            If we want to use some other mean, we can of course find a different "standard deviation" that will match that mean (the progress is somewhat analogous to integration), but in practice it's just easier to transform the data so that the arithmetic mean is appropriate.






            share|cite|improve this answer









            $endgroup$






















            answered Jan 13 at 2:22









Qwerty











            • $begingroup$
If all you want is minimization at the mean and the right units, why not sum/integrate the magnitudes of the deviations?
              $endgroup$
              – mephistolotl
              Jan 13 at 2:36



























            2












            $begingroup$

Probably the most useful property of the variance is that it is additive: the variance of the sum of two independent random variables is the sum of the variances.



This does not hold for other measures of spread.
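
A simulation illustrating the additivity (a minimal NumPy sketch; the two distributions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.exponential(scale=2.0, size=n)  # Var = 4
y = rng.uniform(-1.0, 3.0, size=n)      # Var = 16/12, independent of x

def mean_abs_dev(a):
    return np.mean(np.abs(a - a.mean()))

print(np.var(x + y), np.var(x) + np.var(y))  # ~5.33 and ~5.33: additive
print(np.std(x + y), np.std(x) + np.std(y))  # ~2.31 vs ~3.15: not additive
print(mean_abs_dev(x + y), mean_abs_dev(x) + mean_abs_dev(y))  # not additive either
```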






            share|cite|improve this answer









            $endgroup$



























                answered Jan 17 at 18:22









Yves Daoust





















                    1












                    $begingroup$

The normal distribution has maximum entropy among real distributions supported on $(-\infty, \infty)$ with specified standard deviation (equivalently, variance). (Reference.) Consequently, if the only thing you know about a real distribution supported on $\mathbb{R}$ is its mean and variance, the distribution that presumes the least prior information is the normal distribution.



                    I don't tend to think of the statement above as the important fact. It's more: normal distributions appear frequently and knowing the location parameter (mean) is reasonable. So what else do I have to know to make the least presumptive model be the normal distribution? The dispersion (variance).
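
To see this numerically, compare the differential entropies of three distributions sharing the same variance (a small sketch using the standard closed-form entropy expressions for the normal, Laplace, and uniform distributions; the value of `sigma` is arbitrary):

```python
import math

sigma = 1.7  # arbitrary common standard deviation

# Differential entropies (in nats) of three distributions with variance sigma**2:
h_normal  = 0.5 * math.log(2 * math.pi * math.e * sigma**2)
h_laplace = 1 + math.log(2 * sigma / math.sqrt(2))  # Laplace scale b = sigma/sqrt(2)
h_uniform = math.log(sigma * math.sqrt(12))         # uniform width w = sigma*sqrt(12)

print(h_normal, h_laplace, h_uniform)  # the normal entropy is the largest
```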






                    share|cite|improve this answer









                    $endgroup$



























                        answered Jan 14 at 5:36









Eric Towers





















                            1












                            $begingroup$

                            Consider Casella/Berger, Statistical Inference, Section 10.3.2:




Theorem 10.3.2: Consider a point estimation problem for a real-valued parameter $\theta$. In each of the following two situations, if $\delta^\pi \in D$ then $\delta^\pi$ is the Bayes rule (also called the Bayes estimator).

a. For squared error loss, $\delta^\pi(x) = E(\theta \mid x)$.

b. For absolute error loss, $\delta^\pi(x) = \text{median of } \pi(\theta \mid x)$.




My interpretation of this is that using the standard deviation leads one in the direction of an estimator for the mean, whereas using the average absolute deviation leads one in the direction of an estimator for the median.
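
An empirical version of the theorem (a minimal sketch; the gamma draws below stand in for draws from an arbitrary skewed posterior): minimizing expected squared-error loss over candidate point estimates lands on the mean, while minimizing expected absolute-error loss lands on the median.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.gamma(shape=2.0, scale=1.0, size=50_000)  # stand-in posterior draws

grid = np.linspace(0.0, 10.0, 2001)                    # candidate point estimates
sq_risk  = [np.mean((theta - c) ** 2) for c in grid]   # expected squared-error loss
abs_risk = [np.mean(np.abs(theta - c)) for c in grid]  # expected absolute-error loss

print(grid[np.argmin(sq_risk)],  theta.mean())      # ~ posterior mean (2.0)
print(grid[np.argmin(abs_risk)], np.median(theta))  # ~ posterior median (~1.68)
```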






                            share|cite|improve this answer









                            $endgroup$



























                                answered Jan 14 at 16:26









Daniel R. Collins





















                                    1












                                    $begingroup$

                                    The following is from An Introduction to Probability Theory and Its Applications, Vol. 1 by W. Feller.




                                    From Section IX.4: Variance



• Some readers may be helped by the following interpretation in mechanics. Suppose that a unit mass is distributed on the $x$-axis so that the mass $f(x_j)$ is concentrated in $x_j$. Then the mean $\mu$ is the abscissa of the center of gravity, and the variance is the moment of inertia.


                                    • Clearly different mass distributions may have the same center of gravity and the same moment of inertia, but it is well known that some important mechanical properties can be described in terms of these two quantities.
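
In code, Feller's correspondence is immediate (a small sketch; the discrete mass distribution below is made up for illustration):

```python
import numpy as np

# Hypothetical unit mass placed at positions x_j with masses f(x_j):
x = np.array([-2.0, 0.0, 1.0, 3.0])
f = np.array([0.1, 0.4, 0.3, 0.2])  # masses sum to 1

mu = np.sum(f * x)                   # center of gravity = mean
inertia = np.sum(f * (x - mu) ** 2)  # moment of inertia about mu = variance

print(mu, inertia)                   # 0.7 and 2.01
```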







                                    share|cite|improve this answer









                                    $endgroup$



























                                        answered Jan 14 at 17:49









Markus Scheuer





















                                            1












                                            $begingroup$

                                            If you draw a random sample from a normal distribution with mean $mu$ and variance $sigma^2$ then the mean and variance of the sample are sufficient statistics. This means that these two statistics contain all the information in the sample. The distribution of any other statistic (function of the observed values in the sample) given the sample mean and variance is independent of the true population mean and variance.



For the normal distribution, the sample variance is the optimal estimator of the population variance. For example, the population variance could be estimated from the mean deviation or from some function of the order statistics (the interquartile range or the range), but the distribution of such an estimator would have a greater spread than that of the sample variance.



These facts are important because, by the central limit theorem, the distribution of many observed phenomena is approximately normal.
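
A quick simulation of the second point (a sketch under normality; the constant $1.349 \approx 2\Phi^{-1}(0.75)$ rescales the normal interquartile range to the standard deviation):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, n, reps = 2.0, 50, 20_000

samples = rng.normal(0.0, sigma, size=(reps, n))

var_sample = samples.var(axis=1, ddof=1)  # the usual sample variance
iqr = (np.percentile(samples, 75, axis=1)
       - np.percentile(samples, 25, axis=1))
var_iqr = (iqr / 1.349) ** 2              # IQR-based variance estimate

# Both cluster near sigma**2 = 4, but the IQR-based estimator is more spread out:
print(var_sample.mean(), var_sample.std())
print(var_iqr.mean(), var_iqr.std())
```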






                                            share|cite|improve this answer









                                            $endgroup$



























                                                answered Jan 14 at 21:18









user1483


























