Can I use moments of a distribution to sample the distribution?

I notice in statistics/machine learning methods, a distribution is often approximated by a Gaussian by computing the first two moments, using those to estimate $\mu$ and $\sigma^2$, and subsequently sampling from that Gaussian.



It seems to me the more moments I calculate, the better I ought to be able to approximate the distribution I wish to sample.



What if I calculate three moments? How can I use those to sample from the distribution? And can this be extended to N moments?










  • Three moments don't determine a distributional form*; if you choose a distribution family with three parameters which relate to the first three population moments, you can do moment matching ("method of moments") to estimate the three parameters and then generate values from such a distribution. There are many such distributions. [* Indeed, sometimes even having all the moments isn't sufficient to determine a distribution.]
    – Glen_b♦
    3 hours ago

  • Thanks, @Glen_b! I will read about the "method of moments" to understand when it's possible. Can you point me to theory that describes when the moments are not sufficient to determine the distribution?
    – curious_dan
    3 hours ago

  • "Method of moments" just tells you how to estimate parameters from moments. The remainder of your comment is a new question (I think it's already answered on site); briefly -- if the moment generating function exists (in a neighborhood of 0) then it uniquely identifies a distribution (technically, you could in principle do an inverse Laplace transform). Certainly if some moments are not finite this would mean the mgf doesn't exist, but there are also cases where all moments are finite but the mgf still doesn't exist in a neighborhood of 0.
    – Glen_b♦
    3 hours ago

  • Thank you for these leads, @Glen_b!
    – curious_dan
    2 hours ago

  • I'm writing an answer based on my comment.
    – Glen_b♦
    2 hours ago
















probability sampling moments

edited 2 hours ago by Ferdi

asked 3 hours ago by curious_dan











2 Answers













Three moments don't determine a distributional form; if you choose a distribution family with three parameters which relate to the first three population moments, you can do moment matching ("method of moments") to estimate the three parameters and then generate values from such a distribution. There are many such distributions.



Sometimes even having all the moments isn't sufficient to determine a distribution. If the moment generating function exists (in a neighborhood of 0) then it uniquely identifies a distribution (you could in principle do an inverse Laplace transform to obtain it).



[If some moments are not finite this would mean the mgf doesn't exist, but there are also cases where all moments are finite but the mgf still doesn't exist in a neighborhood of 0.]
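As a concrete illustration of moments failing to pin down a distribution, a standard textbook counterexample (not part of the answer itself) perturbs the standard lognormal density $f$ to $f(x)\,[1+\varepsilon\sin(2\pi\ln x)]$ with $|\varepsilon|\le 1$; every integer moment stays the same. The sketch below checks this numerically. The function name and the integration grid are illustrative choices, not anything from the thread.

```python
import math

def perturbed_lognormal_moment(k, eps, n=20001, half_width=12.0):
    """k-th raw moment of the density f(x) * (1 + eps*sin(2*pi*ln x)),
    where f is the standard lognormal density, computed with the
    trapezoid rule after substituting y = ln x."""
    lo, hi = k - half_width, k + half_width   # integrand peaks near y = k
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        y = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        normal_part = math.exp(k * y - 0.5 * y * y) / math.sqrt(2 * math.pi)
        total += weight * normal_part * (1 + eps * math.sin(2 * math.pi * y))
    return total * h

for k in range(5):
    plain = perturbed_lognormal_moment(k, 0.0)    # ordinary lognormal
    wiggly = perturbed_lognormal_moment(k, 0.5)   # visibly different density
    print(k, plain, wiggly, math.exp(k * k / 2))  # last column: exact value
```

The two densities differ pointwise, yet the printed moments should agree with each other and with $e^{k^2/2}$ up to numerical error.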




How the process would work for a specific choice of distribution



We can simplify the process of obtaining a distribution matching three moments by ignoring the mean and variance and working with a scaled third moment -- the moment-skewness ($\gamma_1=\mu_3/\mu_2^{3/2}$).



We can do this because having selected a distribution with the relevant skewness, we can then back out the desired mean and variance by scaling and shifting.
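A quick numerical check of that claim: the moment-skewness is unchanged by shifting, and by scaling with a positive factor. This is only a sketch; the exponential test data, the seed and the helper name `skewness` are arbitrary choices made for illustration.

```python
import random

def skewness(xs):
    """Moment-skewness m3 / m2^(3/2) using n-denominator central moments."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

rng = random.Random(1)
base = [rng.expovariate(1.0) for _ in range(10_000)]
moved = [5.0 + 3.0 * x for x in base]   # shift by 5, scale by 3 (scale > 0)

print(skewness(base), skewness(moved))  # the two values should coincide
```

A negative scale factor would flip the sign of the skewness, so the scaling step has to use a positive multiplier.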



Let's consider an example. Yesterday I created a large data set (which still happens to be in my R session) whose distribution I haven't tried to calculate the functional form of (it's a large set of values of the log of the sample variance of a Cauchy at n=10). We have the first three raw moments as 1.519, 3.597 and 11.479 respectively, or correspondingly a mean of 1.518, a standard deviation* of 1.136 and a skewness of 1.429 (so these are sample values from a large sample).
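The conversion from raw moments to mean, standard deviation and skewness can be sketched as follows. The function name is made up for illustration, and because the inputs are the rounded values quoted above, the result should only be expected to match the quoted summary statistics approximately.

```python
import math

def central_summary(m1, m2, m3):
    """Mean, standard deviation and moment-skewness from the raw moments
    m_k = E[X^k] (n-denominator convention)."""
    var = m2 - m1 ** 2                        # second central moment
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3      # third central moment
    return m1, math.sqrt(var), mu3 / var ** 1.5

mean, sd, skew = central_summary(1.519, 3.597, 11.479)
print(mean, sd, skew)
```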



Formally, method of moments would attempt to match the raw moments, but the calculation is simpler if we start with the skewness (turning solving three equations in three unknowns into solving for one parameter at a time, a much simpler task).



* I am going to handwave away the distinction between using an n-denominator on the variance - as would correspond to formal method of moments - and an n-1 denominator and simply use sample calculations.



This skewness (~1.43) indicates we seek a distribution which is right-skew. I could choose, for example, a shifted lognormal distribution (three-parameter lognormal, shape $\sigma$, scale $\mu$ and location-shift $\gamma$) with the same moments. Let's begin by matching the skewness. The population skewness of a two-parameter lognormal is:



$\gamma_1=(e^{\sigma^2}+2)\sqrt{e^{\sigma^2}-1}$



So let's start by equating that to the desired sample value to obtain an estimate of $\sigma^2$, $\tilde{\sigma}^2$, say.



Note that $\gamma_1^2$ is $(\tau+2)^2(\tau-1)$ where $\tau=e^{\sigma^2}$. This then yields a simple cubic equation $\tau^3+3\tau^2-4=\gamma_1^2$. Using the sample skewness in that equation yields $\tilde{\tau}\approx 1.1995$ or $\tilde{\sigma}^2\approx 0.1819$.
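One way to solve that cubic numerically is plain bisection, since the left-hand side is increasing for $\tau>0$ and equals zero at $\tau=1$. This is a minimal sketch with a made-up helper name; the answer does not say how the root was actually found (R's `uniroot` or a closed-form cubic solution would do equally well).

```python
import math

def solve_tau(skew, lo=1.0, hi=1e6, iters=200):
    """Root tau > 1 of tau^3 + 3*tau^2 - 4 = skew^2, by bisection.
    Assumes skew > 0 and skew^2 below the value of the cubic at `hi`."""
    target = skew ** 2

    def f(t):
        return t ** 3 + 3 * t ** 2 - 4 - target

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

tau = solve_tau(1.429)       # sample skewness from the example
sigma2 = math.log(tau)       # tau = exp(sigma^2)
print(tau, sigma2)
```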



We can then in turn solve for $mu$ by matching the variance (or standard deviation) and then for the location parameter by matching the mean.
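Those two remaining steps can be sketched as below, assuming the usual parameterisation $X=\gamma+e^{\mu+\sigma Z}$ with $Z$ standard normal, for which the variance is $(\tau-1)\,\tau\,e^{2\mu}$ and the mean is $\gamma+e^{\mu}\sqrt{\tau}$. The function name, the seed and the number of draws are illustrative; the inputs are the rounded values quoted in the example.

```python
import math
import random

def shifted_lognormal_params(mean, sd, tau):
    """Given tau = exp(sigma^2) from the skewness step, match the variance
    to get mu, then match the mean to get the location shift."""
    sigma = math.sqrt(math.log(tau))
    # Var = (tau - 1) * tau * exp(2*mu)   =>  solve for mu
    mu = 0.5 * math.log(sd ** 2 / (tau * (tau - 1)))
    # Mean = shift + exp(mu) * sqrt(tau)  =>  solve for shift
    shift = mean - math.exp(mu) * math.sqrt(tau)
    return mu, sigma, shift

mu, sigma, shift = shifted_lognormal_params(mean=1.518, sd=1.136, tau=1.1995)

# Sampling is then a shifted draw from an ordinary two-parameter lognormal.
rng = random.Random(0)
draws = [shift + rng.lognormvariate(mu, sigma) for _ in range(100_000)]
print(mu, sigma, shift, sum(draws) / len(draws))
```

Solving one parameter at a time like this works because the skewness depends only on $\sigma$, and the variance does not depend on the shift.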



But we could as easily have chosen a shifted-gamma or a shifted-Weibull distribution (or a shifted-F or any number of other choices) and run through essentially the same process. Each of them would be different.
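For comparison, here is a sketch of the same recipe with a shifted gamma, using the standard gamma facts that the skewness is $2/\sqrt{k}$, the variance is $k\theta^2$ and the mean is $k\theta$ for shape $k$ and scale $\theta$. The function name and seed are illustrative, and the inputs are again the rounded sample values from the example.

```python
import math
import random

def shifted_gamma_params(mean, sd, skew):
    """Moment-match a gamma(shape k, scale theta) plus a location shift."""
    k = 4.0 / skew ** 2               # from skewness = 2 / sqrt(k)
    theta = sd / math.sqrt(k)         # from variance = k * theta^2
    shift = mean - k * theta          # from mean = shift + k * theta
    return k, theta, shift

k, theta, shift = shifted_gamma_params(mean=1.518, sd=1.136, skew=1.429)

rng = random.Random(0)
# random.gammavariate takes (shape, scale), so its mean is k * theta.
draws = [shift + rng.gammavariate(k, theta) for _ in range(100_000)]
print(k, theta, shift, sum(draws) / len(draws))
```

Both fits reproduce the same mean, variance and skewness, but they are different distributions, so samples drawn from them will differ in other respects such as tail behaviour.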



(For the sample I was dealing with, a shifted gamma would probably have been a considerably better choice than a shifted lognormal, since the distribution of the logs of the values was left skew and the distribution of their cube root was very close to symmetric; these are consistent with what you see with unshifted gamma densities.)



One could even take the skewness-kurtosis diagram in a Pearson plot and draw a line at the desired skewness and thereby obtain a two-point distribution, a sequence of beta distributions, a gamma distribution, a sequence of beta-prime distributions, an inverse-gamma distribution and a sequence of Pearson type IV distributions, all with the same skewness.



We can see this illustrated in a skewness-kurtosis plot (Pearson plot) below (note that $\beta_1=\gamma_1^2$ and $\beta_2$ is the kurtosis), with the regions for the various Pearson distributions marked in.



[Figure: Pearson plot with desired skewness line drawn in]



The green horizontal line represents $\gamma_1^2 = 2.042$, and we see it pass through each of the mentioned distribution families, each corresponding to a different population kurtosis. (The dashed curve represents the lognormal, which is not a Pearson-family distribution.)
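To put rough numbers on "a different population kurtosis", the sketch below evaluates $\beta_2$ at this skewness for three of the curves, using standard results: the two-point boundary $\beta_2=\beta_1+1$, the gamma line $\beta_2=3+1.5\,\beta_1$, and the lognormal kurtosis $\tau^4+2\tau^3+3\tau^2-3$ with $\tau=e^{\sigma^2}$. The values are computed from those formulas, not read off the plot.

```python
beta1 = 2.042            # squared skewness from the example
tau = 1.1995             # exp(sigma^2) for the matching lognormal

# Kurtosis (beta2, not excess kurtosis) at this skewness for three curves.
beta2_two_point = 1 + beta1          # lower bound: beta2 >= beta1 + 1
beta2_gamma = 3 + 1.5 * beta1        # gamma (Pearson type III) line
beta2_lognormal = tau ** 4 + 2 * tau ** 3 + 3 * tau ** 2 - 3

for name, b2 in [("two-point", beta2_two_point),
                 ("gamma", beta2_gamma),
                 ("lognormal", beta2_lognormal)]:
    print(f"{name:10s} beta2 = {b2:.3f}")
```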



















So, the answer is generally NO, you can't do this, but sometimes you can.

When you can't

There are usually two reasons you can't do this.

First, if you have N observations, then you can calculate at most N moments. What about the other moments? You can't simply set them to zero.

Second, calculations of higher moments become less and less precise, because you have to raise the numbers to higher powers. Consider the 100th non-central moment; you usually can't calculate it with any precision: $$\gamma_{100}=\sum_i\frac{x_i^{100}}{n}$$
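A small simulation makes the precision problem concrete. It is only a sketch under an assumed setup (standard normal data, 1000 observations, arbitrary seed), chosen because the exact 100th raw moment of a standard normal is known to be $99!! = 1\cdot 3\cdot 5\cdots 99$.

```python
import math
import random

def raw_moment(xs, k):
    """k-th raw (non-central) sample moment with n-denominator."""
    return sum(x ** k for x in xs) / len(xs)

# Exact 100th raw moment of a standard normal: 99!! = 1*3*5*...*99.
exact = float(math.prod(range(1, 100, 2)))

rng = random.Random(0)
for trial in range(3):
    xs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    est = raw_moment(xs, 100)
    # Fraction of the whole sum contributed by the single largest |x|.
    top_share = max(abs(x) for x in xs) ** 100 / (est * len(xs))
    print(f"estimate {est:.3e}   exact {exact:.3e}   "
          f"share from largest |x| {top_share:.2f}")
```

With samples of this size the estimate is driven almost entirely by the few most extreme observations, so it falls short of the exact value by many orders of magnitude and changes drastically from one sample to the next.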



When you can

Now, sometimes you can get the distribution from moments. This is when you make some sort of assumption about the distribution. For instance, you declare that it's normal. In this case all you need is just two moments, which can usually be calculated with decent precision. Note that the normal distribution does have higher moments, e.g. kurtosis, but we don't need them. If you were to calculate all the moments of the normal distribution (without assuming it's normal), and then tried to recover the characteristic function to sample from the distribution, it wouldn't work. However, when you forget about the higher moments and stick to the first two, it does work.
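The normal case can be sketched in a few lines. The `observed` list here is simulated stand-in data with arbitrary parameters; in practice it would be whatever values you actually measured.

```python
import random
import statistics

rng = random.Random(0)
# Stand-in data; in practice these would be the observed values.
observed = [rng.gauss(10.0, 2.0) for _ in range(5_000)]

# Two moments are enough once normality is assumed.
mu_hat = statistics.fmean(observed)
sigma_hat = statistics.pstdev(observed)   # n-denominator, as in method of moments

new_draws = [rng.gauss(mu_hat, sigma_hat) for _ in range(5_000)]
print(mu_hat, sigma_hat, statistics.fmean(new_draws), statistics.pstdev(new_draws))
```

The new draws reproduce the fitted mean and standard deviation, but they only resemble the original data to the extent that the normality assumption holds.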






    share|cite|improve this answer




















      Your Answer





      StackExchange.ifUsing("editor", function ()
      return StackExchange.using("mathjaxEditing", function ()
      StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
      StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
      );
      );
      , "mathjax-editing");

      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "65"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      convertImagesToLinks: false,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      imageUploader:
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      ,
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );






      curious_dan is a new contributor. Be nice, and check out our Code of Conduct.









       

      draft saved


      draft discarded


















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f375515%2fcan-i-use-moments-of-a-distribution-to-sample-the-distribution%23new-answer', 'question_page');

      );

      Post as a guest






























      2 Answers
      2






      active

      oldest

      votes








      2 Answers
      2






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      3
      down vote













      Three moments don't determine a distributional form; if you choose a distribution-famiy with three parameters which relate to the first three population moments, you can do moment matching ("method of moments") to estimate the three parameters and then generate values from such a distribution. There are many such distributions.



      Sometimes even having all the moments isn't sufficient to determine a distribution. If the moment generating function exists (in a neighborhood of 0) then it uniquely identifies a distribution (you could in principle do an inverse Laplace transform to obtain it).



      [If some moments are not finite this would mean the mgf doesn't exist, but there are also cases where all moments are finite but the mgf still doesn't exist in a neighborhood of 0.]




      How the process would work for a specific choice of distribution



      We can simplify the process of obtaining a distribution matching three moments by ignoring the mean and variance and working with a scaled third moment -- the moment-skewness ($gamma_1=mu_3/mu_2^3/2$).



      We can do this because having selected a distribution with the relevant skewness, we can then back out the desired mean and variance by scaling and shifting.



      Let's consider an example. Yesterday I created a large data set (which still happens to be in my R session) whose distribution I haven't tried to calculate the functional form of (it's a large set of values of the log of the sample variance of a Cauchy at n=10). We have the first three raw moments as
      1.519, 3.597 and 11.479 respectively, or correspondingly a mean of 1.518, a standard deviation* of 1.136 and a skewness of 1.429 (so these are sample values from a large sample).



      Formally, method of moments would attempt to match the raw moments, but the calculation is simpler if we start with the skewness (turning solving three equations in three unknowns into solving for one parameter at a time, a much simpler task).



      * I am going to handwave away the distinction between using an n-denominator on the variance - as would correspond to formal method of moments - and an n-1 denominator and simply use sample calculations.



      This skewness (~1.43) indicates we seek a distribution which is right-skew. I could choose, for example, a shifted lognormal distribution (three parameter lognormal, shape $sigma$, scale $mu$ and location-shift $gamma$) with the same moments. Let's begin by matching the skewness. The population skewness of a two parameter lognormal is:



      $gamma_1=(e^sigma ^2!!+2)sqrt e^sigma ^2!!-1$



      So let's start by equating that to the desired sample value to obtain an estimate of $sigma^2$, $tildesigma^2$, say.



      Note that $gamma_1^2$ is $(tau+2)^2(tau-1)$ where $tau=e^sigma^2$. This then yields a simple cubic equation $tau^3+3tau^2-4=gamma_1^2$. Using the sample skewness in that equation yields $tildetauapprox 1.1995$ or $tildesigma^2approx 0.1819$.



      We can then in turn solve for $mu$ by matching the variance (or standard deviation) and then for the location parameter by matching the mean.



      But we could as easily have chosen a shifted-gamma or a shifted-Weibull distribution (or a shifted-F or any number of other choices) and run through essentially the same process. Each of them would be different.



      (For the sample I was dealing with, a shifted gamma would probably have been a considerably better choice than a shifted lognormal, since the distribution of the logs of the values was left skew and the distribution of their cube root was very close to symmetric; these are consistent with what you see with unshifted gamma densities.)



      One could even take the skewness-kurtosis diagram in a Pearson plot and draw a line at the desired skewness and thereby obtain a two-point distribution, sequence of beta distributions, a gamma distribution, a sequence of beta-prime distributions, an inverse-gamma disribution and a sequence of Pearson type IV distributions all with the same skewness.



      We can see this illustrated in a skewness-kurtosis plot (Pearson plot) below (note that $beta_1=gamma_1^2$ and $beta_2$ is the kurtosis), with the regions for the various Pearson-distributions marked in.



      Pearson plot with desired skewness line drawn in



      The green horizontal line represents $gamma_1^2 = 2.042$, and we see it pass through each of the mentioned distribution-families, each corresponding to a different population kurtosis. (The dashed curve represents the lognormal, which is not a Pearson-family distribution)






      share|cite|improve this answer


























        up vote
        3
        down vote













        Three moments don't determine a distributional form; if you choose a distribution-famiy with three parameters which relate to the first three population moments, you can do moment matching ("method of moments") to estimate the three parameters and then generate values from such a distribution. There are many such distributions.



        Sometimes even having all the moments isn't sufficient to determine a distribution. If the moment generating function exists (in a neighborhood of 0) then it uniquely identifies a distribution (you could in principle do an inverse Laplace transform to obtain it).



        [If some moments are not finite this would mean the mgf doesn't exist, but there are also cases where all moments are finite but the mgf still doesn't exist in a neighborhood of 0.]




        How the process would work for a specific choice of distribution



        We can simplify the process of obtaining a distribution matching three moments by ignoring the mean and variance and working with a scaled third moment -- the moment-skewness ($gamma_1=mu_3/mu_2^3/2$).



        We can do this because having selected a distribution with the relevant skewness, we can then back out the desired mean and variance by scaling and shifting.



        Let's consider an example. Yesterday I created a large data set (which still happens to be in my R session) whose distribution I haven't tried to calculate the functional form of (it's a large set of values of the log of the sample variance of a Cauchy at n=10). We have the first three raw moments as
        1.519, 3.597 and 11.479 respectively, or correspondingly a mean of 1.518, a standard deviation* of 1.136 and a skewness of 1.429 (so these are sample values from a large sample).



        Formally, method of moments would attempt to match the raw moments, but the calculation is simpler if we start with the skewness (turning solving three equations in three unknowns into solving for one parameter at a time, a much simpler task).



        * I am going to handwave away the distinction between using an n-denominator on the variance - as would correspond to formal method of moments - and an n-1 denominator and simply use sample calculations.



        This skewness (~1.43) indicates we seek a distribution which is right-skew. I could choose, for example, a shifted lognormal distribution (three parameter lognormal, shape $sigma$, scale $mu$ and location-shift $gamma$) with the same moments. Let's begin by matching the skewness. The population skewness of a two parameter lognormal is:



        $gamma_1=(e^sigma ^2!!+2)sqrt e^sigma ^2!!-1$



        So let's start by equating that to the desired sample value to obtain an estimate of $sigma^2$, $tildesigma^2$, say.



        Note that $gamma_1^2$ is $(tau+2)^2(tau-1)$ where $tau=e^sigma^2$. This then yields a simple cubic equation $tau^3+3tau^2-4=gamma_1^2$. Using the sample skewness in that equation yields $tildetauapprox 1.1995$ or $tildesigma^2approx 0.1819$.



        We can then in turn solve for $mu$ by matching the variance (or standard deviation) and then for the location parameter by matching the mean.



        But we could as easily have chosen a shifted-gamma or a shifted-Weibull distribution (or a shifted-F or any number of other choices) and run through essentially the same process. Each of them would be different.



        (For the sample I was dealing with, a shifted gamma would probably have been a considerably better choice than a shifted lognormal, since the distribution of the logs of the values was left skew and the distribution of their cube root was very close to symmetric; these are consistent with what you see with unshifted gamma densities.)



        One could even take the skewness-kurtosis diagram in a Pearson plot and draw a line at the desired skewness and thereby obtain a two-point distribution, sequence of beta distributions, a gamma distribution, a sequence of beta-prime distributions, an inverse-gamma disribution and a sequence of Pearson type IV distributions all with the same skewness.



        We can see this illustrated in a skewness-kurtosis plot (Pearson plot) below (note that $beta_1=gamma_1^2$ and $beta_2$ is the kurtosis), with the regions for the various Pearson-distributions marked in.



        Pearson plot with desired skewness line drawn in



        The green horizontal line represents $gamma_1^2 = 2.042$, and we see it pass through each of the mentioned distribution-families, each corresponding to a different population kurtosis. (The dashed curve represents the lognormal, which is not a Pearson-family distribution)






        share|cite|improve this answer
























          up vote
          3
          down vote










          up vote
          3
          down vote









          edited 1 hour ago

























          answered 2 hours ago









          Glen_b♦

              So, the answer is generally NO, you can't do this, but sometimes you can.



              When you can't



              There are usually two reasons you can't do this.



              First, if you have N observations, then you can calculate at most N moments. What about the other moments? You can't simply set them to zero.



              Second, higher-moment calculations become less and less precise, because you have to raise the numbers to higher powers. Consider the 100th non-central moment; you usually can't calculate it with any precision: $$\gamma_{100}=\sum_i \frac{x_i^{100}}{n}$$
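A small simulation can illustrate the precision problem. The Python sketch below draws standard normal samples (an assumption made purely for illustration; `raw_moment` is a hypothetical helper) and compares the 2nd and 100th non-central sample moments across a few seeds with the population values, which are 1 and $99!! \approx 2.7 \times 10^{78}$ for a standard normal.

```python
import math
import random


def raw_moment(xs, k):
    """k-th non-central sample moment: the mean of x**k."""
    return sum(x ** k for x in xs) / len(xs)


# Population value of E[Z^100] for a standard normal: 99!! = 1 * 3 * 5 * ... * 99
true_m100 = math.prod(range(1, 100, 2))

estimates = []
for seed in range(5):
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    estimates.append((raw_moment(xs, 2), raw_moment(xs, 100)))

for m2, m100 in estimates:
    print(f"2nd moment: {m2:.3f}   100th moment: {m100:.3e}   "
          f"(population: {float(true_m100):.3e})")
```

The 100th sample moment is driven almost entirely by the single largest observation in each sample, so it tends to swing widely between samples and to sit far below the population value, while the 2nd moment stays close to 1.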



              When you can



              Now, sometimes you can get the distribution from moments. This works when you make some assumption about the form of the distribution. For instance, you declare that it's normal. In this case all you need is two moments, which can usually be calculated with decent precision. Note that the normal distribution does have higher moments, e.g. kurtosis, but we don't need them. If you were to calculate all the moments of a normal distribution (without assuming it's normal) and then try to recover the characteristic function in order to sample from the distribution, it wouldn't work. However, when you forget about the higher moments and stick to the first two, it does work.
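For the normal case this amounts to very little code. The sketch below is only an illustration: the data values are invented, and `fit_normal` and `sample_normal` are hypothetical names. It estimates the mean and standard deviation from the data and then draws from the normal distribution they define.

```python
import random
import statistics


def fit_normal(xs):
    """Estimate (mu, sigma) from the first two sample moments."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs, mu)   # n-denominator, as in method of moments
    return mu, sigma


def sample_normal(mu, sigma, n, seed=0):
    """Draw n values from Normal(mu, sigma), seeded for repeatability."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


data = [2.1, 1.9, 2.4, 2.0, 1.6, 2.3, 1.8, 2.2]   # made-up observations
mu, sigma = fit_normal(data)
draws = sample_normal(mu, sigma, n=5)
```

The draws are only as good as the normality assumption; nothing in the two moments themselves checks it.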






                  answered 1 hour ago









                  Aksakal
