Simple linear regression: If Y and X are both normal, what's the exact null distribution of the parameters?

Suppose $Y \sim N(a,b)$, $X \sim N(c,d)$, and $Y$ is independent of $X$. After sampling 25 observations from both $Y$ and $X$, I run the following regression model: $Y = \beta_0 + \beta_1 X + \epsilon$. I wish to test the hypothesis $H_0: \beta_0 = 0$ against the alternative $H_1: \beta_0 \neq 0$.
My question is, since the distributions of $Y$ and $X$ are known, is there an exact 'null distribution' for the parameter $\beta_0$? If so, what is the distribution? By null distribution, I mean the sampling distribution of $\beta_0$ under the null hypothesis.
If anyone knows the answer assuming the true correlation coefficient between $Y$ and $X$ is 0.1, rather than assuming independence, that would be a big help also. This is all for a simulation study I'm working on.
regression inference
I wonder whether you mean the distribution of $\hat\beta_0$ rather than of $\beta_0$? You have specified that you are 100% sure that $\beta_0 = 0$, so that is the rather degenerate distribution that it has! But it sounds to me that you might be rather more interested in the distribution of $\hat\beta_0$, which is the estimate of $\beta_0$ that you would make from your random sample - and since different random samples will produce slightly different estimates, your estimator has a non-degenerate probability distribution.
– Silverfish
Dec 23 '18 at 9:18
This question would be more interesting if you drop the independence assumption on $X$ and $Y$, and add an assumption on joint normal distribution.
– kjetil b halvorsen
Dec 23 '18 at 9:59
Yes, I meant that if I was to test $\beta_0 = 0$ (for a simulation exercise I'm working on... I know the true value is $c$), I would have to generate the sampling distribution of $\hat\beta_0$ under the null that $\beta_0 = 0$. I know asymptotically this distribution is normal. But since $X$ and $Y$ are both normal and $n$ is relatively small, am I able to use the t-distribution (for example) to form an 'exact' null distribution of $\hat\beta_0$, rather than using the asymptotic approximation? The true value of the parameter is 0 (obviously), but this is not what I'm after!
– Anna Efron
Dec 24 '18 at 6:16
edited Dec 23 '18 at 9:16 by Silverfish
asked Dec 23 '18 at 4:56 by Anna Efron
2 Answers
Since you have specified that $X$ and $Y$ are independent, the conditional mean of $Y$ given $X$ is:
$$\mathbb{E}(Y|X) = \mathbb{E}(Y) = c,$$
which implies that:
$$\beta_0 = c \quad \quad \quad \beta_1 = 0 \quad \quad \quad \varepsilon \sim \text{N}(0, d).$$
In this case there is nothing to test: your regression parameters are fully determined by the distributional assumptions you have made at the start of the question.
Remember that a regression model is a model designed to describe the conditional distribution of $Y$ given $X$. If you assume independence of these variables then this pre-empts the entire modelling exercise.
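A quick simulation can illustrate the point that independence fixes the regression parameters. The sketch below draws independent normal samples of size 25 and averages the least-squares estimates over many replications. The numeric means and standard deviations, the seed, and the helper name `average_ols_estimates` are illustrative assumptions, not values from the question.

```python
import numpy as np

def average_ols_estimates(mean_y, sd_y, mean_x, sd_x, n=25, reps=20000, seed=0):
    """Average the OLS intercept and slope over many samples of independent X and Y."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mean_x, sd_x, size=(reps, n))
    y = rng.normal(mean_y, sd_y, size=(reps, n))  # drawn independently of x
    xc = x - x.mean(axis=1, keepdims=True)        # centred regressor, one row per sample
    slope = (xc * y).sum(axis=1) / (xc ** 2).sum(axis=1)
    intercept = y.mean(axis=1) - slope * x.mean(axis=1)
    return intercept.mean(), slope.mean()

# Hypothetical parameter values chosen only for illustration.
b0_bar, b1_bar = average_ols_estimates(mean_y=2.0, sd_y=1.0, mean_x=5.0, sd_x=3.0)
print(b0_bar, b1_bar)
```

Under these assumptions the averaged intercept should sit close to the mean of $Y$ and the averaged slope close to zero, which is what the argument above predicts.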
Thank you. I meant that if I was to test $\beta_0 = 0$ (for a simulation exercise I'm working on... I know the true value is $c$) the usual way, I would have to generate the sampling distribution of $\hat\beta_0$ under the null that $\beta_0 = 0$. I know asymptotically this sampling distribution is normal. But since $X$ and $Y$ are both normal and $n$ is quite small, am I able to use the t-distribution (for example) to form an 'exact' null distribution of $\hat\beta_0$, such that the coverage probability is exactly $(1-\alpha)$? And what if $\rho_{XY} = 0.1$ (say) instead of 0?
– Anna Efron
Dec 24 '18 at 6:21
Once you remove the assumption that $X$ and $Y$ are independent, the regression model is your specification of their conditional relationship. Much of the information you have given in your comment unfortunately contradicts your original question. It is also unclear why you would test $H_0: \beta_0 = 0$ if you know from some other source (your simulation) that $\beta_0 = c$. I think at this point you will probably need to ask a new question where all this information is made clear.
– Ben
Dec 24 '18 at 6:44
In simple linear regression, the estimate of $\beta_0$ is computed as:
$$\hat\beta_0 = \frac{1}{n} S_y - \frac{1}{n} S_x \frac{n S_{xy} - S_x S_y}{n S_{xx} - S_x S_x}$$
with $S_x = \sum x_i$, $S_y = \sum y_i$, $S_{xx} = \sum x_i x_i$, $S_{xy} = \sum x_i y_i$.
This is $\bar y - \hat\beta_1 \bar x$, so it is a linear combination of the $y_i$:
$$\hat\beta_0 = \frac{1}{n} \sum c_i y_i$$
with
$$c_i = \left( 1 - S_x \frac{n x_i - S_x}{n S_{xx} - S_x S_x} \right)$$
When both the $x_i$ and the $y_i$ are random, this does not seem to follow a simple distribution (or at least not a typical well-known one). What you have is:
$$\hat\beta_0 \sim N(\mu, \sigma^2)$$
where $\mu$ and $\sigma$ are themselves random variables that depend on the distribution of $X$ through the $c_i$. (If every $y_i$ has the same distribution $N(a,b)$, then $\mu = a$ regardless of the distribution of $X$, because $\sum c_i = n$.)
However, if you condition on the $x_i$, then $\hat\beta_0$ follows a regular normal distribution (note that the $y_i$ do not need to be identically distributed normal variables).
In testing, you often do not know the variance of this normal distribution, so you estimate it from the residuals. Then you use the t-distribution.
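To see how the t-distribution behaves here, note that for normal $y_i$ with constant variance the studentized intercept has a $t_{n-2}$ distribution conditional on the $x_i$, and since that distribution does not depend on the $x_i$, it also holds when $X$ is random. The sketch below checks this by simulation for $n = 25$ with independent $X$ and $Y$, testing the true intercept so that the null hypothesis holds. The parameter values, the seed, the helper name `intercept_t_statistics`, and the hard-coded critical value are assumptions made for illustration.

```python
import numpy as np

# Two-sided 5% critical value of the t-distribution with 23 degrees of freedom
# (97.5% quantile, from standard tables); only valid for n = 25.
T_CRIT_DF23 = 2.0687

def intercept_t_statistics(beta0_true, sd_y, mean_x, sd_x, n=25, reps=20000, seed=1):
    """Simulate the t-statistic for testing the true intercept in simple OLS."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mean_x, sd_x, size=(reps, n))
    y = rng.normal(beta0_true, sd_y, size=(reps, n))  # independent of x, so the true slope is 0
    x_bar = x.mean(axis=1)
    xc = x - x_bar[:, None]
    sxx = (xc ** 2).sum(axis=1)
    slope = (xc * y).sum(axis=1) / sxx
    intercept = y.mean(axis=1) - slope * x_bar
    resid = y - intercept[:, None] - slope[:, None] * x
    s2 = (resid ** 2).sum(axis=1) / (n - 2)           # residual variance estimate
    se = np.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))   # standard error of the intercept
    return (intercept - beta0_true) / se

t_stats = intercept_t_statistics(beta0_true=2.0, sd_y=1.0, mean_x=5.0, sd_x=3.0)
rejection_rate = np.mean(np.abs(t_stats) > T_CRIT_DF23)
print(rejection_rate)
```

If the t-distribution is exact in this setting, the printed rejection rate should be close to the nominal 5% level, up to simulation noise.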
answered Dec 23 '18 at 7:05 by Ben
edited Dec 25 '18 at 13:34
answered Dec 25 '18 at 13:08 by Martijn Weterings