Which corrections should I use? t-test for differences in means with different sample sizes and standard deviations

I have two samples, coming from different populations.
One sample has 8,000 records, a mean of 5, and an SD of 0.5.
The second has 1,500 records, a mean of 7, and an SD of 1.5.
The distributions are close to normal.

The data come from the behaviour of two kinds of devices, and I want to understand whether the output of one is of higher quality than the other.

Can I apply a $t$-test here? What should I be cautious about, and which corrections or alternative tests are available?

edited Feb 17 at 16:26

StatsStudent

6,01332044

asked Feb 16 at 14:00

Luis

85119


statistical-significance t-test inference


2 Answers

Assuming your samples are independent, Welch's $t$-test seems appropriate here, since your variances appear to be unequal (you can also test this formally with Levene's test for equality of variances).
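A minimal sketch in R, assuming independent samples. Since the raw data are not shown, it simulates samples matching the summaries in the question; the names `y1`, `y2`, `g` are illustrative. `t.test()` with `var.equal = FALSE` (its default) runs Welch's test. Levene's test is computed here by hand, in its original mean-centred form, as a one-way ANOVA on absolute deviations from each group's mean, rather than through an add-on package.

```r
set.seed(42)  # reproducible simulated data (illustrative only)

# Simulated samples matching the summaries in the question
y1 <- rnorm(8000, mean = 5, sd = 0.5)  # device 1
y2 <- rnorm(1500, mean = 7, sd = 1.5)  # device 2

# Welch's t-test: var.equal = FALSE (the default) uses the
# Welch-Satterthwaite degrees of freedom instead of pooling the variances
welch <- t.test(y1, y2, var.equal = FALSE)
print(welch)

# Levene's test (mean-centred version): one-way ANOVA on the absolute
# deviations of each observation from its own group mean
y <- c(y1, y2)
g <- factor(rep(c("device1", "device2"), times = c(length(y1), length(y2))))
abs_dev <- abs(y - ave(y, g))  # ave() returns each observation's group mean
levene <- anova(lm(abs_dev ~ g))
print(levene)
```

With real data, replace the simulated `y1` and `y2` with the measurements from each device.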

That being said, since you have large samples from both device 1 and device 2, you can appeal to the central limit theorem and use:

\begin{eqnarray*}
Z & = & \frac{\bar{X}-\bar{Y}}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}} \sim N(0,1)
\end{eqnarray*}

approximately, under the null hypothesis of equal means. Here, $\bar{X}$ and $\bar{Y}$ are the sample means from device 1 and device 2, respectively, and $s_i^2$ and $n_i$ are the sample variance and sample size for the $i$th device, $i=1,2$. Because each sample's own variance enters the standard error, this large-sample approach does not require the variances to be equal.
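As a sketch, the $Z$ statistic can be computed directly from the summary statistics given in the question; the helper name `z_test_summary` is illustrative, and the two-sided p-value uses the standard normal approximation described above.

```r
# Large-sample z statistic for a difference in means, computed from summary
# statistics (each group's own variance enters the standard error)
z_test_summary <- function(m1, s1, n1, m2, s2, n2) {
  se <- sqrt(s1^2 / n1 + s2^2 / n2)  # standard error of m1 - m2
  z <- (m1 - m2) / se                # test statistic under H0: equal means
  p <- 2 * pnorm(-abs(z))            # two-sided p-value from N(0, 1)
  list(z = z, se = se, p_value = p)
}

# Summary statistics from the question
res <- z_test_summary(m1 = 5, s1 = 0.5, n1 = 8000,
                      m2 = 7, s2 = 1.5, n2 = 1500)
print(res$z)  # about -51.1
```

A statistic this far from zero gives a p-value that is effectively 0 in double precision.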

A 95% confidence interval for the difference in means is then given by:

\begin{eqnarray*}
\bar{X}-\bar{Y} & \pm & Z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}
\end{eqnarray*}

where $Z_{\alpha/2}$ is the upper $\alpha/2$ point of the standard normal distribution; for a 95% interval, $\alpha = 0.05$ and $Z_{0.025} \approx 1.96$.
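A short sketch of this interval in R, again using the question's summary statistics; `z_ci_summary` is an illustrative name, and `qnorm(1 - alpha / 2)` supplies $Z_{\alpha/2}$.

```r
# Large-sample (1 - alpha) confidence interval for the difference in means,
# using the same standard error as the z statistic
z_ci_summary <- function(m1, s1, n1, m2, s2, n2, conf = 0.95) {
  alpha <- 1 - conf
  se <- sqrt(s1^2 / n1 + s2^2 / n2)
  z_crit <- qnorm(1 - alpha / 2)  # upper alpha/2 point of N(0, 1); ~1.96 for 95%
  (m1 - m2) + c(lower = -1, upper = 1) * z_crit * se
}

ci <- z_ci_summary(5, 0.5, 8000, 7, 1.5, 1500)
print(ci)  # approximately -2.077 to -1.923
```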

All this being said, I wholeheartedly agree with the answer provided by Stefan: with samples this large, even a negligible difference in means will be statistically significant, so you should focus on what constitutes an important practical difference. Is a 0.0001 mean difference between device 1 and device 2 important to you? Is it still important if device 1 costs three times as much as device 2?
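One way to make "important practical difference" concrete is to fix, before looking at the data, the smallest difference that matters to you and compare the confidence interval against it. The threshold `delta` below is hypothetical (it is not given in the question), and this is a rough CI-based classification rather than a formal equivalence test.

```r
# Classify a confidence interval for a difference in means against a
# smallest practically important difference `delta` (chosen by the user)
practical_verdict <- function(ci, delta) {
  lo <- ci[1]
  hi <- ci[2]
  if (lo > delta || hi < -delta) {
    "difference is practically important"   # CI entirely beyond +/- delta
  } else if (lo > -delta && hi < delta) {
    "difference is practically negligible"  # CI entirely inside +/- delta
  } else {
    "inconclusive at this precision"        # CI straddles a threshold
  }
}

# Large-sample 95% CI for the question's summary statistics
se <- sqrt(0.5^2 / 8000 + 1.5^2 / 1500)
ci <- (5 - 7) + c(-1, 1) * qnorm(0.975) * se

v_small <- practical_verdict(ci, delta = 0.0001)  # hypothetical threshold
v_large <- practical_verdict(ci, delta = 5)       # hypothetical threshold
print(c(v_small, v_large))
```

The same interval can be "important" or "negligible" depending entirely on the threshold, which is the point of choosing it from subject-matter knowledge (cost, tolerances) rather than from the p-value.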

edited Feb 16 at 15:37

answered Feb 16 at 15:13

StatsStudent

6,01332044


With such large samples, almost any slight difference between the two means will be declared significant. Instead, I would try to visualize your samples in different ways to learn more about the shape of the data.
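To illustrate, one can compute the smallest observed difference in means that a two-sided large-sample $Z$ test would flag at the 5% level, given the sample sizes and standard deviations in the question. A short sketch in R (the function name is illustrative):

```r
# Smallest |mean difference| that a two-sided large-sample z test declares
# significant at level alpha, for given standard deviations and sample sizes
min_significant_diff <- function(s1, n1, s2, n2, alpha = 0.05) {
  se <- sqrt(s1^2 / n1 + s2^2 / n2)
  qnorm(1 - alpha / 2) * se
}

msd <- min_significant_diff(s1 = 0.5, n1 = 8000, s2 = 1.5, n2 = 1500)
print(msd)  # about 0.077, small compared with SDs of 0.5 and 1.5
```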

Also, how do you define "higher quality"? Does it mean that the mean outcomes should differ? Or does it relate more to the variances of the samples, e.g. is less variation more desirable?
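If "higher quality" means less variation, the variances can be compared directly. A minimal sketch, again on data simulated from the question's summaries: base R's `var.test()` performs the F test for a ratio of variances. That test assumes normality and is sensitive to departures from it; Levene's test is a more robust alternative.

```r
set.seed(1)  # reproducible simulated data (illustrative only)
y1 <- rnorm(8000, mean = 5, sd = 0.5)  # device 1
y2 <- rnorm(1500, mean = 7, sd = 1.5)  # device 2

# F test for the ratio of variances var(y1) / var(y2); assumes normality
ft <- var.test(y1, y2)
print(ft$estimate)  # sample variance ratio, near 0.25 / 2.25 from the question
print(ft$conf.int)  # confidence interval for the true variance ratio

# A simple descriptive comparison: ratio of standard deviations
print(sd(y1) / sd(y2))
```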

Here are some ideas for how to visualize the data in R:

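    library(ggplot2)
    library(gridExtra)

    set.seed(1)  # make the simulated samples reproducible

    # Simulate data matching the summaries in the question
    d1 <- data.frame(Y = rnorm(8000, mean = 5, sd = 0.5), X = "A")
    d2 <- data.frame(Y = rnorm(1500, mean = 7, sd = 1.5), X = "B")
    d <- rbind(d1, d2)

    # Colour the density curves by group so the two samples can be told apart
    p1 <- ggplot(d, aes(Y, colour = X)) + geom_density() + ggtitle("Density plot")
    p2 <- ggplot(d, aes(X, Y)) + geom_boxplot() + ggtitle("Boxplot")
    p3 <- ggplot(d, aes(X, Y)) + geom_violin() + ggtitle("Violin plot")

    # Stack the three plots in a single column
    grid.arrange(p1, p2, p3, ncol = 1)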

[Figure: density plot, boxplot, and violin plot of the simulated samples A and B]

edited Feb 19 at 14:53

answered Feb 16 at 15:20

Stefan

3,5811931
