Correlation between a continuous and an integer variable

I would like to calculate the correlation between two variables. The first variable is continuous and represents a performance measure. The second variable is an integer in the range from one to nine and represents an emotional rating of a user.

Currently, I'm using Spearman correlation because neither the first nor the second variable is normally distributed.

Is this correct? Or what correlation measure should be used for an integer variable?

asked Aug 15 at 13:58

BlackHawk

Keep in mind the data need not be Gaussian for correlation to be meaningful.
– dsaxton
Aug 15 at 16:45

The more crucial issue for whether or not to use Pearson correlation is not normality but linearity. Normality is really only needed if you apply a test that uses a normality assumption; you don't have to do that - you can use Pearson correlation without using a normality assumption. You could make some other parametric assumption than normality or you can even use Pearson correlation in a nonparametric test if you wish. If you want to pick up monotonic association, Spearman makes a lot of sense.
– Glen_b♦
Aug 16 at 3:15

@Glen_b what do you mean by the linearity assumption of Pearson? Does Spearman not assume linearity?
– BlackHawk
Aug 16 at 12:21

Spearman doesn't measure linear association; it measures the strength of monotonic association (e.g. calculate the Spearman correlation of $x: 1, 2, 3, 4, 5$ and $y: 1, 10, 100, 1000, 10000$ -- which is very plainly not linear). Pearson is a measure of the strength of linear correlation (it's not the only possible measure of linear association but it's the most obvious one). Pearson correlation can only be 1 or -1 when the relationship is a perfect line.
– Glen_b♦
Aug 16 at 17:23
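
The example in the comment above can be checked numerically. What follows is only a minimal sketch in plain Python: the helper names `pearson` and `ranks` are illustrative (not from any library), and the simple ranking assumes there are no tied values, which holds for this particular example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n; assumes no ties (true for the data below)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

x = [1, 2, 3, 4, 5]
y = [1, 10, 100, 1000, 10000]

r_pearson = pearson(x, y)                  # linear association on the raw values
r_spearman = pearson(ranks(x), ranks(y))   # Pearson applied to ranks = Spearman

print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

The Spearman value is exactly 1 because $y$ increases whenever $x$ does, while the Pearson value is clearly below 1 (roughly 0.76) because the points are far from a straight line.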

correlation pearson-r spearman-rho

2 Answers

accepted

Although your emotional rating data are coded as integers, are you sure that a difference between scores of 1 and 2 means the same as a difference between scores of 8 and 9 (etc.)? If not, those ratings are ordinal data, not strictly integer data. There are many threads on this site on how to analyze ordinal data.

With only 9 values for that ordinal variable there will be many ties, so Spearman's test might not be the best choice. One form of Kendall's test is designed to deal with ties.
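
To make the tie handling concrete, here is a rough sketch of Kendall's tau-b in plain Python. That tau-b is the tie-aware form meant here is an assumption on my part, since the paragraph above does not name the variant. The function name `kendall_tau_b` and the data are hypothetical, and the O(n²) pair loop is meant for illustration rather than for large samples.

```python
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty)),
    where Tx and Ty count pairs tied only in x and only in y."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue                 # tied on both variables: not counted
            elif dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + tied_x_only)
                 * (concordant + discordant + tied_y_only))
    return (concordant - discordant) / denom

# Hypothetical data: continuous performance scores and 1-9 ratings with ties.
performance = [0.31, 0.42, 0.45, 0.58, 0.63, 0.71, 0.77, 0.90]
rating      = [2,    3,    3,    5,    5,    5,    7,    8]

tau = kendall_tau_b(performance, rating)
print(f"Kendall tau-b: {tau:.3f}")
```

Note that the denominator is zero if either variable is constant, so a real analysis would need to guard against that case.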

You also might want to consider displaying the mean and standard error of your continuous variable over each of the 9 values of the emotional rating score. That could provide a useful visual display that might help convince anyone who is skeptical of your correlation results.
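
One way to prepare such a display is to group the performance values by rating and compute the mean and standard error per group. The sketch below uses only the Python standard library; the function name `summarize_by_rating` and the data are made up for illustration, and plotting is left out.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

def summarize_by_rating(performance, rating):
    """Return {rating: (n, mean, standard error)} for each observed rating."""
    groups = defaultdict(list)
    for p, r in zip(performance, rating):
        groups[r].append(p)
    summary = {}
    for r in sorted(groups):
        vals = groups[r]
        n = len(vals)
        # The standard error needs at least two observations in the group.
        se = stdev(vals) / sqrt(n) if n > 1 else float("nan")
        summary[r] = (n, mean(vals), se)
    return summary

# Hypothetical data
performance = [0.30, 0.40, 0.50, 0.55, 0.65, 0.60, 0.80, 0.90]
rating      = [1,    1,    3,    3,    5,    5,    9,    9]

summary = summarize_by_rating(performance, rating)
for r, (n, m, se) in summary.items():
    print(f"rating {r}: n={n}, mean={m:.3f}, SE={se:.3f}")
```

The resulting means and standard errors could then be drawn as points with error bars against the rating.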

Normality and choice of tests

As others have noted in comments on this question, any of the Pearson, Spearman or Kendall correlation statistics might be useful here. All these statistics can be calculated regardless of the distributions of the data.

One issue is whether the assumptions needed to calculate p-values for the statistics are met. For example, the standard test for Pearson correlation assumes a bivariate normal distribution. It is that method of testing that "requires" normality; you can still calculate the statistic without it. Exact p-values typically can't be calculated for the other measures if there are ties. Yet even if the assumptions for the tests aren't met, resampling can provide p-values and confidence intervals that are based on the empirical distributions rather than on assumed functional forms. A permutation test might be a good choice.
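
A permutation test along those lines might look like the following sketch. It is one possible implementation under stated assumptions, not the only way to do it: the function name `permutation_p_value`, the number of permutations, the fixed seed and the data are all illustrative choices. Any association statistic (Pearson, Spearman, Kendall) could be passed in as `stat`.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def permutation_p_value(x, y, stat=pearson, n_perm=2000, seed=0):
    """Two-sided permutation p-value for an association statistic."""
    rng = random.Random(seed)
    observed = abs(stat(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)              # break any pairing between x and y
        if abs(stat(x, y_perm)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical data
performance = [0.31, 0.42, 0.45, 0.58, 0.63, 0.71, 0.77, 0.90, 0.52, 0.66]
rating      = [2,    3,    3,    5,    5,    5,    7,    8,    4,    6]

p = permutation_p_value(performance, rating)
print(f"permutation p-value: {p:.4f}")
```

Because the null distribution is built from the observed values themselves, ties in the ratings need no special treatment here.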

Kendall and Spearman statistics are based solely on the ranks of the values along the two scales. As with other nonparametric tests, they require no assumptions about the distributions of the values. The tradeoff is that tests of those statistics might have less power to detect true associations than the Pearson test when the assumptions of the Pearson test hold.

So: if you have a linear relation between your performance measure and your emotional rating (that is, each step along the emotional rating scale leads to the same change in performance) then Pearson would be the best correlation measure. You could also consider standard linear regression of performance against emotional rating, which might provide more information. Then any requirement for normality in statistical tests would be with respect to the residuals about the means for each of the emotional rating groups (a weaker assumption than bivariate normality), and you would have ways to gauge whether the linearity assumption is met.
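
As a rough illustration of the regression route, the sketch below fits a straight line by ordinary least squares in plain Python and keeps the residuals for inspection. The function name `fit_line` and the data are hypothetical; in practice a statistics package would also report standard errors and diagnostics.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical data: one performance value per rating, for brevity.
rating      = [1, 2, 3, 4, 5, 6, 7, 8, 9]
performance = [0.22, 0.29, 0.41, 0.48, 0.61, 0.68, 0.82, 0.88, 1.01]

intercept, slope = fit_line(rating, performance)

# Residuals: observed minus fitted. A systematic pattern against the rating
# (e.g. a curve) would suggest that the linearity assumption is doubtful.
residuals = [p - (intercept + slope * r) for r, p in zip(rating, performance)]

print(f"performance ~ {intercept:.3f} + {slope:.3f} * rating")
```

Plotting `residuals` against `rating` is one simple way to gauge linearity.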

For choosing between Kendall and Spearman, Kendall is essentially based on absolute values of differences while Spearman is based on squares of differences. Thus Spearman may be more affected by outliers in the relative rankings.

edited Aug 16 at 18:13

answered Aug 15 at 14:17

EdM

It is the Self-Assessment Manikin, and I think a difference between 1 and 2 means the same as between 8 and 9. Does Kendall's test still work?
– BlackHawk
Aug 15 at 15:02

@BlackHawk yes, in that case the data would still be ordinal but in some circumstances you could then even consider treating them as continuous and using linear regression. See my link in the answer to discussions of ordinal data. Kendall's test will be fine for ordinal data provided that you use the version that handles ties. See my link in the answer to discussion of that test, or the Wikipedia page for the different flavors of the Kendall test.
– EdM
Aug 15 at 15:31

By the way, does Kendall's test work for normally as well as non-normally distributed data? And does Kendall's test have a linearity assumption (like Pearson) or a monotonicity assumption (like Spearman)? All in all, Kendall's test can replace Pearson and Spearman, is that correct?
– BlackHawk
Aug 16 at 12:22

Or are there cases where one should prefer Pearson or Spearman over Kendall?
– BlackHawk
Aug 16 at 12:39

@BlackHawk: I addressed your comments in an addition to the answer.
– EdM
Aug 16 at 18:14

Yes, Spearman rank correlation is the correct approach here. It is a non-parametric approach (which addresses the non-normality you describe) and is just the Pearson correlation between the rank values of the two variables. In other words, it converts the values of the variables of interest to ranks and applies Pearson correlation to the two sets of ranks.
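
The rank conversion described above can be sketched as follows. This is a minimal illustration, assuming the common convention that tied values share the average of their ranks; the function names `average_ranks` and `spearman` and the data are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1         # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: 1-9 ratings with ties, and continuous performance scores.
rating      = [2, 3, 3, 5, 5, 5, 7, 8]
performance = [0.31, 0.45, 0.42, 0.58, 0.71, 0.63, 0.77, 0.90]

print(average_ranks(rating))   # [1.0, 2.5, 2.5, 5.0, 5.0, 5.0, 7.0, 8.0]
print(f"Spearman: {spearman(performance, rating):.3f}")
```

With only nine possible rating values, many observations end up sharing a rank, which is the tie issue raised in the other answer.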

edited Aug 15 at 14:18

answered Aug 15 at 14:11

seraffej
