Correlation between a continuous and an integer variable
I would like to calculate the correlation between two variables. The first variable is continuous and represents a performance measure. The second variable is an integer in the range from one to nine and represents a user's emotional rating.
Currently, I'm using Spearman correlation because neither the first nor the second variable is normally distributed.
Is this correct? Or what correlation measure should be used for an integer variable?
correlation pearson-r spearman-rho
asked Aug 15 at 13:58
BlackHawk
Keep in mind the data need not be Gaussian for correlation to be meaningful.
- dsaxton
Aug 15 at 16:45
The more crucial issue for whether or not to use Pearson correlation is not normality but linearity. Normality is really only needed if you apply a test that uses a normality assumption; you don't have to do that - you can use Pearson correlation without using a normality assumption. You could make some other parametric assumption than normality or you can even use Pearson correlation in a nonparametric test if you wish. If you want to pick up monotonic association, Spearman makes a lot of sense.
- Glen_b♦
Aug 16 at 3:15
@Glen_b what do you mean by the linearity assumption of Pearson? Does Spearman not assume linearity?
- BlackHawk
Aug 16 at 12:21
Spearman doesn't measure linear association; it measures the strength of monotonic association (e.g. calculate the Spearman correlation of $x: 1, 2, 3, 4, 5$ and $y: 1, 10, 100, 1000, 10000$, a relationship that is very plainly not linear). Pearson is a measure of the strength of linear correlation (it's not the only possible measure of linear association but it's the most obvious one). Pearson correlation can only be 1 or -1 when the relationship is a perfect line.
- Glen_b♦
Aug 16 at 17:23
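The example in the last comment can be checked with a few lines of plain Python. This is only a sketch: the helper names `pearson` and `ranks_no_ties` are made up here, and the ranking helper assumes there are no tied values (true for this particular $x$ and $y$, not for rating data in general).

```python
import math

def pearson(x, y):
    """Pearson correlation: co-variation scaled by both spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks_no_ties(values):
    """Rank 1 for the smallest value, n for the largest (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

x = [1, 2, 3, 4, 5]
y = [1, 10, 100, 1000, 10000]

r_pearson = pearson(x, y)                                   # linear association
r_spearman = pearson(ranks_no_ties(x), ranks_no_ties(y))    # monotonic association
print(r_pearson, r_spearman)
```

Because $y$ increases whenever $x$ does, the rank-based value is 1, while the Pearson value on the raw numbers stays clearly below 1 since the points do not lie on a line.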
2 Answers
Although your emotional rating data are coded as integers, are you sure that a difference between scores of 1 and 2 means the same as a difference between scores of 8 and 9 (etc.)? If not, those data are ordinal data, not strictly integer data. There are many threads on this site for how to analyze ordinal data.
With only 9 values for that ordinal variable there will be many ties, so Spearman's test might not be the best choice. One form of Kendall's test (tau-b) is designed to deal with ties.
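As an illustration of tie handling, here is a minimal pure-Python sketch of the tie-adjusted Kendall statistic, usually called tau-b. The function name `kendall_tau_b` and the small data set are hypothetical, not from the question, and the sketch does not guard against the degenerate case where every pair is tied.

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) over a tie-adjusted denominator."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        if x1 == x2 and y1 == y2:
            continue  # tied on both variables: excluded from both denominator terms
        if x1 == x2:
            tied_x_only += 1
        elif y1 == y2:
            tied_y_only += 1
        elif (x1 < x2) == (y1 < y2):
            concordant += 1
        else:
            discordant += 1
    # Number of pairs that are not tied on x, and not tied on y, respectively
    untied_x = concordant + discordant + tied_y_only
    untied_y = concordant + discordant + tied_x_only
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)

# Hypothetical data: a coarse rating with many ties and a continuous measure
rating = [1, 1, 2, 2, 3, 3]
performance = [0.2, 0.4, 0.3, 0.5, 0.6, 0.9]
tau = kendall_tau_b(rating, performance)
print(tau)
```

In practice a statistics library would normally be used instead; the point of the sketch is only that pairs tied on the rating shrink the denominator rather than being counted as agreement or disagreement.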
You also might want to consider displaying the mean and standard error of your continuous variable over each of the 9 values of the emotional rating score. That could provide a useful visual display that might help convince anyone who is skeptical of your correlation results.
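The numbers behind such a display could be computed roughly as follows. This is a sketch under assumptions: `summarize_by_rating` is a made-up helper name, the data are invented, and the standard error is taken as the sample standard deviation divided by the square root of the group size.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

def summarize_by_rating(ratings, performance):
    """Return {rating: (mean, standard_error, n)} for the continuous variable."""
    groups = defaultdict(list)
    for r, p in zip(ratings, performance):
        groups[r].append(p)
    summary = {}
    for r in sorted(groups):
        vals = groups[r]
        # A standard error needs at least two observations in the group
        se = stdev(vals) / math.sqrt(len(vals)) if len(vals) > 1 else float("nan")
        summary[r] = (mean(vals), se, len(vals))
    return summary

# Hypothetical data: ratings on the 1..9 scale and a performance measure
ratings = [1, 1, 3, 3, 3, 5, 5, 9]
performance = [0.20, 0.30, 0.35, 0.40, 0.50, 0.55, 0.65, 0.90]
for rating, (m, se, n) in summarize_by_rating(ratings, performance).items():
    print(rating, round(m, 3), se, n)
```

Plotting the means with error bars against the rating then shows at a glance whether performance moves steadily with the rating.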
Normality and choice of tests
As others have noted in comments on this question, any of the Pearson, Spearman or Kendall correlation statistics might be useful here. All these statistics can be calculated regardless of the distributions of the data.
One issue is whether the assumptions needed to calculate p-values for the statistics are met. For example, the standard test for Pearson correlation assumes a bivariate normal distribution. That method of testing is what "requires" normality; you can still calculate the statistic without normality. Exact calculations of p-values typically can't be made for the other measures if there are ties. Yet even if the assumptions for tests aren't met, resampling can provide p-values and confidence intervals that are based on the empirical distributions rather than on assumed functional forms. A permutation test might be a good choice.
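A permutation test of this kind can be sketched in plain Python. The names `permutation_pvalue`, `n_perm` and `seed`, and the data, are assumptions made for illustration. The idea is to shuffle one variable many times, recompute the statistic each time, and see how often the shuffled value is at least as extreme as the observed one.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation; any other correlation statistic could be swapped in."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_pvalue(x, y, statistic=pearson, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the null of no association."""
    rng = random.Random(seed)          # fixed seed so the result is reproducible
    observed = abs(statistic(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # break any pairing between x and y
        if abs(statistic(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)

# Hypothetical, strongly associated data
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.3, 7.9, 9.4]
p_value = permutation_pvalue(x, y)
print(p_value)
```

Because the null distribution comes from the data themselves, the same function works unchanged with a rank-based statistic and with tied ratings.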
Kendall and Spearman statistics are based solely on the ranks of values along the two scales among the data points. As with other nonparametric tests, they require no assumptions about the distributions of the values. The tradeoff is that tests of those statistics might have less ability to detect true associations than the Pearson test when the assumptions of the Pearson test hold.
So: if you have a linear relation between your performance measure and your emotional rating (that is, each step along the emotional rating scale leads to the same change in performance) then Pearson would be the best correlation measure. You could also consider standard linear regression of performance against emotional rating, which might provide more information. Then any requirement for normality in statistical tests would be with respect to residuals about the means for each of the emotional rating groups (weaker than the assumption of bivariate normality), and you would have ways to gauge whether the linearity assumption is met.
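As a rough sketch of the regression idea, the following fits an ordinary least-squares line of performance on rating and collects the residuals. The helper name `fit_line` and the data are hypothetical. Residuals that drift systematically with the rating would hint that the relation is not linear.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data: rating on the 1..9 scale and a performance measure
rating = [1, 2, 2, 4, 5, 5, 7, 8, 9]
performance = [0.15, 0.22, 0.30, 0.41, 0.48, 0.55, 0.70, 0.78, 0.93]

slope, intercept = fit_line(rating, performance)
residuals = [p - (intercept + slope * r) for r, p in zip(rating, performance)]

# Inspect residuals against the rating to gauge linearity
for r, e in zip(rating, residuals):
    print(r, round(e, 3))
```

The slope estimates the change in performance per step on the rating scale, which is information a correlation coefficient alone does not give.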
For choosing between Kendall and Spearman, Kendall is essentially based on absolute values of differences while Spearman is based on squares of differences. Thus Spearman may be more affected by outliers in the relative rankings.
edited Aug 16 at 18:13
answered Aug 15 at 14:17
EdM
The rating is the Self-Assessment Manikin, and I think a difference between 1 and 2 means the same as a difference between 8 and 9. Does Kendall's test still work?
- BlackHawk
Aug 15 at 15:02
@BlackHawk yes, in that case the data would still be ordinal but in some circumstances you could then even consider treating them as continuous and using linear regression. See my link in the answer to discussions of ordinal data. Kendall's test will be fine for ordinal data provided that you use the version that handles ties. See my link in the answer to discussion of that test, or the Wikipedia page for the different flavors of the Kendall test.
- EdM
Aug 15 at 15:31
By the way, does Kendall's test work for normally as well as non-normally distributed data? And does Kendall's test have a linearity assumption (like Pearson) or a monotonicity assumption (like Spearman)? All in all, Kendall's test can replace Pearson and Spearman, is that correct?
- BlackHawk
Aug 16 at 12:22
Or are there cases where one should prefer Pearson or Spearman over Kendall?
- BlackHawk
Aug 16 at 12:39
@BlackHawk: I addressed your comments in an addition to the answer.
- EdM
Aug 16 at 18:14
Yes, Spearman rank correlation is the correct approach here. It is a non-parametric approach (solving the non-normality you describe) and is just the Pearson correlation between the rank values of the two variables. In other words, it converts the values of the variables of interest to ranks and applies Pearson correlation to the two sets of ranks.
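The "Pearson on ranks" description can be sketched as below. The helper names `average_ranks` and `spearman` and the data are made up for illustration, and the sketch assumes the common convention that tied values share the average of the ranks they occupy, which matters for a 1 to 9 rating.

```python
import math

def average_ranks(values):
    """Ranks from 1..n; tied values all receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1       # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data with tied ratings
rating = [1, 1, 2, 3, 3]
performance = [0.2, 0.5, 0.4, 0.7, 0.9]
rho = spearman(rating, performance)
print(average_ranks(rating), rho)
```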
edited Aug 15 at 14:18
answered Aug 15 at 14:11
seraffej