Ways of Testing Linearity Assumption in Multiple Regression apart from Residual Plots

I was going through the assumptions of linear regression, and of course one of them is linearity between the dependent and the independent variables. To be precise, the assumption is that the conditional mean of $Y_i$ given $X_i$ is linear in the parameters.



I looked in many textbooks and online resources, and all of them suggested checking that assumption with a scatter plot of the residuals versus the fitted values. Although I can see that this is a valid and helpful approach, I can't help but notice that it can be a bit arbitrary and subjective in some cases.



My question is whether there is a statistical test for that assumption as well. For example, when checking for heteroscedasticity we can look at the residual plot, but we also have Levene's test.



I can see that the thread "How can I use the value of $R^2$ to test the linearity assumption in multiple regression analysis?", which is very helpful, states that $R^2$ is not that statistic, but it doesn't mention a viable alternative.



Thanks in advance







multiple-regression assumptions linearity














edited yesterday by Ben

asked 2 days ago by ALEX.VAMVAS







  • One cannot get away from arbitrariness and subjectivity in data analysis. Take a statistical test: what justification is there for selecting a specific alpha level? Or take the answer you have there by @DimitriRizopoulos (which I +1): there are other methods to specify nonlinear relationships; why not those? For one, everyone is limited in their knowledge of all the available options, which adds subjectivity to the process.
    – Heteroskedastic Jim
    2 days ago







  • I agree completely. I also like that you brought up the alpha level as an example, as I often use it in similar arguments. Of course I am not trying to find a perfect solution in the form of a magical metric or statistical test that I can feed my data into and that will give me the answer without any doubt. In fact, that is why I posted the question: I wanted to expand my knowledge of alternatives to the residual plot, trying to minimise my subjectivity in the matter. Thanks!
    – ALEX.VAMVAS
    2 days ago











  • I see, awareness of alternatives can be helpful.
    – Heteroskedastic Jim
    2 days ago






















1 Answer
What you can do is fit a model that relaxes the linearity assumption, e.g., using splines, and compare it with the model that assumes linearity. For example, in R, for a linear regression model you can do something like this:



library("splines")

# linear effect of age on y
fm_linear <- lm(y ~ age + sex, data = your_data)

# nonlinear effect of age on y using natural cubic splines
fm_non_linear <- lm(y ~ ns(age, 3) + sex, data = your_data)

# F-test between the two models
anova(fm_linear, fm_non_linear)
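
The snippet above depends on a placeholder data set (`your_data`). As a minimal, self-contained sketch of the same comparison, the following simulates data in which the effect of age is deliberately curved and then runs the same F-test. The data frame `sim_data`, the quadratic data-generating process, the sample size and the seed are all assumptions made purely for illustration; they are not part of the original answer.

```r
library(splines)

set.seed(1)
n <- 200

# Hypothetical data: a curved (quadratic) effect of age plus a shift for sex
sim_data <- data.frame(
  age = runif(n, 20, 80),
  sex = factor(sample(c("female", "male"), n, replace = TRUE))
)
sim_data$y <- 2 + 0.01 * (sim_data$age - 50)^2 +
  0.5 * (sim_data$sex == "male") + rnorm(n, sd = 1)

# Model that assumes a linear effect of age
fm_linear <- lm(y ~ age + sex, data = sim_data)

# Model that relaxes linearity with a natural cubic spline (3 df)
fm_spline <- lm(y ~ ns(age, 3) + sex, data = sim_data)

# F-test for the extra spline terms; the linear model is nested in the spline model
cmp <- anova(fm_linear, fm_spline)
print(cmp)

# p-value for the null hypothesis that the linear effect of age is adequate
p_value <- cmp[["Pr(>F)"]][2]
```

With curvature this strong, the test should reject linearity. A large p-value on real data would only mean that this particular spline alternative found no evidence against linearity, not that linearity has been proven.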
















answered 2 days ago by Dimitris Rizopoulos











  • Hello Dimitri. Thanks for the quick response. So, if I understand correctly, unlike the assumptions about heteroscedasticity and multicollinearity, which affect the accuracy (for lack of a better word) of the OLS estimators, linearity is an assumption about the relationship between the dependent and the independent variables. We can still use OLS if it is violated, but we should then fit a model with splines rather than a straight line, and the way to test that would be through ANOVA. Is that a correct conclusion? Also, could we use the R squared instead of ANOVA?
    – ALEX.VAMVAS
    2 days ago
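
On the last question in the comment above, a small sketch (under simulated, assumed data) of how the R squared relates to the F-test. For nested OLS models with an intercept, the larger model never has a lower R squared, even when the true relationship is exactly linear, so the raw R squared difference cannot serve as a test on its own. The F statistic reported by `anova()` is that same R squared difference, rescaled by the degrees of freedom involved. The data frame `lin_data` and all numeric settings are hypothetical.

```r
library(splines)

set.seed(42)
n <- 200

# Hypothetical data in which the effect of age really is linear
lin_data <- data.frame(age = runif(n, 20, 80))
lin_data$y <- 1 + 0.05 * lin_data$age + rnorm(n)

fm_linear <- lm(y ~ age, data = lin_data)
fm_spline <- lm(y ~ ns(age, 3), data = lin_data)

r2_linear <- summary(fm_linear)$r.squared
r2_spline <- summary(fm_spline)$r.squared

# The spline model cannot fit worse in-sample, so its R squared is at least as large
print(c(linear = r2_linear, spline = r2_spline))

# F statistic rebuilt from the two R squared values
q <- fm_linear$df.residual - fm_spline$df.residual  # number of extra parameters
f_manual <- ((r2_spline - r2_linear) / q) /
  ((1 - r2_spline) / fm_spline$df.residual)

# The same statistic as computed by anova()
cmp <- anova(fm_linear, fm_spline)
f_anova <- cmp$F[2]
```

Here `f_manual` and `f_anova` agree up to floating-point error, which is one way to see that the F-test is the calibrated form of an R squared comparison.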















