When fitting a GAM with the mgcv package, I encountered an error message that I do not understand:
“Error in model.frame.default(formula = death ~ pm10 + Lag(resid1, 1) + : variable lengths differ (found for ‘Lag(resid1, 1)’)”.
The number of observations used in model1 is exactly the same as the length of the deviance residuals, so I do not think this error is caused by a difference in data size or length.
I found a fairly related error message on the web here, but that post did not receive an adequate answer, so it is not helpful to my problem.
A reproducible example and data follow:
library(quantmod)
library(mgcv)
library(dlnm)
df <- chicagoNMMAPS
df1 <- df[, c("date", "dow", "death", "temp", "pm10")]
df1$trend <- seq_len(nrow(df1)) ### Create a time trend (one value per row)
Run the model
model1<-gam(death ~ pm10 + s(trend,k=14*7)+ s(temp,k=5), data=df1, na.action=na.omit, family=poisson)
Obtain deviance residuals
resid1 <- residuals(model1,type="deviance")
Add a one day lagged deviance to model 1
model1_1 <- update(model1, . ~ . + Lag(resid1, 1), na.action = na.omit)
model1_2 <- gam(death ~ pm10 + s(trend, k = 14 * 7) + s(temp, k = 5) + Lag(resid1, 1), data = df1, na.action = na.omit, family = poisson)
Both of these models produced the same error message.
Joran suggested removing the NAs before fitting the model. I therefore removed the NAs, ran the model, and obtained the residuals. When I updated model2 to include the lagged residuals, the error message no longer appeared.
Run the main model
model2<-gam(death ~ pm10 + s(trend,k=14*7)+ s(temp,k=5), data=df2, family=poisson)
resid2 <- residuals(model2,type="deviance")
Update model2 by including the lag 1 residuals
model2_1 <- update(model2,.~.+ Lag(resid2,1), na.action=na.omit)
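Why removing the NAs first helps can be illustrated with a small base-R sketch. It is only an illustration under stated assumptions: the toy data frame d, the helper lag1 (a stand-in for Lag from quantmod) and the use of lm instead of gam are all hypothetical and not from the original post. The definition of df2 is not shown above; complete.cases is one common way to drop incomplete rows and is assumed here.

```r
# Toy data with one missing predictor value (hypothetical, not chicagoNMMAPS)
d <- data.frame(y = c(2, 4, 5, 4, 6, 7, 9, 8),
                x = c(1, 2, NA, 4, 5, 6, 7, 8))

# na.omit silently drops the incomplete row before fitting
fit <- lm(y ~ x, data = d, na.action = na.omit)
r <- residuals(fit)
nrow(d)    # 8 rows in the data
length(r)  # 7 residuals, one per row actually used

# Base-R stand-in for a one-step lag: shift down by one, pad with NA
lag1 <- function(v) c(NA, v[-length(v)])

# Refitting against d mixes 8-row columns with a 7-element vector,
# so building the model frame fails
err <- tryCatch(
  lm(y ~ x + lag1(r), data = d, na.action = na.omit),
  error = function(e) conditionMessage(e)
)
err

# Option A: drop incomplete rows first, so data and residuals line up
d2 <- d[complete.cases(d), ]
fit2 <- lm(y ~ x, data = d2)
r2 <- residuals(fit2)
fit2_1 <- lm(y ~ x + lag1(r2), data = d2, na.action = na.omit)

# Option B: na.exclude pads the residuals with NA back to the original length
fit3 <- lm(y ~ x, data = d, na.action = na.exclude)
r3 <- residuals(fit3)
length(r3)
```

The residual vector matches the number of rows the model used, not the number of rows in the data frame, which is why the lengths can differ even though they look consistent from the model's point of view.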
Another thing that can cause this error is creating a model with the centering/scaling standardize function from the arm package:
m <- standardize(lm(y ~ x, data = train))
If you then try predict(m), you get the same error as in this question.
It's simple: just make sure the values in each column all have the same data type. For example, I faced the same error, along with another one:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
So I went back to my Excel or CSV file, set a filter on the variable that was throwing the error, and checked whether all of its values had the same data type. It turned out to contain both numbers and strings, so I converted the numbers to strings and it worked just fine for me.
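The same check can be done inside R instead of in the spreadsheet. The following is a rough sketch with made-up CSV content; the column names y, dose and site are hypothetical. It inspects the type R assigned to each column and counts distinct values, since a grouping column with a single value is what triggers the contrasts error once it is treated as a factor.

```r
# Hypothetical CSV content: 'dose' mixes numbers with the string "n/a"
csv_text <- "y,dose,site\n1.2,10,A\n2.3,20,A\n3.1,n/a,A\n4.8,40,A\n"
d <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# One stray string makes the whole column character
types <- sapply(d, class)
types

# Number of distinct non-missing values per column;
# a column with only one value cannot be used as a factor in a model
n_distinct <- sapply(d, function(col) length(unique(col[!is.na(col)])))
n_distinct

# Bring the column to a single type; unparseable entries become NA
d$dose <- suppressWarnings(as.numeric(d$dose))
```

Whether to convert towards numeric (as here) or towards strings depends on what the column is meant to represent.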
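Another reason could be a variable in your workspace with the same name as a column. If the name used in the formula is not found in the data (for example, because of a typo or a missing data argument), R falls back to the workspace variable, whose length may differ from that of the column. Check the list of variables via ls() (or in the RStudio Environment pane) and use remove(<varname>) to remove the conflicting variable if one exists.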
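A minimal sketch of how such a name clash can play out, using hypothetical toy objects d and x and lm rather than any model from the question:

```r
d <- data.frame(y = c(1, 3, 2, 5, 4), x = c(1, 2, 3, 4, 5))
x <- c(10, 20, 30)  # stray workspace variable: same name as the column, different length

# With data = d, the column d$x is found first and the fit works
fit_ok <- lm(y ~ x, data = d)
nobs(fit_ok)

# Without data =, 'x' resolves to the workspace variable (length 3),
# which does not match d$y (length 5)
err <- tryCatch(lm(d$y ~ x), error = function(e) conditionMessage(e))
err

# Confirm the stray variable exists, then remove it
had_x <- "x" %in% ls()
rm(x)
```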