Poniższe nie odpowiada dokładnie na pytanie. Może jednak dać ci kilka pomysłów. Jest to coś, co ostatnio zrobiłem, aby ocenić dopasowanie kilku modeli regresji przy użyciu od jednej do czterech zmiennych niezależnych (zmienna zależna była w pierwszej kolumnie ramki danych df1).
# create the combinations of the 4 independent variables
library(foreach)
xcomb <- foreach(i=1:4, .combine=c) %do% {combn(names(df1)[-1], i, simplify=FALSE) }
# create formulas
formlist <- lapply(xcomb, function(l) formula(paste(names(df1)[1], paste(l, collapse="+"), sep="~")))
Zawartość as.character (formlist) to
[1] "price ~ sqft" "price ~ age"
[3] "price ~ feats" "price ~ tax"
[5] "price ~ sqft + age" "price ~ sqft + feats"
[7] "price ~ sqft + tax" "price ~ age + feats"
[9] "price ~ age + tax" "price ~ feats + tax"
[11] "price ~ sqft + age + feats" "price ~ sqft + age + tax"
[13] "price ~ sqft + feats + tax" "price ~ age + feats + tax"
[15] "price ~ sqft + age + feats + tax"
Potem zebrałem kilka przydatnych wskaźników
# R squared
models.r.sq <- sapply(formlist, function(i) summary(lm(i))$r.squared)
# adjusted R squared
models.adj.r.sq <- sapply(formlist, function(i) summary(lm(i))$adj.r.squared)
# MSEp
models.MSEp <- sapply(formlist, function(i) anova(lm(i))['Mean Sq']['Residuals',])
# Full model MSE
MSE <- anova(lm(formlist[[length(formlist)]]))['Mean Sq']['Residuals',]
# Mallow's Cp
models.Cp <- sapply(formlist, function(i) {
SSEp <- anova(lm(i))['Sum Sq']['Residuals',]
mod.mat <- model.matrix(lm(i))
n <- dim(mod.mat)[1]
p <- dim(mod.mat)[2]
c(p,SSEp / MSE - (n - 2*p))
})
df.model.eval <- data.frame(model=as.character(formlist), p=models.Cp[1,],
r.sq=models.r.sq, adj.r.sq=models.adj.r.sq, MSEp=models.MSEp, Cp=models.Cp[2,])
Ostateczna ramka danych to
model p r.sq adj.r.sq MSEp Cp
1 price~sqft 2 0.71390776 0.71139818 42044.46 49.260620
2 price~age 2 0.02847477 0.01352823 162541.84 292.462049
3 price~feats 2 0.17858447 0.17137907 120716.21 351.004441
4 price~tax 2 0.76641940 0.76417343 35035.94 20.591913
5 price~sqft+age 3 0.80348960 0.79734865 33391.05 10.899307
6 price~sqft+feats 3 0.72245824 0.71754599 41148.82 46.441002
7 price~sqft+tax 3 0.79837622 0.79446120 30536.19 5.819766
8 price~age+feats 3 0.16146638 0.13526220 142483.62 245.803026
9 price~age+tax 3 0.77886989 0.77173666 37884.71 20.026075
10 price~feats+tax 3 0.76941242 0.76493500 34922.80 21.021060
11 price~sqft+age+feats 4 0.80454221 0.79523470 33739.36 12.514175
12 price~sqft+age+tax 4 0.82977846 0.82140691 29640.97 3.832692
13 price~sqft+feats+tax 4 0.80068220 0.79481991 30482.90 6.609502
14 price~age+feats+tax 4 0.79186713 0.78163109 36242.54 17.381201
15 price~sqft+age+feats+tax 5 0.83210849 0.82091573 29722.50 5.000000
Na koniec wykres Cp (przy użyciu biblioteki wle)
