Model Validation Without Test Data
There are two common approaches to model selection when we don't have test data:
- Avoid estimating the expected MSE directly: adjust the training error to account for model complexity.
- Use data-splitting techniques to create a "test set" and estimate the expected MSE on it.
Note that in either approach we have already fitted a model on the training data, and every MSE is then calculated from this fitted model.
Avoid Estimating the Expected MSE
Let $\hat{L}$ be the maximized value of the likelihood function for the fitted model.
Mallows' $C_p$:
- $C_p$ is an estimate of the expected (test) MSE
- the lower, the better
- applies only to linear models fitted by OLS in regression problems
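AIC:
- $\mathrm{AIC} = 2d - 2\log\hat{L}$, where $d$ is the number of estimated parameters
- In the linear model with Gaussian errors, AIC is proportional to $C_p$, so the two criteria rank models identically
- the lower, the better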
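BIC:
- $\mathrm{BIC} = d\log n - 2\log\hat{L}$, where $d$ is the number of estimated parameters and $n$ is the sample size
- BIC penalizes each additional predictor more heavily than AIC (since $\log n > 2$ once $n > 7$), so it tends to select smaller models
- the lower, the better
Adjusted $R^2$:
- the higher, the better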
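The four criteria above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: $C_p$ is taken in the form $\frac{1}{n}(\mathrm{RSS} + 2d\hat{\sigma}^2)$ (other texts scale it differently), the likelihood is Gaussian with the error variance set to its MLE $\mathrm{RSS}/n$, and the RSS, TSS and $\hat{\sigma}^2$ values are made-up numbers, not results from any real dataset.

```python
import math

def gaussian_max_log_lik(rss, n):
    """Maximized Gaussian log-likelihood of a linear model,
    with the error variance set to its MLE, RSS / n (assumption)."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)

def mallows_cp(rss, n, d, sigma2_hat):
    """C_p = (RSS + 2 * d * sigma2_hat) / n, with d predictors (assumed form)."""
    return (rss + 2 * d * sigma2_hat) / n

def aic(log_lik, d):
    """AIC = 2d - 2 log L-hat."""
    return 2 * d - 2 * log_lik

def bic(log_lik, d, n):
    """BIC = d log(n) - 2 log L-hat."""
    return d * math.log(n) - 2 * log_lik

def adjusted_r2(rss, tss, n, d):
    """Adjusted R^2 = 1 - (RSS / (n - d - 1)) / (TSS / (n - 1))."""
    return 1 - (rss / (n - d - 1)) / (tss / (n - 1))

# Hypothetical summary statistics for two candidate models.
n, tss, sigma2_hat = 100, 500.0, 1.2
small = {"d": 2, "rss": 120.0}
large = {"d": 5, "rss": 110.0}

scores = {}
for name, m in (("small", small), ("large", large)):
    ll = gaussian_max_log_lik(m["rss"], n)
    scores[name] = {
        "cp": mallows_cp(m["rss"], n, m["d"], sigma2_hat),
        "aic": aic(ll, m["d"]),
        "bic": bic(ll, m["d"], n),
        "adj_r2": adjusted_r2(m["rss"], tss, n, m["d"]),
    }
    print(name, scores[name])
```

With these made-up numbers, $C_p$, AIC and adjusted $R^2$ favor the five-predictor model while BIC favors the two-predictor one, which illustrates BIC's heavier penalty. Whether $d$ also counts the intercept and the variance is a convention; it shifts every model's score by the same amount and does not change the ranking.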
Estimating the Expected MSE
Validation Set: one-time data splitting; split the given dataset once into a training set and a validation set
- highly unstable: the estimate depends on which observations happen to fall into each set
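The instability of a single split can be seen in a small experiment. The sketch below is an illustration only: it assumes a simple linear regression with one predictor, synthetic data, and a 50/50 split, and the helper names (`fit_line`, `mse`, `validation_mse`) are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def mse(a, b, xs, ys):
    """Mean squared error of the line a + b * x on (xs, ys)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def validation_mse(xs, ys, seed, train_frac=0.5):
    """Split once at random, fit on the training part, score on the rest."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * len(idx))
    train, val = idx[:cut], idx[cut:]
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return mse(a, b, [xs[i] for i in val], [ys[i] for i in val])

# Synthetic data (assumption): y = 1 + 2x plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(40)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 3) for x in xs]

# Same data, same model, five different random splits.
estimates = [validation_mse(xs, ys, seed) for seed in range(5)]
print(estimates)
```

The five printed values differ even though the data and the model are identical; only the random split changes.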
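Cross-validation: repeated data splitting
One form of cross-validation is Leave-One-Out Cross-Validation (LOOCV): each time, select a single observation $(x_i, y_i)$ as the validation set and use the remaining $n - 1$ observations as the training set, then calculate the MSE, denoted $\mathrm{MSE}_i$. The LOOCV estimate of the test MSE is the average of those, $\mathrm{CV}_{(n)} = \frac{1}{n}\sum_{i=1}^{n}\mathrm{MSE}_i$, where $\mathrm{MSE}_i = (y_i - \hat{y}_i)^2$.
- Computationally expensive (the model is fitted $n$ times), but stable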
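A minimal LOOCV sketch, assuming a simple linear regression with one predictor; `fit_line` and `loocv_mse` are hypothetical helper names used only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def loocv_mse(xs, ys):
    """Average of the n squared errors, each from a model fitted
    without the observation it is evaluated on."""
    n = len(xs)
    errors = []
    for i in range(n):
        # Hold out observation i; train on the other n - 1.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append((ys[i] - (a + b * xs[i])) ** 2)
    return sum(errors) / n

# Tiny example: the held-out squared errors are 4, 1 and 4.
print(loocv_mse([0.0, 1.0, 2.0], [0.0, 1.0, 0.0]))  # 3.0
```

The loop fits the model once per observation, which is where the computational cost comes from.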
Another form is k-Fold Cross-Validation: randomly divide the data into $k$ equal-sized groups, or folds; select one of them as the validation set and the others as the training set, then calculate the MSE, denoted $\mathrm{MSE}_j$. Repeat so that each fold serves as the validation set once; the final estimate of the test MSE is $\mathrm{CV}_{(k)} = \frac{1}{k}\sum_{j=1}^{k}\mathrm{MSE}_j$.
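The k-fold procedure can be sketched as follows, again assuming a simple linear regression with one predictor. The helper names are hypothetical, and when $n$ is not divisible by $k$ the fold sizes here differ by at most one.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def k_fold_mse(xs, ys, k, seed=0):
    """Average of the k per-fold validation MSEs."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]  # k disjoint folds
    fold_mses = []
    for j, val in enumerate(folds):
        # Train on every fold except fold j.
        train = [i for m, fold in enumerate(folds) if m != j for i in fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        fold_mses.append(
            sum((ys[i] - (a + b * xs[i])) ** 2 for i in val) / len(val)
        )
    return sum(fold_mses) / k

# Synthetic data (assumption): y = 1 + 2x plus Gaussian noise.
rng = random.Random(1)
xs = [float(i) for i in range(20)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]
print(k_fold_mse(xs, ys, k=5))
```

Setting `k` equal to the number of observations puts one observation in each fold, which recovers LOOCV.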
Pros and Cons
Cross-validation:
- provides a direct estimate of the test MSE
- can be used in a wider range of model selection tasks
- requires a relatively large sample size
- it is difficult to give guarantees for the model selected by CV
- when the error variance $\sigma^2$ can be consistently estimated, prefer the methods that avoid estimating the expected MSE
- applicable to all supervised learning problems
AIC/BIC and related criteria:
- better suited to datasets with a limited sample size
- suitable for any model in which the likelihood is specified