Fitted Model Measurement
To measure the fit of a model, we compare the model's predictions with the actual data. The most common measures are the mean squared error (MSE) and the root mean squared error (RMSE). Depending on the type of response, we use different measures of fit.
Quantitative
Mean squared error (MSE) measures how close our estimated function $\hat{f}$ is to the true $f$, that is, $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{f}(x_i)\bigr)^2$.
- We call this the training MSE, as it uses the training observations $(x_i, y_i)$ (a short computation sketch follows this list).
- The one we want is a suitable $\hat{f}$ chosen for its performance on new data, since with too small a sample size minimizing the training MSE will cause overfitting.
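For concreteness, here is a minimal sketch of computing a training MSE in Python; the data-generating line $y = 2x + 1$, the noise level, and all variable names are illustrative assumptions, not part of the notes:

```python
import numpy as np

# Minimal sketch of a training MSE (data-generating line and names are assumed).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 1, 50)        # y = 2x + 1 plus noise

b1, b0 = np.polyfit(x, y, deg=1)            # least-squares line: f_hat
y_hat = b1 * x + b0

# Training MSE: average squared error on the same data used for the fit.
training_mse = np.mean((y - y_hat) ** 2)
print(f"training MSE = {training_mse:.3f}")
```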
After we obtain such an $\hat{f}$, we can use it to look at the test MSE, $\mathrm{Ave}\bigl(y_0 - \hat{f}(x_0)\bigr)^2$, where $(x_0, y_0)$ is a previously unseen test observation.
- If we don't have such test observations, we can use a resampling technique called cross-validation to estimate the test MSE.
- The final model is the $\hat{f}$ that makes the test MSE smallest; that is, the $\hat{f}$ with the smallest test MSE is what we want (a hand-rolled cross-validation sketch follows this list).
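The following is a sketch of $k$-fold cross-validation used to pick among candidate models; the synthetic data and the polynomial-degree model family are assumptions for illustration:

```python
import numpy as np

def kfold_test_mse(x, y, degree, k=5, seed=0):
    """Estimate the test MSE of a polynomial fit of the given degree
    via k-fold cross-validation (hand-rolled for illustration)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    fold_mses = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coefs = np.polyfit(x[train], y[train], deg=degree)  # fit on k-1 folds
        pred = np.polyval(coefs, x[test])                   # predict held-out fold
        fold_mses.append(np.mean((y[test] - pred) ** 2))
    return np.mean(fold_mses)

# Assumed data; the final model is the degree with the smallest estimated test MSE.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100)
y = np.sin(3 * x) + rng.normal(0, 0.3, 100)
cv_mse = {d: kfold_test_mse(x, y, d) for d in range(1, 8)}
best = min(cv_mse, key=cv_mse.get)
print(f"degree with smallest CV-estimated test MSE: {best}")
```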
Notice that there exists a bias-variance trade-off within the MSE. For a given $x_0$, we have the conditional expected test MSE: $E\bigl(y_0 - \hat{f}(x_0)\bigr)^2 = \mathrm{Var}\bigl(\hat{f}(x_0)\bigr) + \bigl[\mathrm{Bias}\bigl(\hat{f}(x_0)\bigr)\bigr]^2 + \mathrm{Var}(\varepsilon)$
- $\mathrm{Var}\bigl(\hat{f}(x_0)\bigr)$ is the variance, which describes how much $\hat{f}$ would change if we estimated it using a different training data set.
- $\bigl[\mathrm{Bias}\bigl(\hat{f}(x_0)\bigr)\bigr]^2$ is the squared bias, which refers to the error introduced by approximating $f$ with a chosen parametrized form.
- $\mathrm{Var}(\varepsilon)$ is the irreducible error.
- The ideal $\hat{f}$ should also minimize this expected MSE (a simulation sketch of the decomposition follows this list).
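The decomposition can be checked numerically: by refitting on many independent training sets we can estimate each term at a fixed $x_0$. The true $f$, the noise level, and the cubic model class below are assumptions made for this demonstration:

```python
import numpy as np

# Simulation sketch of E(y0 - f_hat(x0))^2
#   = Var(f_hat(x0)) + Bias(f_hat(x0))^2 + Var(eps).
rng = np.random.default_rng(0)
f = np.sin                     # "true" regression function (assumed)
sigma = 0.3                    # noise sd, so Var(eps) = sigma**2
x0 = 2.0                       # fixed query point

preds = []
for _ in range(2000):          # many independent training sets
    x = rng.uniform(0, 10, 30)
    y = f(x) + rng.normal(0, sigma, 30)
    coefs = np.polyfit(x, y, deg=3)       # cubic least-squares fit: f_hat
    preds.append(np.polyval(coefs, x0))

preds = np.array(preds)
variance = preds.var()                    # Var(f_hat(x0))
bias_sq = (preds.mean() - f(x0)) ** 2     # Bias(f_hat(x0))^2
print(f"Var={variance:.4f}  Bias^2={bias_sq:.4f}  "
      f"irreducible={sigma**2:.4f}  sum={variance + bias_sq + sigma**2:.4f}")
```

The printed sum is a Monte Carlo estimate of the expected test MSE at $x_0$.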
From the conditional expected test MSE we can draw the following conclusions:
- More complexity in the parametrization of $\hat{f}$ increases the variance of $\hat{f}$ whereas its bias decreases,
- and, conversely, less complexity leads to a smaller variance of $\hat{f}$ but a larger bias.
- When two $\hat{f}$'s based on different perspectives have similar expected MSEs, we prefer the less complex one.
Training MSE always decreases as we increase the complexity of $\hat{f}$.
- Selecting the model based on training MSE may therefore lead to overfitting; instead, select the model based on test MSE (the sketch below illustrates this).
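A sketch of both points on assumed synthetic data: the training MSE falls monotonically with polynomial degree, while the MSE on held-out data eventually rises again (overfitting):

```python
import numpy as np

# Illustrative check (assumed data): training MSE falls as degree grows,
# while MSE on held-out data eventually rises again.
rng = np.random.default_rng(2)
x_train = rng.uniform(-1, 1, 40)     # x kept in [-1, 1] so high-degree
x_test = rng.uniform(-1, 1, 200)     # polynomial fits stay well conditioned
y_train = np.sin(3 * x_train) + rng.normal(0, 0.3, 40)
y_test = np.sin(3 * x_test) + rng.normal(0, 0.3, 200)

for degree in (1, 3, 5, 9, 15):
    coefs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((y_train - np.polyval(coefs, x_train)) ** 2)
    test_mse = np.mean((y_test - np.polyval(coefs, x_test)) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}  test MSE {test_mse:.3f}")
```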
We can also measure fit using the sum of absolute differences (SAD): $\sum_{i=1}^{n}\bigl|y_i - \hat{f}(x_i)\bigr|$.
- Notice that both MSE and SAD are for quantitative responses (a toy comparison follows).
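A toy comparison of the two criteria on assumed numbers:

```python
import numpy as np

# Both criteria compare predictions y_hat to observed y (toy values assumed).
y = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

mse = np.mean((y - y_hat) ** 2)        # mean squared error
sad = np.sum(np.abs(y - y_hat))        # sum of absolute differences
print(f"MSE = {mse:.3f}, SAD = {sad:.3f}")
```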
Categorical/Ordinal
For categorical/ordinal responses we use the error rate, the proportion of misclassified observations, defined as follows:
- Training error rate: $\frac{1}{n}\sum_{i=1}^{n} I\bigl(y_i \neq \hat{y}_i\bigr)$, where $\hat{y}_i$ is the predicted class label and $I$ is the indicator function.
- Test error rate: $\mathrm{Ave}\bigl(I(y_0 \neq \hat{y}_0)\bigr)$ over previously unseen test observations (a small numeric example follows this list).
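A small numeric example with assumed class labels, using the fact that the mean of the indicator $I(y \neq \hat{y})$ is exactly the misclassification proportion:

```python
import numpy as np

# Toy class labels (assumed) to show both error-rate formulas.
y_train = np.array([0, 1, 1, 0, 1, 0])
y_train_hat = np.array([0, 1, 0, 0, 1, 1])   # classifier output on training set
y_test = np.array([1, 0, 1, 1])
y_test_hat = np.array([1, 0, 0, 1])          # classifier output on test set

# (1/n) * sum I(y_i != y_hat_i): the mean of the indicator of a mistake.
train_error_rate = np.mean(y_train != y_train_hat)
test_error_rate = np.mean(y_test != y_test_hat)
print(f"train error = {train_error_rate:.2f}, test error = {test_error_rate:.2f}")
```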