Chapter 7 Model Family Choice

In this chapter, we choose the model family that best suits the characteristics of the data.

Internalizing problems are computed as the sum of 10 items of the SDQ, yielding discrete scores that range from 0 to 20. Given the distribution of the data (see Figure 1.9), we choose a Negative Binomial distribution to model them.

7.1 Zero Inflated Negative Binomial

As in the case of externalizing problems, we evaluate whether a Zero-Inflated model may be appropriate. We compare the number of observed zeros with the number of zeros expected under a Negative Binomial mixed-effects model that includes the same predictors as the model for externalizing problems. In R formula syntax, the model is

# model formula
internalizing_sum ~ gender + mother * father + (1|ID_class)

Comparing the number of observed zeros and expected zeros, we get

my_check_zeroinflation(fit_int_nb)
## # Check for zero-inflation
## 
##    Observed zeros: 226
##   Predicted zeros: 195
##             Ratio: 0.86
## Model is underfitting zeros (probable zero-inflation).
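
The definition of `my_check_zeroinflation()` is not shown in this chapter, so the following is only a minimal sketch of the idea behind such a check, not the function itself. It uses simulated toy data with hypothetical values for the fitted means (`mu`) and the dispersion parameter (`theta`), and compares the observed zeros with the zeros expected under a Negative Binomial distribution.

```r
# Toy illustration on simulated data (not the study data).
set.seed(123)
n     <- 1000
mu    <- rep(3, n)   # hypothetical fitted means, one per observation
theta <- 1.5         # hypothetical NB dispersion ("size") parameter

# Simulate NB counts, then force roughly 10% extra zeros
y <- rnbinom(n, mu = mu, size = theta)
y[runif(n) < 0.10] <- 0L

observed_zeros <- sum(y == 0)

# Expected zeros: sum over observations of
# P(Y_i = 0) = (theta / (theta + mu_i))^theta
predicted_zeros <- sum(dnbinom(0, mu = mu, size = theta))

# A ratio below 1 means the model predicts fewer zeros than observed
ratio <- predicted_zeros / observed_zeros

# The same ratio computed from the counts reported above
reported_ratio <- 195 / 226
round(reported_ratio, 2)  # 0.86
```

In a real check, `mu` and `theta` would be taken from the fitted model rather than fixed by hand.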

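Results indicate that the model slightly under-fits the number of zeros: it predicts 195 zeros against 226 observed (ratio 0.86). We therefore fit a Zero-Inflated Negative Binomial (ZINB) model and compare the performance of the two models. In R formula syntax, with p the zero-inflation probability and mu the mean of the count component, we have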

# formula for p
p ~ gender + (1|ID_class)

# formula for mu
mu ~ gender + mother * father + (1|ID_class)
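
The `zi=~` and `disp=~1` annotations in the model comparison output reported later in this section suggest that the models were fitted with the glmmTMB package. Assuming so, the two specifications could be fitted roughly as sketched below. The data frame `toy` is simulated, its factor levels are hypothetical, and the choice of `nbinom2` is an assumption, since the text only says "Negative Binomial".

```r
library(glmmTMB)

# Simulated toy data (hypothetical levels and effect sizes, not the study data)
set.seed(2024)
n_class <- 30
n_per   <- 25
n       <- n_class * n_per

toy <- data.frame(
  ID_class = factor(rep(seq_len(n_class), each = n_per)),
  gender   = factor(sample(c("F", "M"), n, replace = TRUE)),
  mother   = factor(sample(c("A", "B"), n, replace = TRUE)),
  father   = factor(sample(c("A", "B"), n, replace = TRUE))
)

# Class-level random intercepts on the log scale of the count mean
class_eff <- rnorm(n_class, sd = 0.3)
mu <- exp(1 + 0.2 * (toy$gender == "M") + class_eff[as.integer(toy$ID_class)])

# NB counts plus roughly 15% structural zeros
y <- rnbinom(n, mu = mu, size = 2)
y[rbinom(n, 1, 0.15) == 1] <- 0L
toy$internalizing_sum <- y

# Negative Binomial mixed-effects model (no zero inflation)
fit_nb <- glmmTMB(
  internalizing_sum ~ gender + mother * father + (1 | ID_class),
  data = toy, family = nbinom2
)

# ZINB: same formula for mu, separate formula for the zero-inflation part p
fit_zinb <- glmmTMB(
  internalizing_sum ~ gender + mother * father + (1 | ID_class),
  ziformula = ~ gender + (1 | ID_class),
  data = toy, family = nbinom2
)

# Number of estimated parameters in each model
df_nb   <- attr(logLik(fit_nb), "df")
df_zinb <- attr(logLik(fit_zinb), "df")
```

On toy data like these, the random intercept in the zero-inflation part may be estimated close to zero and trigger convergence warnings. The sketch is meant to show the model specification, not to reproduce the results reported in this chapter.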

Below we report the results of the analysis of deviance (likelihood ratio test) comparing the two models.

anova(fit_int_nb, fit_int_zinb)
## Data: data_cluster
## Models:
## fit_int_nb: internalizing_sum ~ gender + mother * father + (1 | ID_class), zi=~0, disp=~1
## fit_int_zinb: internalizing_sum ~ gender + mother * father + (1 | ID_class), zi=~gender + (1 | ID_class), disp=~1
##              Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
## fit_int_nb   19 3680.7 3770.8 -1821.3   3642.7                             
## fit_int_zinb 22 3669.0 3773.3 -1812.5   3625.0 17.677      3  0.0005127 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Overall, the likelihood ratio test (Chisq = 17.68, Df = 3, p < .001) and AIC (3669.0 vs. 3680.7) indicate that the ZINB model performs better than the Negative Binomial model. Note, however, that BIC slightly prefers the model without zero inflation (3770.8 vs. 3773.3). Nevertheless, in the following analyses we use ZINB models for consistency with the analysis of externalizing problems.
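
As a cross-check, the test statistic and the AIC values can be recomputed from the deviances and degrees of freedom in the table. This is a small sketch that uses only the rounded figures printed above, so small discrepancies in the last digit are expected.

```r
# Values copied from the anova() table above
dev_nb   <- 3642.7
dev_zinb <- 3625.0
df_nb    <- 19
df_zinb  <- 22

# Likelihood ratio statistic from the rounded deviances
# (the table reports 17.677, computed from unrounded values)
chisq_rounded <- dev_nb - dev_zinb
df_diff       <- df_zinb - df_nb

# p-value for the reported statistic on 3 degrees of freedom
p_value <- pchisq(17.677, df = df_diff, lower.tail = FALSE)

# AIC = deviance + 2 * number of parameters
aic_nb   <- dev_nb   + 2 * df_nb    # 3680.7
aic_zinb <- dev_zinb + 2 * df_zinb  # 3669.0

# BIC replaces the penalty of 2 per parameter with log(n), so the three
# extra ZINB parameters cost more under BIC than under AIC
delta_aic <- aic_zinb - aic_nb      # negative: AIC favors ZINB
delta_bic <- 3773.3 - 3770.8        # positive: BIC favors NB
```

The heavier BIC penalty is what reverses the ranking: the improvement in fit outweighs 2 points per extra parameter but not log(n) points per extra parameter.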