Different AIC values when using * or :

posts related to the RMark library, which may not be of general interest to users of 'classic' MARK

Different AIC values when using * or :

Postby Mardagosa » Wed Sep 08, 2021 10:39 am

I am building models that include sex, color and time. As you may imagine I am interested in the interaction between these in both Phi and p. However, I get very different AIC results when using for example Phi.time.color.sex=list(formula=~time*COLOR*SEX) and Phi.time.color.sex=list(formula=~time:COLOR:SEX). At first it seemed that * and : did the same thing but apparently not. I would really appreciate it if you could help me with this "simple question" that can have a huge effect on my results.
Mardagosa
 
Posts: 10
Joined: Wed Sep 08, 2021 10:28 am

Re: Different AIC values when using * or :

Postby jhines » Wed Sep 08, 2021 11:16 am

The * and : don't do the same thing. The : refers to a specific interaction of covariates and the * means an interaction model including all possible combinations of the 3 covariates. So the time*COLOR*SEX can be written as:

formula=~time + COLOR + SEX + time:COLOR + time:SEX + COLOR:SEX + time:COLOR:SEX

If you check the number of estimated parameters, the time*COLOR*SEX model should have more parameters than the time:COLOR:SEX model.
jhines
 
Posts: 632
Joined: Fri May 16, 2003 9:24 am
Location: Laurel, MD, USA

Re: Different AIC values when using * or :

Postby jlaake » Wed Sep 08, 2021 2:15 pm

I'm going to expand on what Jim has posted because it depends on the variable class (mainly numeric or factor here). I give some examples below and you should be able to do this with your ddl dataframes like Phi and p unless the variables are individual covariates (ie only in the data and not in the design data). This is explained in detail in the workshop notes which you should read (see link you get when you type library(RMark)). In my description below I'm assuming that sex, color and time are all variables in the design data.

Here is a simple dataframe in which sex, color and time are all factor variables.

Code: Select all
> data=data.frame(color=c("yellow","green"),sex=c("M","F"),time=c("1","2"),stringsAsFactors=TRUE)
> data
   color sex time
1 yellow   M    1
2  green   F    2
> str(data)
'data.frame':   2 obs. of  3 variables:
 $ color: Factor w/ 2 levels "green","yellow": 2 1
 $ sex  : Factor w/ 2 levels "F","M": 2 1
 $ time : Factor w/ 2 levels "1","2": 1 2
>

Now using the data, I'm creating a design matrix with the formula ~sex:color:time which are all factor variables. This is what RMark does.

Code: Select all
model.matrix(~sex:color:time,data)
  (Intercept) sexF:colorgreen:time1 sexM:colorgreen:time1 sexF:coloryellow:time1 sexM:coloryellow:time1 sexF:colorgreen:time2 sexM:colorgreen:time2 sexF:coloryellow:time2 sexM:coloryellow:time2
1           1                     0                     0                      0                      1                     0                     0                      0                      0
2           1                     0                     0                      0                      0                     1                     0                      0                      0
attr(,"assign")
[1] 0 1 1 1 1 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$sex
[1] "contr.treatment"

attr(,"contrasts")$color
[1] "contr.treatment"

attr(,"contrasts")$time
[1] "contr.treatment"


Note that there are 9 columns in the design matrix, an intercept and a column for each potential combination of the 2 levels of 3 factor variables 8=2*2*2. However there are only 8 potential combinations so this is one too many parameters if you had data in all 8 cells (not the case in my simple example). Note that RMark will drop any column which is all 0's because there are no design data for the cell. If you were to want to use this approach you should use ~-1+sex:color:time so it would drop the intercept and only have the 8 parameters - one for each cell (combination).

Now as Jim mentioned using ~sex*color*time with all factor variables, will give you the main effects and all 2 and 3 way interactions for a total of 8 parameters. The beta's will differ but if all cells have design data and the models converge, it should give the AIC as ~-1+sex:color:time but ~sex:color:time should give the same likelihood but AIC will be greater by 2 because it has one too many parameters.

Code: Select all
> model.matrix(~sex*color*time,data)
  (Intercept) sexM coloryellow time2 sexM:coloryellow sexM:time2 coloryellow:time2 sexM:coloryellow:time2
1           1    1           1     0                1          0                 0                      0
2           1    0           0     1                0          0                 0                      0
attr(,"assign")
[1] 0 1 2 3 4 5 6 7
attr(,"contrasts")
attr(,"contrasts")$sex
[1] "contr.treatment"

attr(,"contrasts")$color
[1] "contr.treatment"

attr(,"contrasts")$time
[1] "contr.treatment"



Now this all changes if one or more of the 3 parameters is numeric. Here I made time an integer with values 1 and 2.

Code: Select all
> data=data.frame(color=c("yellow","green"),sex=c("M","F"),time=1:2,stringsAsFactors=TRUE)
> str(data)
'data.frame':   2 obs. of  3 variables:
 $ color: Factor w/ 2 levels "green","yellow": 2 1
 $ sex  : Factor w/ 2 levels "F","M": 2 1
 $ time : int  1 2


Now if I use ~sex:color:time I get 5 parameters. An intercept and 4 slopes for time - one slope for each of the 4 sex-color combinations.

Code: Select all
> model.matrix(~sex:color:time,data)
  (Intercept) sexF:colorgreen:time sexM:colorgreen:time sexF:coloryellow:time sexM:coloryellow:time
1           1                    0                    0                     0                     1
2           1                    2                    0                     0                     0
attr(,"assign")
[1] 0 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$sex
[1] "contr.treatment"

attr(,"contrasts")$color
[1] "contr.treatment"


But if I use ~sex*color*time I get an intercept, each main effect for sex and color and then 5 slopes for time combined with the factor interactions. In this case with just 2 values of time, you get 8 betas for the 8 combinations but if time had 3 or more values you would still only get 8 parameters because it is fitting lines (on the link scale) for time.

Code: Select all
> model.matrix(~sex*color*time,data)
  (Intercept) sexM coloryellow time sexM:coloryellow sexM:time coloryellow:time sexM:coloryellow:time
1           1    1           1    1                1         1                1                     1
2           1    0           0    2                0         0                0                     0
attr(,"assign")
[1] 0 1 2 3 4 5 6 7
attr(,"contrasts")
attr(,"contrasts")$sex
[1] "contr.treatment"

attr(,"contrasts")$color
[1] "contr.treatment"

>


Now remember that this all assumes that there is at least one record in the design data for each combination of of the three factor variables (if they are all factor variables). For instance, if all males were yellow and females could be green or yellow and you used sex and color as group variables in process.data then you won't get the male-green-time cells and you will end up with fewer parameters (as you should) with -1+sex:color:time. An example of where this is useful is described for age and year models in the workshop notes. If you only mark young birds, in the first year you only have young birds and in subsequent years you have young and older birds.
jlaake
 
Posts: 1479
Joined: Fri May 12, 2006 12:50 pm
Location: Escondido, CA

Re: Different AIC values when using * or :

Postby Mardagosa » Wed Sep 08, 2021 4:36 pm

This was extremely helpful, thank you!
So if what I want to do is to compare different types of models, I should use the following format so every model tests for a specific interaction of parameters, is that right?:

do_analysis=function()
{
Phi.dot=list(formula=~1)
Phi.time=list(formula=~-1+time)
Phi.time.color=list(formula=~-1+time:COLOR)
Phi.time.color.sex=list(formula=~-1+time:COLOR:SEX)
Phi.time.sex=list(formula=~-1+time:SEX)
Phi.color.sex=list(formula=~-1+COLOR:SEX)
Phi.color=list(formula=~-1+COLOR)
Phi.sex=list(formula=~-1+SEX)

p.dot=list(formula=~1)
p.time=list(formula=~-1+time)
p.time.color=list(formula=~-1+time:COLOR)
p.time.color.sex=list(formula=~-1+time:COLOR:SEX)
p.time.sex=list(formula=~-1+time:SEX)
p.color.sex=list(formula=~COLOR:SEX)
p.color=list(formula=~-1+COLOR)
p.sex=list(formula=~-1+SEX)

cml=create.model.list("CJS")

results=mark.wrapper(cml,data=dp,ddl=ddl,output=FALSE,silent=TRUE)
return(results)
}


What I still find a bit confusing of using * is how it creates some Beta parameters that include many of the original parameters in it, if it makes sense. Using : I obtain the logical combination of groups.
Mardagosa
 
Posts: 10
Joined: Wed Sep 08, 2021 10:28 am

Re: Different AIC values when using * or :

Postby jlaake » Wed Sep 08, 2021 7:20 pm

First of all, AIC is for model selection and not hypothesis testing. If you can get a copy I suggest reading the Model Selection book by Burnham and Anderson, either volume should get you a good intro. But maybe start with chapter 4 of Cooch and White at http://www.phidot.org/software/mark/docs/book/pdf/chap4.pdf
Even if you are using RMark, you should read Cooch and White to understand MARK and the various capture-recapture models and the concepts like model selection.

If you really want to do hypothesis testing, then you need to construct nested models and compute likelihood ratio tests which you can do with the results from the fitted models. Personally I wouldn't bother. An example of a nested model would be

~sex*color+time is nested in ~sex*color*time because it drops the sex:time and color:time interactions
~sex+color+time is also nested in ~sex*color*time as it only fits the main effects

If you want to understand how the various betas (terms) in the model works, do as I said and run model.matrix on the ddl$Phi or ddl$p dataframe with the formula you are interested in.

I'll use just 2 factors with 2 levels to illustrate one for you.

Code: Select all
~sex+color

        Intercept  Male  Yellow
FG        1           0      0
FY        1           0      1
MG        1           1      0
MY        1           1      1

There would be 3 betas - an intercept (which is Female Green), a male term and a Yellow term. Study the above so you understand which betas are added for each group.


Now a model with an interaction of sex and color has 4 terms. The three above and a single interaction because each only has 2 levels.

Code: Select all
~sex*color

        Intercept  Male  Yellow  Male:Yellow
FG        1            0       0           0
FY        1            0       1           0
MG        1            1       0           0
MY        1            1       1           1

Design matrices are not unique. You can specify the same model with different betas as

Code: Select all
~-1+ sex*color
        FG           FY         MG        MY
FG        1            0         0          0
FY        0            1         0          0
MG        0            0         1          0
MY        0            0         0          1

The first beta will be the same in each model. The second beta in the second design matrix is the same as beta 1 + beta 3 from the first design matrix.

Run the formulas with the design data in model.matrix and study the design matrices so you have a full understanding. RMark only allows you to create design matrices easily. You still need to understand them which is why I always suggest reading Cooch and White and develop a few models with the MARK interface so you understand design matrices because in that interface you have to create it by hand.
jlaake
 
Posts: 1479
Joined: Fri May 12, 2006 12:50 pm
Location: Escondido, CA

Re: Different AIC values when using * or :

Postby Mardagosa » Fri Sep 10, 2021 9:05 am

Thank you for your explanation, it has made the learning process much more easier for me. I was exploring the likelihood radio tests in R, to actually test hypothesis between different models, and I just have an additional questions. LRT are done between nested models, as you explain above. But I am not sure how can you compare two models that are not nested, for example, Phi(~1)p(~-1 + SEX) and Phi(~-1 + COLOR:SEX)p(~-1 + time). In this specific case I am trying to compare the models after the AIC process, and chose the ones with <2 AIC values.
If you want the whole scenario I have the following models:
Phi(~1)p(~-1 + SEX)
Phi(~-1 + COLOR:SEX)p(~-1 + time)
Phi(~-1 + SEX)p(~-1 + time)
Phi(~-1 + COLOR)p(~-1 + SEX)
Phi(~-1 + SEX)p(~-1 + SEX).
So some of them can be nested and others don't. I thought about re doing the AIC analysis with only nested models, but that would impede me to do some of the combinations of interest, for example Phi(~1)p(~-1 + SEX) and Phi(~1)p(~-1 + time).
I appreciate your help!
Mardagosa
 
Posts: 10
Joined: Wed Sep 08, 2021 10:28 am

Re: Different AIC values when using * or :

Postby jlaake » Fri Sep 10, 2021 10:08 am

Please read the material I suggested. Models only need to be nested for likelihood ratio tests and NOT for AIC model selection but it is NOT hypothesis testing. I suggest that you stick with AIC. That said, all of the models you propose are nested within the most complex model.
jlaake
 
Posts: 1479
Joined: Fri May 12, 2006 12:50 pm
Location: Escondido, CA

Re: Different AIC values when using * or :

Postby Mardagosa » Wed Sep 15, 2021 4:45 pm

Thank you I have now the material in my hands, which has been very useful. I would just have an additional questions. If I decided to do hypothesis testing using likelihood radio tests, could you compare the models:

Phi(~1)p(~-1 + SEX)
Phi(~-1 + SEX)p(~1)

None of them seems to be nested in the other.

Thanks
Mardagosa
 
Posts: 10
Joined: Wed Sep 08, 2021 10:28 am

Re: Different AIC values when using * or :

Postby jlaake » Wed Sep 15, 2021 7:41 pm

You are correct. But they are both nested within

Phi(~-1 + SEX)p(~-1 + SEX)

Like I said, stick with AIC and you won't need to consider this.
jlaake
 
Posts: 1479
Joined: Fri May 12, 2006 12:50 pm
Location: Escondido, CA

Re: Different AIC values when using * or :

Postby cooch » Wed Sep 15, 2021 8:17 pm

In this instance, I agree with Jeff. But, regarding nestedness, might be worth have a look at the following: nestedness isn't always immediately 'obvious'.

http://www.phidot.org/software/mark/doc ... df#page=63
cooch
 
Posts: 1652
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University


Return to RMark

Who is online

Users browsing this forum: No registered users and 2 guests

cron