negative degrees of freedom and Goodness of fit testing

negative degrees of freedom and Goodness of fit testing

Postby aswea » Mon Nov 19, 2018 4:30 pm

Hello!

I’ve come upon a new situation and I need some guidance on what it means and how best to handle it, particularly with regard to goodness-of-fit testing. Sorry this is rather a long message; I kept digging myself into more confusion.

I’ve been using spatial variants of the CJS live-recaptures model to estimate survival and detection efficiency for migratory salmon.

I was using MARK to run the GoF tests for a new dataset when I noticed in the txt results file that the deviance degrees of freedom (DoF) for my most parameterized (general) model was negative and the observed c-hat was set to one. As I understand it, negative DoF means that I’m estimating more parameters than I have data points and thus my model is saturated (or something beyond saturated!). So if the model is saturated, then the fit should be as good as it’s going to get; however, my general model has a -2LogL that is greater than the one listed for the ‘real’ saturated model used to generate the deviance. Neither the median c-hat GoF test nor the bootstrap GoF test based on c-hat worked for this dataset, given the negative DoF, but the bootstrap GoF test based on deviance returned a c-hat of 1.8. Am I right that my general model is saturated, and can I just go ahead and use it with c-hat = 1 ... or c-hat = 1.8? The parameter estimates are reasonable.
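
(For reference, here is a minimal sketch of how I understand the observed c-hat to be computed -- deviance divided by deviance DoF -- which I assume is why MARK falls back to c-hat = 1 when the DoF is not positive. The numbers are made up.)

Code: Select all
# Minimal sketch, assuming observed c-hat = deviance / deviance DoF;
# the numbers below are made up for illustration.
def observed_chat(deviance, deviance_dof):
    """Return the deviance-based c-hat, or None when the DoF is not positive."""
    if deviance_dof <= 0:
        return None  # ratio is meaningless; MARK appears to fall back to c-hat = 1
    return deviance / deviance_dof

print(observed_chat(35.2, -3))   # None
print(observed_chat(35.2, 20))   # 1.76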

I will also be building less parameterized models and using AIC to compare model performance. Generally in GoF testing, we estimate c-hat for the most general model and apply that value to all candidate models. Given that my most general model has negative DoF, is it still reasonable to take this route (i.e., proceed with c-hat = 1 ... or 1.8)? The model ranking is relatively robust to variation in c-hat between 1 and 2.

This has led me to wonder how the degrees of freedom are calculated for the saturated model. From a few tests, I’ve come up with the number of different capture history sequences minus 1; if there are multiple covariate groups, the totals for the groups are summed. Is this right? If so, then for this particular dataset I think the degrees of freedom are small because the number of recapture occasions is smallish (5) and because the detection probability is 100% for at least 2 of those occasions.
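
Here is a minimal sketch of the rule I think I’m seeing (my own inference, not anything from the MARK documentation; the toy histories are made up):

Code: Select all
# Sketch of the inferred rule: saturated-model DoF = sum over covariate groups
# of (number of distinct observed capture histories in the group - 1).
from collections import defaultdict

def saturated_dof(histories, groups):
    """histories: strings like '110100'; groups: parallel list of group labels."""
    unique_by_group = defaultdict(set)
    for ch, grp in zip(histories, groups):
        unique_by_group[grp].add(ch)
    return sum(len(u) - 1 for u in unique_by_group.values())

# Toy data: group A has 3 distinct histories, group B has 2 -> (3-1) + (2-1) = 3
histories = ['110000', '101000', '100000', '110000', '100000', '110000']
groups    = ['A',      'A',      'A',      'B',      'B',      'B']
print(saturated_dof(histories, groups))  # 3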

Finally, as part of this investigation, I wanted to get an idea of the size of c-hat for a less parameterized model that had deviance DoF above 0. Because the detection probability for several recapture occasions was 100%, the parameter count in MARK was incorrect (parameters at the boundary). I adjusted this count in the results table, but in the txt file of model output the deviance DoF did not change to reflect the updated number of parameters. As I understand it, the deviance DoF is the difference in the number of parameters between the saturated and general models. So for the bootstrap GoF test, if the parameter count is incorrect in MARK, do I use the deviance DoF from the txt file or do I recalculate it with the updated number of parameters?
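
If the deviance DoF really is the saturated-model DoF minus the model's parameter count (as I described above), then recalculating it with a corrected count would just be a shift; a sketch with made-up numbers:

Code: Select all
# Sketch, assuming deviance DoF = saturated-model DoF - model parameter count,
# so a corrected count simply shifts the reported DoF. Numbers are made up.
def adjusted_deviance_dof(reported_dof, reported_k, adjusted_k):
    """Shift the reported deviance DoF to reflect a corrected parameter count."""
    return reported_dof - (adjusted_k - reported_k)

# e.g. if MARK counted 12 parameters (some at a boundary) and reported DoF = 8,
# but the correct count is 15, the deviance DoF would drop to 5.
print(adjusted_deviance_dof(8, 12, 15))  # 5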

Whew! Please let me know if I need to add extra information or clarify anything. Thank you very much for any advice!

Aswea

Re: negative degrees of freedom and Goodness of fit testing

Postby gwhite » Mon Nov 19, 2018 8:21 pm

When you get negative DF for the global model, you pretty clearly have an over-parameterized model relative to the amount of data you have. Hence, GOF testing is out of the picture -- no way to detect lack of fit if you can't even estimate the parameters in the model.

Leave c-hat at 1 and start looking at models with many fewer parameters.

Gary

Re: negative degrees of freedom and Goodness of fit testing

Postby aswea » Tue Nov 20, 2018 9:32 am

Thank you for replying, Gary! I apologize, but I’m about to ask you for more details about every line you wrote. These types of studies are expensive and take a lot of effort, so I have to be able to explain what’s happening. (Also, there is MARK’s 9th commandment to consider about driving you insane.)

You say that the negative DoF means I have an over-parameterized model relative to the amount of data. That makes sense, but what’s confusing is that I can run a simulation with lots of data and still get negative DoF if I specify a few recapture occasions with perfect detection probability and maybe a few segments (intervals) with perfect survival. So it doesn’t seem to be just about the amount of data, but rather about the number of different capture history sequences. We’re always aiming for good detection probability, but it seems we don’t want it too good! Am I right about this?
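
To illustrate, here is a minimal sketch of the kind of simulation I mean (the parameter values are purely illustrative): with perfect detection on the first two recapture occasions, only ten distinct capture histories are even possible, no matter how many animals are released.

Code: Select all
# Minimal sketch (assumed parameter values, single release cohort): with perfect
# detection on occasions 2-3, the set of distinct histories stays tiny even
# with an enormous number of released animals.
import random

def simulate_history(phi, p):
    """One CJS history: release at occasion 1, then survive/detect each interval."""
    ch = [1]
    alive = True
    for s, d in zip(phi, p):
        alive = alive and (random.random() < s)                 # survive the interval?
        ch.append(1 if (alive and random.random() < d) else 0)  # detected?
    return ''.join(map(str, ch))

random.seed(1)
phi = [0.8, 0.8, 0.7, 0.6, 0.7]       # survival for the 5 intervals
p   = [1.0, 1.0, 0.75, 0.75, 0.75]    # detection: perfect on occasions 2 and 3
histories = [simulate_history(phi, p) for _ in range(100_000)]
print(len(set(histories)))            # only 10 distinct histories are possible here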

For the GoF test, would it be reasonable to fix the detection probability to 1 for those segments where I know no fish were missed? Doing so reduces the number of parameters estimated and did not change the survival estimates or their standard errors.

You say that I “can’t even estimate the parameters in the model”. Can you tell me, then, what is wrong with the values that MARK outputs? I think that’s the key place where I’m not up to speed. The real function parameters look reasonable. Also, in my data-rich simulation, which I set up to have negative DoF, the real function parameters match the ones I fed into the script that generated the CH sequences, as you would expect if estimation were working well.

Finally, you say to “leave c-hat at 1 and start looking at models with many fewer parameters”. What do you mean specifically? It’s a simple phrase, but I’m interpreting ‘start’ in two different ways (as in stop what I was doing and start something new ... or start adding more to my current work)! I have 14 other candidate models in the set and none of them are over-parameterized. Are you saying that I need to drop my most general model and work with these less parameterized models? If so, should I not then redo the GoF test using the most parameterized model from the remaining 14 candidates? Or are you saying that I can keep my over-parameterized model (and keep c-hat at 1) but need to explore alternatives? Or maybe something different ...

Thank you again,

Aswea

Re: negative degrees of freedom and Goodness of fit testing

Postby gwhite » Tue Nov 20, 2018 5:24 pm

I've got to have some more information to be able to understand what is causing this.
How many animals are in the study?
How many occasions?
How many unique encounter histories?
Can you include the m(i, j) array so I can see the data?

Re: negative degrees of freedom and Goodness of fit testing

Postby aswea » Wed Nov 21, 2018 9:01 am

Hello Gary,

I’ve pasted the info below for both my current dataset and for a simulated dataset that also has negative DoF when I build a highly parameterized model. For my current dataset, there are very few detections on the later recapture occasions, so I could probably argue that the data are insufficient to allow survival to vary fully by group after occasion 2; reducing the model accordingly would bring the deviance DoF back above zero. It surprises me that MARK can work so well with so little data.

Regardless of this path forward, I’m still curious about those questions: is the saturated DoF the number of unique capture history sequences minus 1, meaning that even a large dataset could have few estimable parameters before saturation? If the model is saturated, is there something wrong with the survival estimates? Do the deviance DoF and the observed c-hat need to be calculated manually if there are inestimable parameters?

Thank you!

*Observed data*
Covariate groups: v4gill; v4nogill; v7gill; v7nogill
Animals in study: total= 151; v4gill=38; v4nogill=38; v7gill=37; v7nogill=38
Occasions: 6 including release
Unique encounter histories: 26


Code: Select all
M array

   Group 1 V4gill
Occ.  R(i)    j= 2     3     4     5     6 Total
--- ------   ----- ----- ----- ----- ----- -----
  1     38      34     0     0     0     0    34
  2     34            32     1     0     0    33
  3     32                   9     1     2    12
  4     10                         4     0     4
  5      5                               1     1

   Group 2 V4nogill
Occ.  R(i)    j= 2     3     4     5     6 Total
--- ------   ----- ----- ----- ----- ----- -----
  1     38      36     0     0     0     0    36
  2     36            32     0     0     0    32
  3     32                  12     0     1    13
  4     12                         4     4     8
  5      4                               0     0

   Group 3 V7gill
Occ.  R(i)    j= 2     3     4     5     6 Total
--- ------   ----- ----- ----- ----- ----- -----
  1     37      27     0     0     0     0    27
  2     27            23     0     0     0    23
  3     23                   6     0     0     6
  4      6                         6     0     6
  5      6                               3     3

   Group 4 V7nogill
Occ.  R(i)    j= 2     3     4     5     6 Total
--- ------   ----- ----- ----- ----- ----- -----
  1     38      34     0     0     0     0    34
  2     34            30     0     0     0    30
  3     30                  11     0     0    11
  4     11                        10     1    11
  5     10                               8     8



*Simulated data*
Same number of groups and encounter occasions
Unique encounter histories: 30

Code: Select all
M array

   Group 1 v4gill
Occ.  R(i)    j= 2     3     4     5     6 Total
--- ------   ----- ----- ----- ----- ----- -----
  1 100000   80000     0     0     0     0 80000
  2  80000         64000     0     0     0 64000
  3  64000               33600  5040   882 39522
  4  33600                     15120  2646 17766
  5  20160                           10584 10584

   Group 2 v4nogill
Occ.  R(i)    j= 2     3     4     5     6 Total
--- ------   ----- ----- ----- ----- ----- -----
  1 100000  100000     0     0     0     0 100000
  2 100000        100000     0     0     0 100000
  3 100000               52500  7875  1378 61753
  4  52500                     23625  4134 27759
  5  31500                           16537 16537

   Group 3 v7gill
Occ.  R(i)    j= 2     3     4     5     6 Total
--- ------   ----- ----- ----- ----- ----- -----
  1 100000   50000     0     0     0     0 50000
  2  50000         40000     0     0     0 40000
  3  40000               28000     0     0 28000
  4  28000                     15120  1058 16178
  5  15120                            9526  9526

   Group 4 v7nogill
Occ.  R(i)    j= 2     3     4     5     6 Total
--- ------   ----- ----- ----- ----- ----- -----
  1 100000  100000     0     0     0     0 100000
  2 100000        100000     0     0     0 100000
  3 100000               70000     0     0 70000
  4  70000                     37800  2646 40446
  5  37800                           23814 23814

Re: negative degrees of freedom and Goodness of fit testing

Postby gwhite » Wed Nov 21, 2018 10:02 am

You are only seeing a small fraction of the 2^6 - 6 encounter histories because your survival rate is either zero or very close to zero. You must have set the survival rate to zero in your simulations in some cases to not see any animals a second time (or else you have set p to nearly zero).

I also wonder if you are doing the simulations correctly. The values you enter for the parameter values are beta values, not real values unless you specify the identity link function for the true model.

If survival is really low, you will need to survey the population more frequently.

So your data are sparse because of apparently poor design of your study. As D. MacKenzie says, "these procedures are statistical, not magical".

Gary

Re: negative degrees of freedom and Goodness of fit testing

Postby aswea » Wed Nov 21, 2018 10:29 am

No, my survival is not zero. For the simulation, I set it as follows for v4gill (as an example): survival was 0.8, 0.8, 0.7, 0.6, 0.7 and detection efficiency was 1, 1, 0.75, 0.75, 0.75.

For example, you can see from the m-array that 80000 of the 100000 released were detected at occasion 2, with none first detected at a later occasion.

I use my own script to simulate the data, so I feed in the real values, not betas.
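
As a check on my script, here is a minimal sketch that reproduces the expected m-array cells for group v4gill from those real values (released x survival x detection products); it matches the simulated m-array I pasted above.

Code: Select all
# Sketch: expected first-recapture (m-array) cells for one group from the real
# values. phi[i] is survival over interval i; p[j] is detection at occasion j+2.
def expected_m_array(released, phi, p):
    k = len(phi)
    m = [[0.0] * k for _ in range(k)]
    for i in range(k):                        # release row (occasion i+1)
        prob = 1.0                            # alive and not yet re-detected
        for j in range(i, k):                 # first recapture at occasion j+2
            prob *= phi[j]
            m[i][j] = released[i] * prob * p[j]
            prob *= (1 - p[j])
    return m

phi = [0.8, 0.8, 0.7, 0.6, 0.7]
p   = [1.0, 1.0, 0.75, 0.75, 0.75]
released = [100000, 80000, 64000, 33600, 20160]   # R(i) for v4gill
for row in expected_m_array(released, phi, p):
    print([round(x) for x in row])            # matches the v4gill m-array above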

Would it help to send you data off list to clarify?

Re: negative degrees of freedom and Goodness of fit testing

Postby gwhite » Wed Nov 21, 2018 4:47 pm

You're correct -- I goofed on the survival -- should have reformatted the m(i, j) text. Send me your DBF and FPT files so I can see your models and the data.

Gary

Re: negative degrees of freedom and Goodness of fit testing

Postby aswea » Wed Nov 21, 2018 5:38 pm

I've emailed the files. Let me know if they didn't arrive.

Re: negative degrees of freedom and Goodness of fit testing

Postby egc » Wed Nov 21, 2018 7:27 pm

aswea wrote: I've emailed the files. Let me know if they didn't arrive.



For future reference, you can embed pre-formatted elements (like the m-array) with the [code] markup tags -- I edited your note to demonstrate. So, something messy like

M array

Group 1 V4gill
Occ. R(i) j= 2 3 4 5 6 Total
--- ------ ----- ----- ----- ----- ----- -----
1 38 34 0 0 0 0 34
2 34 32 1 0 0 33
3 32 9 1 2 12
4 10 4 0 4
5 5 1 1


becomes something correctly formatted (and easy to copy, using the 'select all' feature):

Code: Select all
M array

   Group 1 V4gill
Occ.  R(i)    j= 2     3     4     5     6 Total
--- ------   ----- ----- ----- ----- ----- -----
  1     38      34     0     0     0     0    34
  2     34            32     1     0     0    33
  3     32                   9     1     2    12
  4     10                         4     0     4
  5      5                               1     1


In fact, this is a forum FAQ: see number (2), here: viewtopic.php?f=29&t=1401
