GOF with sparse data



Postby Mnat » Mon Apr 29, 2019 9:57 am

I'd like to estimate survival of bats as a function of different weather variables, and I am currently trying to find a suitable starting model for my data. We captured bats in summer over 30 years, and I used "year" as the sampling period in the analysis.
The population consists of three colonies in neighboring areas. Unfortunately, the start of capture differs among colonies (WT since 1989, TUP since 2002, BO since 2013).
Moreover, some individuals were marked as juveniles and some were marked as adults.
To test whether there is an age effect, I binned individuals into the groups "marked as adults" (MAA) and "marked as juveniles" (MAJ).

Colony/Marked as Juvenile or Adult --> Number of individuals

WT/MAA  --> 224
WT/MAJ  --> 387
TUP/MAA --> 288
TUP/MAJ --> 404
BO/MAA  --> 76
BO/MAJ  --> 41

I performed the GOF test with RELEASE on the processed data, which include all of these groups:

Code:
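library(RMark)
# Process the capture histories as CJS, grouped by marking class (MA) and colony (col)
Mnat.processed <- process.data(Mnat, model = "CJS", begin.time = 1989, groups = c("MA", "col"))
release.gof(Mnat.processed)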


For many tests I get this warning message:

" * * WARNING * * One or more expected values were < 2.0. "

So I conclude that my data per year per group are too sparse.
However, if I drop the colony and/or the "marked as" group, I still get these warning messages
(fewer of them, but they still occur). To what degree is this acceptable, and does this mean that my starting model
should not include the groups colony and MA?
Moreover, I wonder whether the overall test (Test 2 + Test 3 chi-square, used to determine c-hat), which includes the colony
group, is reliable, as I don't have data for the first years in two colonies.

Thank you very much
Mnat
 
Posts: 18
Joined: Wed Apr 10, 2019 4:28 am

Re: GOF with sparse data

Postby jlaake » Mon Apr 29, 2019 1:27 pm

Remember back to your basic stats class when you learned the chi-square test and a rule of thumb like the expected values should be greater than 2 or 5? The reason for that "rule" is that the chi-square approximation falls apart when expected values are too small. Think about how the test is constructed. (Observed-Expected)^2/Expected. Imagine what happens when Observed is 1 and Expected is 0.1 or 0.001 or 0.00001. You get chi-square values that are extremely large and the test statistic no longer follows the chi-square distribution. The very large chi-square values produce very large c-hat values and p-values that are way too small. The code in Release tries to do some pooling but its options are limited.
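As a rough illustration of that arithmetic, the sketch below computes the single-cell contribution (Observed-Expected)^2/Expected for one observed animal and progressively smaller expected values. The expected values are purely illustrative and are not taken from any RELEASE output.

```r
# One observed individual in a cell (illustrative value only)
observed <- 1

# Expected counts, from "acceptable" (5, 2) down to very sparse
expected <- c(5, 2, 0.1, 0.001, 0.00001)

# Contribution of this single cell to the chi-square statistic
contribution <- (observed - expected)^2 / expected

print(data.frame(expected, contribution))
```

With an expected value of 0.1 the single cell already contributes about 8 to the statistic; at 0.001 it contributes about 1,000, and at 0.00001 about 100,000.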

Now your release numbers are not small, but they span 30 years for some cohorts, and depending on your survival rate, the expected number still alive after 30 years could be essentially 0, which makes it hard to have an expected value >2 even if you increased your marking by several orders of magnitude. If p is small, that exacerbates the problem.
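A back-of-the-envelope sketch of this point, assuming constant annual survival and capture probability. The survival values and p below are hypothetical, and 387 (the WT/MAJ total from the first post) is treated as if it were a single release cohort, which overstates the size of any real cohort.

```r
# Hypothetical inputs, for illustration only
released <- 387                  # WT/MAJ total, treated here as ONE cohort
phi      <- c(0.6, 0.7, 0.8, 0.9)  # assumed constant annual survival
p        <- 0.3                  # assumed constant capture probability
years    <- 30

# Expected number still alive after 30 years: N * phi^30
expected_alive <- released * phi^years

# Expected number alive AND captured on that last occasion
expected_captured <- expected_alive * p

print(data.frame(phi, expected_alive, expected_captured))
```

Under these assumptions, only the highest survival value leaves more than two animals expected to be alive at the end, and a small p shrinks the expected captures further.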

So what can you do? Rather than use release.gof I would suggest devising your global model and using median c-hat instead which does not depend on the chi-square approximation. While technically not a gof test, it provides a way forward to handle any over-dispersion if it exists. Others may have different ideas.

The next bit is RMark stuff. Now, I'm not sure how long your critters are juveniles. If only one year, then you should assign an initial age of 0 for juveniles and 1 for adults in process.data. Then bin ages in the design data so that they are age 0 and age 1+. In this way those marked as juveniles will use the same adult survival rates as those marked as adults. You may still want to include a marked.as.juvenile effect if you think there is some chronic effect of marking that affects critters for their life span, but hopefully not. If your juveniles are in that stage for more than one year, then things can get more complicated but not insurmountable.
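A minimal sketch of that setup on toy data, assuming a one-year juvenile stage. The data frame `toy`, its capture histories, and the factor levels "MAA"/"MAJ" are hypothetical stand-ins, not the poster's data. It uses the `age.var` and `initial.ages` arguments of process.data and the age-binning option of add.design.data.

```r
library(RMark)  # only data processing is used here; MARK itself is not run

# Hypothetical toy data: 4 occasions, grouping factors named as in the thread
toy <- data.frame(
  ch  = c("1101", "0110", "1011", "0011", "1100", "0101"),
  MA  = factor(c("MAA", "MAJ", "MAA", "MAJ", "MAJ", "MAA")),
  col = factor(c("WT", "WT", "TUP", "TUP", "WT", "TUP")),
  stringsAsFactors = FALSE
)

# levels(toy$MA) are c("MAA", "MAJ"), so initial.ages is given in that order:
# adults start at age 1, juveniles at age 0. age.var = 1 points at "MA".
toy.processed <- process.data(toy, model = "CJS", begin.time = 1989,
                              groups = c("MA", "col"),
                              age.var = 1, initial.ages = c(1, 0))
toy.ddl <- make.design.data(toy.processed)

# Bin age for Phi into two classes: age 0 and age 1+.
# The upper bound (4 here) only needs to be at least the maximum age.
toy.ddl <- add.design.data(toy.processed, toy.ddl, parameter = "Phi",
                           type = "age", bins = c(0, 1, 4),
                           name = "ageclass", right = FALSE)

print(table(toy.ddl$Phi$ageclass))
```

A survival formula such as `~ageclass` would then give animals marked as juveniles the adult survival rate from their second year onward. For the 30-year data set, the upper bin limit would need to be raised accordingly.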

Hope this helps some.
jlaake
 
Posts: 1417
Joined: Fri May 12, 2006 12:50 pm
Location: Escondido, CA

Re: GOF with sparse data

Postby cooch » Mon Apr 29, 2019 2:14 pm

I'll just give a quick comment on one part of Jeff's extensive and very helpful response to your original question:

jlaake wrote: So what can you do? Rather than use release.gof I would suggest devising your global model and using median c-hat instead which does not depend on the chi-square approximation. While technically not a gof test, it provides a way forward to handle any over-dispersion if it exists. Others may have different ideas.

The median c-hat seems to be the most robust 'general' approach, but isn't available for all data types. For CJS-type models (and closed abundance), the Fletcher c-hat (which is a real GOF test, in the sense I think Jeff refers to) is quickly becoming our 'gold standard', since it shows good power to account for individual heterogeneity in some cases. And it is computed automatically for supported data types (as opposed to the median c-hat, which requires lots of simulation/re-analysis cycles to work, i.e., takes a lot longer in many cases). The median c-hat is, as I recall, available for more data types than the Fletcher c-hat. Both the median c-hat and Fletcher c-hat approaches are discussed in Chapter 5.
cooch
 
Posts: 1628
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University

Re: GOF with sparse data

Postby Mnat » Tue Apr 30, 2019 7:34 am

Thank you for this very informative support! :)
I'm not sure if I can work with the Fletcher c-hat. I have two colonies where capturing unfortunately started a few years later than for the first colony, so I don't have capture histories for these two colonies during the first years of the investigation. I'm not sure how to deal with this, but as my next step I would code these missing first years for these two colonies as "missing occasions" with a dot (do you think this is the correct way to deal with this?). If I understand correctly, I cannot use the Fletcher c-hat if I have dots in my data :?: .
Mnat
 
Posts: 18
Joined: Wed Apr 10, 2019 4:28 am

Re: GOF with sparse data

Postby jlaake » Tue Apr 30, 2019 8:43 am

You don't need to use dots for occasions prior to the start of the study at those colonies with CJS. Use 0. CJS only uses data after the first 1 in the capture history.
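To make that concrete, here is a small base-R sketch with hypothetical six-occasion capture histories (not the poster's data). It finds the first capture in each history and shows the part that comes after it, which is the only part a CJS model uses.

```r
# Hypothetical capture histories; the later-starting colonies are
# simply padded with leading zeros instead of dots.
ch <- c(WT = "101100", TUP = "000110", BO = "000011")

# Occasion of first capture (release) for each history
first <- as.integer(regexpr("1", ch, fixed = TRUE))

# Portion of each history after the release occasion
after_release <- substring(ch, first + 1)

print(data.frame(ch, first, after_release))
```

The leading zeros for TUP and BO fall before the release occasion, so they carry no information in a model that conditions on first capture.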
jlaake
 
Posts: 1417
Joined: Fri May 12, 2006 12:50 pm
Location: Escondido, CA

Re: GOF with sparse data

Postby Mnat » Tue Apr 30, 2019 9:47 am

Good news! Thank you :D
Mnat
 
Posts: 18
Joined: Wed Apr 10, 2019 4:28 am

