Nest survival: Parameter estimates at the boundary

Post by ursakoce » Wed Aug 30, 2017 10:18 am

Hello!

I am a relatively new MARK user. I am fitting nest survival models and have encountered the problem of parameter non-identifiability and was hoping to get some help on the issue.

My dataset consists of encounter histories of 198 Little Ringed Plover nests collected over three years. The nests are grouped by year and clutch number (first or second clutch) into 6 groups: g1: year 1 & clutch 1, g2: year 1 & clutch 2, g3: year 2 & clutch 1, etc. The nesting season is 106 days long.
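For context, here is a minimal sketch of the nest-survival likelihood that such encounter histories feed into: a nest contributes the probability of surviving every day from when it was found (k) to when it was last seen active (l), and, if it failed, the probability of failing somewhere between l and the last check (m). The `NestRecord` class and its field names are hypothetical and only roughly mirror the k/l/m/fate columns; the daily survival value is a placeholder, not an estimate.

```python
import math
from dataclasses import dataclass

@dataclass
class NestRecord:
    # Hypothetical record type; fields roughly mirror the k, l, m and fate
    # columns of a nest-survival encounter history (days numbered 1..106).
    found: int        # k: day the nest was found
    last_alive: int   # l: last day the nest was seen active
    last_check: int   # m: last day the nest was checked
    failed: bool      # True if the nest failed between day l and day m
    group: str        # e.g. "g1" = year 1 & clutch 1

def nest_log_likelihood(nests, daily_survival):
    """Log-likelihood of nest fates given daily survival probabilities.

    daily_survival[group][t] is S for the interval from day t to day t + 1.
    """
    total = 0.0
    for n in nests:
        s = daily_survival[n.group]
        # The nest survived every interval from day k to day l
        total += sum(math.log(s[t]) for t in range(n.found, n.last_alive))
        if n.failed:
            # ...and then failed at some point between day l and day m
            survive_gap = math.prod(s[t] for t in range(n.last_alive, n.last_check))
            total += math.log(1.0 - survive_gap)
    return total

# Constant S = 0.97 in group g1 (a placeholder value, not an estimate)
S = {"g1": {t: 0.97 for t in range(1, 106)}}
nests = [NestRecord(10, 15, 18, True, "g1"), NestRecord(20, 40, 40, False, "g1")]
print(nest_log_likelihood(nests, S))
```

With 106 days there are 105 daily intervals per group, which is where the 105 real parameters per group come from.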

In the first step, I fitted a set of models using the logit link to check whether S varies between years, between clutches, or between year × clutch groups, and whether there is an overall or within-group linear or quadratic time trend in S. The best model according to AICc was the one with variation among the year × clutch groups and a within-group quadratic time trend, i.e. the most complex one. Thus, 630 (= 6 × 105) real parameters (S_i) were estimated.
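The structure of that best model can be written as logit(S_gt) = β0_g + β1_g·t + β2_g·t² for group g and day t. The sketch below maps β values onto the 6 × 105 grid of real parameters, assuming a separate intercept, linear and quadratic term in each group (18 β in total) and time coded 1..105 and then standardised. The standardisation is a choice made here for numerical stability, not necessarily how the design matrix was built in MARK, and the coefficients are placeholders.

```python
import numpy as np

N_GROUPS = 6        # g1..g6 = year x clutch
N_INTERVALS = 105   # 106-day season -> 105 daily intervals

def real_survival(betas):
    """betas: shape (N_GROUPS, 3) = (intercept, linear, quadratic) per group.

    Returns the real parameters S with shape (N_GROUPS, N_INTERVALS).
    """
    t = np.arange(1, N_INTERVALS + 1)
    t_std = (t - t.mean()) / t.std()          # assumed time coding
    X = np.column_stack([np.ones_like(t_std), t_std, t_std**2])  # (105, 3)
    eta = betas @ X.T                         # logit-scale predictor, (6, 105)
    return 1.0 / (1.0 + np.exp(-eta))         # inverse logit

betas = np.zeros((N_GROUPS, 3))   # placeholder coefficients, not estimates
betas[:, 0] = 3.0                  # baseline daily survival of about 0.95
S = real_survival(betas)
print(S.shape, S.size)
```

Each of the 630 real parameters is thus a deterministic function of only 18 β, so a problem with a few β can propagate to many S_i at once.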

In the next step, I added different combinations of individual covariates to the best model from the previous step. However, adding these covariates often caused problems with parameter estimation. For example, for the model with the lowest AICc (containing 5 individual covariates), the MARK results browser reported only 41 of the 48 β parameters as estimated, although all β estimates were listed in the Notepad output file. After I corrected the number of parameters in the results browser, this model still had the smallest AICc. Many real parameters (S_i) were estimated at or near the boundary, with either extremely small or extremely large confidence intervals.
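Why boundary estimates come with "extremely small or extremely large" CIs can be seen from how uncertainty is back-transformed from the logit scale. A minimal sketch, assuming the real-scale SE comes from the delta method and the CI is computed on the logit scale and then back-transformed, as is standard for logit-link intervals; the β̂ and SE values are made up for illustration.

```python
import math

def inv_logit(x):
    # Numerically stable inverse logit (avoids overflow for large |x|)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def real_scale_ci(beta_hat, se_beta, z=1.96):
    """Return (S_hat, delta-method SE, lower, upper) for a logit-link estimate."""
    s_hat = inv_logit(beta_hat)
    # Delta method: dS/d(eta) = S(1 - S), which vanishes as S -> 0 or 1
    se_real = s_hat * (1.0 - s_hat) * se_beta
    # CI built on the logit scale, then back-transformed
    lower = inv_logit(beta_hat - z * se_beta)
    upper = inv_logit(beta_hat + z * se_beta)
    return s_hat, se_real, lower, upper

print(real_scale_ci(20.0, 3000.0))  # huge logit-scale SE
print(real_scale_ci(20.0, 0.5))     # modest logit-scale SE
```

In both cases S(1 − S) is close to zero, so the delta-method SE collapses. Whether the back-transformed CI then spans almost all of (0, 1) or shrinks onto the boundary depends only on the logit-scale SE, which is why neither interval says much about whether the parameter is truly estimable.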

To check whether (any of) these parameters were correctly estimated at the boundary, or whether the estimates are an artefact of data inadequacy, I used the profile likelihood CI and data cloning approach described in Appendix F (F.1.2). Unfortunately, I got stuck already at the step of calculating profile likelihood CIs for the original data set. The calculation took extremely long: only about 10% of the CIs were estimated overnight, and errors in the optimization routine were reported for several real parameters. Thus, I aborted the calculation.
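The two diagnostics can be illustrated on a single daily survival probability. A minimal sketch, assuming one constant S estimated from a made-up count of nest-days with no failures, so that the MLE sits exactly at 1.0. In a real model each profile bound also requires re-maximising the likelihood over all the other β parameters, which is likely part of why the computation is so slow with hundreds of real parameters; here there are no nuisance parameters, so the profile reduces to the likelihood itself, and "cloning" simply multiplies every count by k.

```python
import math

CHI2_1DF_95 = 3.841458820694124  # 95% quantile of chi-square with 1 df

def binomial_loglik(s, survived, exposed):
    """Log-likelihood of `survived` successes in `exposed` nest-days at daily survival s."""
    failed = exposed - survived
    ll = survived * math.log(s) if survived else 0.0
    if failed:
        ll += failed * math.log(1.0 - s)
    return ll

def profile_lower_bound(survived, exposed, tol=1e-12):
    """Lower 95% profile-likelihood bound for s, found by bisection."""
    s_hat = survived / exposed            # MLE (equals 1.0 if no failures)
    target = binomial_loglik(s_hat, survived, exposed) - CHI2_1DF_95 / 2.0
    lo, hi = 1e-9, s_hat                  # log L is increasing on (0, s_hat]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binomial_loglik(mid, survived, exposed) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# 50 nest-days with no failures; k = number of data clones
for k in (1, 10, 100):
    print(k, round(profile_lower_bound(50 * k, 50 * k), 5))
```

With no failures the bound has the closed form exp(−3.841 / (2n)), so it moves toward 1 as k grows: the boundary estimate is supported by the data. A parameter the data carry no information about would instead have a flat profile, and cloning would leave its interval unchanged.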
I am now looking for advice on where to go from here, and I would like to clarify some things I am not sure about:
1) Are there other ways to identify what is causing the estimates at the boundary and the extremely large or small CIs?
2) How can I detect which β parameters were not estimated (apparently 7 of 48, according to MARK)? I read through the addendum to Chapter 4, and I see how the number of estimated β parameters is determined from the conditioned S vector and the threshold value, but I don't understand how to identify which parameters those are.
3) Would it be acceptable simply to discard the models containing within-group time trends, arguing that they have problems with parameter estimation (presumably due to data inadequacy), and to proceed with simpler models instead, i.e. the ones that assume constant S within groups g1–g6, even if that model structure gives a higher AICc than the one containing a within-group time trend?
4) I am also a bit confused by a passage in Appendix F, page F-8: “You can now see that the profile interval for parameter 14 has shortened considerably for the cloned data, with the lower bound changing from 0.732 to 0.997, indicating that this parameter was actually being estimated. In other words, parameter 14 was extrinsically non-identifiable.” From the rest of this appendix, I understand that the shrinkage of the CI indicates that the parameter was correctly estimated at 1.0. Why, then, is this parameter called non-identifiable? I thought that extrinsic non-identifiability refers only to parameters that were poorly estimated at the boundary and whose estimates are an artefact of data inadequacy and the properties of the link function.
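On question 2, one way to see which β are involved (in the spirit of the singular-value approach described in the Chapter 4 addendum, not MARK's exact code) is to look at the singular vectors belonging to the near-zero conditioned singular values: each one is a linear combination of β that the data cannot pin down, and the β with large absolute loadings are the culprits. A minimal NumPy sketch, assuming access to the information matrix (Hessian of −log L) for the β; the threshold value and the toy logistic data with two identical covariates are illustrative assumptions.

```python
import numpy as np

def inestimable_directions(information, threshold=1e-7):
    """Count estimable betas and return the directions that are not estimable.

    information: square information matrix (Hessian of -log L) for the betas.
    threshold: hypothetical cut-off on the conditioned singular values.
    """
    _, s, vt = np.linalg.svd(information)   # s is sorted largest first
    conditioned = s / s[0]                   # largest singular value scaled to 1
    n_estimable = int(np.sum(conditioned > threshold))
    # Each row is a combination of betas the data cannot resolve;
    # betas with large absolute loadings are the ones involved.
    null_directions = vt[conditioned <= threshold]
    return n_estimable, conditioned, null_directions

# Toy logistic model whose last two covariates are identical (perfectly confounded)
rng = np.random.default_rng(1)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x, x])
p = 1.0 / (1.0 + np.exp(-(0.5 + 0.8 * x)))
info = X.T @ (X * (p * (1.0 - p))[:, None])   # X' W X with W = diag(p(1 - p))

n_est, cond, null_dirs = inestimable_directions(info)
print(n_est)                                   # 2 of 3 betas are estimable
print(np.round(np.abs(null_dirs), 3))          # loads on the 2nd and 3rd beta
```

A β whose real parameters all sit at 0 or 1 also tends to contribute almost nothing to this matrix, because the weights S(1 − S) vanish there, which is one reason boundary estimates are often counted as not estimated.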

I’ll be grateful for any directions or suggestions on how to continue my analysis. I can provide any additional details about the data or model structure if necessary.

Thank you,
Urša