f(0) not estimated, large beta SE or SE = 0 in RDFullHet


Postby monicaarso » Mon Jun 22, 2020 6:40 am

Dear all,
I am fitting RDFullHet models to estimate abundance of dolphins within each summer (May-Sept) over a 10-year period. Survival is kept constant over time, and I am fitting a suite of models with/without temporary emigration and with/without Pledger's mixture.

I initially set up models with 5 monthly secondary occasions per primary session (year), but then switched to 2-week secondary sampling occasions (resulting in 9 to 11 secondary occasions per year) after reading in a couple of posts, and in Addendum 2 of Chapter 14 of the MARK book, that one should have more than 5 occasions in order to detect and account for individual heterogeneity in any reasonable way. Given that heterogeneity is modelled within the closed-population part of the robust design, it seemed sensible to increase the number of secondary occasions.

Here are the process.data and make.design.data calls:

Code:
CHPop_2=convert.inp("CH_totpop_09-19_2week.inp",use.comments=T)

#Process data specifying primary and secondary capture occasions
time.intervals.2=c(rep(0,10),1,rep(0,10),1,rep(0,11),1,rep(0,11),1,rep(0,11),1,rep(0,10),1,rep(0,9),1,rep(0,10),1,rep(0,10),1,rep(0,11),1,rep(0,10))
pop.process.2=process.data(CHPop_2,begin.time=2009,model="RDFullHet",time.intervals=time.intervals.2)

#Create the design data
pop.ddl.2=make.design.data(pop.process.2)
pop.ddl.2=add.design.data(pop.process.2,pop.ddl.2,parameter="GammaPrime",type="time",bins=c(2009,2010,2011,2012,2013,2014,2015,2016,2017,2019),right=FALSE, name="constrain",replace=TRUE)
pop.ddl.2=add.design.data(pop.process.2,pop.ddl.2,parameter="GammaDoublePrime",type="time",bins=c(2009,2010,2011,2012,2013,2014,2015,2016,2017,2019),right=FALSE, name="constrain",replace=TRUE)
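
As an aside, the long time.intervals.2 vector can be built programmatically from the per-session occasion counts, which is less error-prone than nesting rep() calls by hand. A sketch (the counts below are read off the rep() calls above):

```r
# Secondary occasions per primary session, 2009-2019: each run of k-1 zeros
# in the hand-written vector corresponds to a session with k occasions.
n.occ <- c(11, 11, 12, 12, 12, 11, 10, 11, 11, 12, 11)

# A session with k occasions contributes k-1 zero intervals, sessions are
# separated by an interval of 1, and the trailing 1 is dropped.
time.intervals.2 <- head(unlist(lapply(n.occ, function(k) c(rep(0, k - 1), 1))), -1)
```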


And the definition of parameters:
Code:

S.dot=list(formula=~1) #constant survival

p.dot=list(formula=~1,share=TRUE) # constant p
p.session=list(formula=~session,share=TRUE) # p varies by session (year)
p.time.session=list(formula=~-1+session:time,share=TRUE) # p varies by session and time
p.session.mixture=list(formula=~session+mixture,share=TRUE) # p varies by session and mixture
p.time.session.mixture=list(formula=~-1+session:time+mixture,share=TRUE) # p varies by session,time and mixture

pi.dot=list(formula=~1) #pi is constant
pi.mixture=list(formula=~-1+session) #pi varies by session
pi.null=list(formula=~1,fixed=1) # no heterogeneity

GammaDoublePrime.0=list(formula=~1,fixed=0) #no movement
GammaPrime.0=list(formula=~1,fixed=1) #no movement
GammaDoublePrime.GammaPrime.random.constant=list(formula=~1,share=TRUE) # random and constant; no constraint needed
GammaDoublePrime.random.timeconstrain=list(formula=~constrain,share=TRUE) # random and time-dependent, using the 'constrain' bins
GammaDoublePrime.markov.constrain=list(formula=~constrain,share=FALSE) # Markovian, time-dependent, using the 'constrain' bins
GammaPrime.markov.constrain=list(formula=~constrain,share=FALSE) # Markovian, time-dependent, using the 'constrain' bins
GammaDoublePrime.markov.constant=list(formula=~1,share=FALSE) # Markovian, constant
GammaPrime.markov.constant=list(formula=~1,share=FALSE) # Markovian, constant
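
Not shown in the original post, but for reference: in RMark, specifications like these are typically collected and fitted in one batch with create.model.list() and mark.wrapper(). A sketch, assuming the processed data and design data from the earlier chunk:

```r
# create.model.list() scans the calling environment for objects named
# S.*, p.*, pi.*, GammaPrime.*, GammaDoublePrime.*, etc. and builds a table
# of all their combinations; mark.wrapper() then fits each model in MARK.
fit.pop.models <- function() {
  model.list <- create.model.list("RDFullHet")
  mark.wrapper(model.list, data = pop.process.2, ddl = pop.ddl.2)
}
pop.results.2 <- fit.pop.models()
pop.results.2$model.table  # AIC table across all fitted models
```

Note that mark.wrapper() crosses every parameter specification, so paired choices (e.g. GammaDoublePrime.0 with GammaPrime.0 for no movement) will also be crossed with the non-matching alternatives; unwanted combinations can be avoided by fitting those models individually with mark() instead.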


Under both secondary-occasion sampling structures, models with Pledger's mixture were generally better supported by AIC than those without (consistent with the derived N estimates being smaller, i.e. likely underestimated, in the models without the heterogeneity mixture parameter). Some models had to be rerun with initial values from a similar model due to lack of numerical convergence.

With both sampling structures I get one or more f(0) estimated as zero, although this happens less often with the increased number of secondary occasions. The only models that estimate all f(0) away from zero are the no-movement models with heterogeneity (f(0) is still quite small, but it is estimated). This population has quite a high recapture probability (the probability of an animal being caught at least once per year is ~0.9 for most primary sessions), so I assumed that might explain having some f(0) at the boundary. However, this was not consistent across models (except for session 2010), and when examining the outputs a few other things stood out:

- In some models all p beta parameters have a similar SE (e.g. SE ~ 7, 12, or 18).
- Some of the gamma beta parameters have very large SEs (e.g. 327), so the real gamma parameter is estimated at 0 with a 95% CI of 0-1 or 0-0.
- f(0) beta parameters have either SE = 0 or a large SE (e.g. 156), so f(0) is estimated as zero. This seems to occur for more of the sessions in the models without heterogeneity (pi fixed to 1, as in pi.null above).

More specifically,
- All models with temporary emigration and heterogeneity, where p = ~-1+session:time+mixture, have the 2010 f(0) at the zero boundary, with some of the gamma beta parameters having large SEs.
- All models with temporary emigration and p = ~session:time (no heterogeneity) have f(0) = 0 for years 2009, 2010, 2011 and 2013.

I have rerun the top-selected models using the sin link for the gamma parameters, to see if that helped the estimation. The betas no longer have huge SEs, but the real gamma parameters are still estimated at 0, with 95% CIs going from -0.000005 to 0.000005. I could also try using the sin link for the p parameters, which would mean changing
Code:
p.time.session.mixture=list(formula=~-1+session:time+mixture,share=TRUE)
to
Code:
p.time.session.mixture.sin=list(formula=~-1+session:time:mixture,link="sin",share=TRUE)
in order to make the design matrix an identity matrix, although I am not sure this is correct.
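
For what it's worth, whether a formula gives the identity design matrix the sin link requires can be checked with model.matrix() on toy factor data (a sketch, not the actual design data):

```r
# Toy design data: 2 sessions x 2 times x 2 mixtures, all as factors.
dd <- expand.grid(session = factor(1:2), time = factor(1:2), mixture = factor(1:2))

# With no intercept, the full three-way interaction gives one indicator column
# per session/time/mixture cell, i.e. each row of X contains a single 1.
X <- model.matrix(~ -1 + session:time:mixture, dd)
all(X %in% c(0, 1)) && all(rowSums(X) == 1)  # TRUE for an identity design
```

By contrast, the additive form ~-1+session:time+mixture puts two 1s in each row (a cell indicator plus a mixture column), so it is not an identity design and the sin link would not apply.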

Any suggestions on what might be causing these issues and how to address them? The large SEs could be a sign of over-parameterization, but I am not getting whole sets of beta parameters with large SEs, and the issue persists with the reduced number of secondary sampling occasions. Is it just that I have many parameters at a boundary?

Also, I am not sure how much to trust the model selection when the top models (within 3 AIC units) have quite a few gamma parameters that are not estimated.

Any suggestions on what might be happening or what to try will be welcome!
Many thanks,
Monica
monicaarso
 
Posts: 33
Joined: Wed Feb 22, 2012 2:58 pm

Re: f(0) not estimated, large beta SE or SE = 0 in RDFullHet

Postby jlaake » Thu Jun 25, 2020 4:32 pm

I have been hoping that someone who knows more about RD models would answer this question, because it really isn't about RMark. I didn't identify any problems in your code. If p is really high then I would expect f(0) to be small or possibly 0. The large SEs are likely due to parameters being at a boundary on the logit scale. As you found, this problem went away when you used the sin link for gamma.
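
[A sketch added for intuition, not part of the original reply: on the logit scale a boundary estimate corresponds to beta drifting towards +/- infinity, and the delta-method SE of the real parameter collapses because the derivative of the inverse logit vanishes there.]

```r
# Real parameter p = plogis(beta); delta-method SE(p) = p * (1 - p) * SE(beta).
beta <- c(2, 5, 15)          # increasingly extreme logit-scale estimates
p    <- plogis(beta)         # real-scale estimates approach the boundary at 1
mult <- p * (1 - p)          # delta-method multiplier -> 0 at the boundary
round(cbind(p, mult), 7)
# As beta grows, p -> 1 and mult -> 0: the real-scale SE is reported as ~0,
# or a huge SE(beta) is needed to express any real-scale uncertainty at all.
```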

You didn't mention what sample size you have over the 10 years. Having worked on dolphins, my guess is that you don't have a lot of observations/animals, and this may be a problem with sparse data. Hard to tell. Have you tried simpler models?

--jeff
jlaake
 
Posts: 1479
Joined: Fri May 12, 2006 12:50 pm
Location: Escondido, CA

Re: f(0) not estimated, large beta SE or SE = 0 in RDFullHet

Postby monicaarso » Wed Jul 01, 2020 8:06 am

Hi Jeff,
Thanks for checking over this.
jlaake wrote: "You didn't mention what sample size you have over the 10 years. Working on dolphins my guess is that you don't have a lot of observations/animals and this may be a problem with sparse data."

The number of marked animals per year (Mt+1) is 89, 92, 89, 102, 103, 85, 101, 94, 102, 86, and 82, which, for reference, represents on average about 50% of the total population. I wouldn't have thought this is a small sample size given the estimated population size.
jlaake wrote: "Have you tried simpler models?"

Well, using the RD, the no-movement models with Pledger's mixture parameter manage to estimate all f(0). This is using the 2-week sampling for secondary occasions (which comes to ~10 or 11 secondary occasions per primary session). The 1-month sampling regime for secondary occasions, which obviously reduces the number of parameters, doesn't manage so well, which makes me think this is not an issue of data sparseness.

Would it be appropriate to migrate this question to the "analysis help" subforum? I know it's not great practice to post the same question in different forums, but I can make a note here to say it's moved.
Thanks again for your help,
Monica

