Redundancy and identifiability (for dummies)

questions concerning analysis/theory using programs M-SURGE, E-SURGE and U-CARE

Post by simone77 » Tue Nov 29, 2011 9:32 am

Hi,

I am very grateful for the help that the very proficient people on this forum are giving, and I want to stress how useful it is.

I have some questions about parameter redundancy and identifiability, which E-SURGE handles very well and which are very important for these models.
Choquet wrote:In the same way, E-SURGE gives you a reliable rank (not based on the hessian)
and the list of parameters which are redundant when the model is not full rank.
This is really crucial for this kind of model.


I have carefully read the E-SURGE manual as well as a very useful paper on this (Gimenez et al. 2004).
Unfortunately, my limited mathematical background makes my learning curve rather slow and has left me with some doubts, which I hope someone can clear up by answering the following questions.

1. I am not sure I grasp the difference between the concepts of identifiability and redundancy. As I understand it, (i) a parameter or a function of parameters is identifiable if the model reaches a global minimum for one specific value of that parameter (or function of parameters), and the above paper says that (ii) some parameters are redundant when the model can be expressed as a function of fewer than the original number of parameters. So, for example, in the classical CJS example, the last phi and p are unidentifiable (not separately estimable), but their product (a function of parameters, as said above) is identifiable. Are they also considered redundant? Could anyone explain these concepts further?

2. According to the paper cited above, the best method to deal with this is the formal derivative matrix, also called CMF (Catchpole, Morgan and Freeman), because it addresses both intrinsic (due to model structure) and extrinsic (due to data characteristics) redundancy and shows which parameters are redundant.
E-SURGE implements the numerical version of the CMF approach (same features but, unlike the formal derivative matrix, it does not explicitly identify estimable functions of the redundant parameters).
Even so, I don't fully understand the parameter identifiability output for my models, which is explained in Figure 32 of the manual, where it says:
E-Surge manual wrote: ...The above temporary window displays for each of the 5 points near the MLE the number of singular values of the derivative matrix, the number of additional singular values below a less selective threshold and indices of the potentially redundant mathematical parameters.

Here are two real-data examples, (1) the parameter identifiability Excel sheet and (2) the reduced set of parameters Excel sheet, from an over-parameterized model I have run: {IS(t) S(g.f.t) T(g.f) C(a(1)+a(2).f.t) SA(a(1).g.f.t(1)+a(2).g.f.t)} (2 groups, 3 states, 4 events, 9 occasions and 1 age class).

1) [Image: parameter identifiability Excel sheet]
2) [Images: reduced set of parameters Excel sheet]

2.1 What is going on with parameters 38, 39, 40, 41, 60 and 61, and which parameters do they correspond to (there is one sheet of reduced parameters and another with parameters)?
I know the output says that "5 quantities solutions of 1 partial derivative equations, made of redundant parameters (indices below) are estimables", but I don't understand what that means (shame on me!).

2.2 Finally, the output says that 13 quantities solutions of 1 partial derivative equations, made of redundant parameters (indices below) are estimables (this final sentence refers to the MLE itself; the previous four refer to points near the MLE). Again, what does that mean?

2.3 Once I have found the redundant parameters, I could fix them (must I fix them to one, or to any value?) and repeat the analysis, couldn't I?

If you could explain these results to me in simple words, it would make things easier and my learning curve would certainly improve a lot.


Simone
simone77
 
Posts: 197
Joined: Mon Aug 10, 2009 2:52 pm

Re: Redundancy and identifiability (for dummies)

Post by CHOQUET » Wed Nov 30, 2011 5:47 am

Hello,

Estimability is not an easy problem. I will try to reply as clearly as possible.

Concerning:

simone77 wrote: 1. I am not sure I grasp the difference between the concepts of identifiability and redundancy. As I understand it, (i) a parameter or a function of parameters is identifiable if the model reaches a global minimum for one specific value of that parameter (or function of parameters), and the above paper says that (ii) some parameters are redundant when the model can be expressed as a function of fewer than the original number of parameters. So, for example, in the classical CJS example, the last phi and p are unidentifiable (not separately estimable), but their product (a function of parameters, as said above) is identifiable. Are they also considered redundant? Could anyone explain these concepts further?

First, multistate and multievent models in general have no global identifiability property
(because several local minima generally exist). The CJS model is a particular case.
We therefore consider only the local identifiability of the parameters. Redundancy shows up as a likelihood that is flat in some directions. This is the case in the classical CJS example: a product may be identifiable, and after an easy reparameterisation the model becomes full rank (there is then no longer any redundancy).
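
As a worked illustration of the CJS case (the notation is an assumption, not taken from the thread: $\phi_k$ is survival from occasion $k$ to $k+1$, $p_k$ is the capture probability at occasion $k$, and there are $K$ occasions), the last survival and capture probabilities of the time-dependent model enter the cell probabilities only through their product:

```latex
\Pr(\text{released at } i,\ \text{first recaptured at } K)
  = \left[\prod_{k=i}^{K-2} \phi_k \,(1 - p_{k+1})\right] \phi_{K-1}\, p_K ,
\qquad
\beta = \phi_{K-1}\, p_K .
```

Any pair $(\phi_{K-1}, p_K)$ with the same product $\beta$ gives the same likelihood, which is the flat direction. Replacing the pair by the single parameter $\beta$ is the kind of reparameterisation that makes the model full rank.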

simone77 wrote: 2. According to the paper cited above, the best method to deal with this is the formal derivative matrix, also called CMF (Catchpole, Morgan and Freeman), because it addresses both intrinsic (due to model structure) and extrinsic (due to data characteristics) redundancy and shows which parameters are redundant.
E-SURGE implements the numerical version of the CMF approach (same features but, unlike the formal derivative matrix, it does not explicitly identify estimable functions of the redundant parameters).


In the case of a full-rank model, the justification for using the numerical version of the CMF (Jacobian) approach is proved in

Rouan L., R. Choquet, R. Pradel (2009). A General Framework for Modeling Memory in Capture-Recapture Data. JABES, 14(3), 338-355.

In the case of redundancy, see the technical report

Choquet, R. and Cole, D.J. (2010) Symbolic/Numerical methods for identifiability, UKC/SMSAS/10/016

submitted to Mathematical Biosciences. It is this latter case that you are interested in.
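
To make the numerical idea concrete, here is a minimal Python sketch for a time-dependent CJS model with 4 occasions. It is not E-SURGE's actual algorithm. It builds the derivative matrix of the m-array cell probabilities by finite differences at a few nearby points, counts the singular values above a threshold to get the rank, and lists the parameters involved in the flat direction. The function names, thresholds and parameter values are all assumptions chosen for illustration.

```python
import numpy as np

def cjs_cell_probs(theta, K):
    """P(released at i, first recaptured at j) for a time-dependent CJS model.
    theta = (phi_1..phi_{K-1}, p_2..p_K). The 'never recaptured' cells are
    omitted because they are 1 minus a row sum and add no rank."""
    phi, p = theta[:K - 1], theta[K - 1:]
    probs = []
    for i in range(1, K):
        for j in range(i + 1, K + 1):
            survive = np.prod(phi[i - 1:j - 1])       # phi_i ... phi_{j-1}
            missed = np.prod(1 - p[i - 1:j - 2])      # (1-p_{i+1}) ... (1-p_{j-1})
            probs.append(survive * missed * p[j - 2])  # captured at occasion j
    return np.array(probs)

def numerical_jacobian(f, theta, h=1e-6):
    """Central-difference derivative matrix (rows: cells, columns: parameters)."""
    theta = np.asarray(theta, dtype=float)
    cols = []
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = h
        cols.append((f(theta + step) - f(theta - step)) / (2 * h))
    return np.column_stack(cols)

def rank_and_redundant(J, tol=1e-6):
    """Rank from the singular values, plus 0-based indices of the parameters
    that appear in the null directions (the potentially redundant ones)."""
    _, s, vt = np.linalg.svd(J)
    rank = int(np.sum(s > tol * s[0]))
    null_rows = vt[rank:]  # directions in which the model does not change
    redundant = sorted(set(np.nonzero(np.abs(null_rows) > 1e-4)[1].tolist()))
    return rank, redundant

K = 4
theta0 = np.array([0.8, 0.75, 0.7, 0.6, 0.55, 0.5])  # illustrative values only
rng = np.random.default_rng(0)
points = [theta0] + [theta0 + rng.uniform(-0.05, 0.05, theta0.size) for _ in range(4)]
results = [
    rank_and_redundant(numerical_jacobian(lambda t: cjs_cell_probs(t, K), pt))
    for pt in points
]
for rank, redundant in results:
    print("rank:", rank, "of", theta0.size, "| redundant indices (0-based):", redundant)
```

In this sketch every point should report rank 5 out of 6 and flag indices 2 and 5, that is phi_3 and p_4, the last survival and last capture probabilities. Getting the same list at all points is the kind of agreement that makes the result trustworthy.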

simone77 wrote: Here are two real-data examples, (1) the parameter identifiability Excel sheet and (2) the reduced set of parameters Excel sheet, from an over-parameterized model I have run: {IS(t) S(g.f.t) T(g.f) C(a(1)+a(2).f.t) SA(a(1).g.f.t(1)+a(2).g.f.t)} (2 groups, 3 states, 4 events, 9 occasions and 1 age class).

2.1 What is going on with parameters 38, 39, 40, 41, 60 and 61, and which parameters do they correspond to (there is one sheet of reduced parameters and another with parameters)?
I know the output says that "5 quantities solutions of 1 partial derivative equations, made of redundant parameters (indices below) are estimables", but I don't understand what that means (shame on me!).


E-SURGE is telling you that mathematical parameters 38, 39, 40, 41, 60 and 61 are not identifiable individually, but that some functions of these 6 parameters (5 quantities) are. (For the CJS model, it would be 1 quantity that is a function of 2 parameters.)

To find out what parameters 38, 39, 40, 41, 60 and 61 are, the best way is to go to IVFV and see where parameter 38 is. The first nine parameters are the initial state probabilities. Then parameters 10 to .. are the survival probabilities, followed by the transitions, then the capture rates and, last, the probabilities of mis-assignment.

Fixed parameters are removed from the list, so skip them when you count.
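
The counting rule above can be sketched in a few lines of Python. This is a hypothetical helper, not an E-SURGE function, and the layout below is invented for illustration: only the nine initial-state parameters come from this thread, while the other block sizes and the positions of the fixed parameters are assumptions.

```python
def locate_parameter(index, blocks):
    """Map a 1-based index in the list of estimated (non-fixed) parameters to
    (block name, 1-based position inside that block, fixed ones included)."""
    seen = 0
    for name, fixed_flags in blocks:
        for pos, is_fixed in enumerate(fixed_flags, start=1):
            if is_fixed:
                continue  # fixed parameters are not numbered, so skip them
            seen += 1
            if seen == index:
                return name, pos
    raise IndexError(f"index {index} exceeds the {seen} estimated parameters")

# Hypothetical layout, in the block order described above (True = fixed).
blocks = [
    ("initial state", [False] * 9),
    ("survival", [False] * 10 + [True] * 2 + [False] * 10),
    ("transition", [False] * 4),
    ("capture", [False] * 16),
    ("mis-assignment", [False] * 12),
]

for idx in (9, 20, 38):
    print(idx, "->", locate_parameter(idx, blocks))
```

With the real block sizes and fixed positions read from IVFV, the same loop would tell you which block and which position each of the listed indices refers to.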

2.2 Finally, the output says that 13 quantities solutions of 1 partial derivative equations, made of redundant parameters (indices below) are estimables (this final sentence refers to the MLE itself; the previous four refer to points near the MLE). Again, what does that mean?

The last point at which the numerical method is applied is the MLE, at which some parameters are on the boundary. In that case, the numerical CMF approach fails. So consider only the first four lists, which are very reliable because they are identical.

2.3 Once I have found the redundant parameters, I could fix them (must I fix them to one, or to any value?) and repeat the analysis, couldn't I?

In the case of the CJS model, you can fix either the last capture rate or the last survival rate to one, because this does not constrain the estimate of the product. However, if you don't do this, there is still no problem, because the other parameters are estimable independently.
In your case, as you don't know which quantities are estimable, I recommend not fixing any parameters. You do know, however, which parameter estimates are reliable: the ones that are not in the first four lists.

Sincerely,
Rémi
CHOQUET
 
Posts: 211
Joined: Thu Nov 24, 2005 4:58 am
Location: CEFE, Montpellier, FRANCE.

Re: Redundancy and identifiability (for dummies)

Post by simone77 » Wed Nov 30, 2011 8:29 am

Thank you Rémi,

It was just what I was looking for. After asking a lot of questions over the last few days, I feel ready to take the plunge, stop experimenting and go on with the definitive analyses.
I have just one last (I hope) question about this:

Given that I am not interested in the estimate of the Initial State parameter, I have set up an umbrella model by keeping it constant and letting the other biological parameters vary with (i) time and/or (ii) gender and/or (iii) departure state (in my case arrival state doesn't add variability, as I am dealing with just two states and one hidden - dead - state).
Does this affect the estimates of the other biological parameters I am interested in?

I guess it might be a very trivial question, but not for me.

Re: Redundancy and identifiability (for dummies)

Post by CHOQUET » Wed Nov 30, 2011 11:33 am

I have no opinion on that subject.
Many studies have made this assumption that the proportion in each state does not change with time.
This perhaps corresponds to a period in which the population is not perturbed.

Nevertheless, you can test this on your best model.

Re: Redundancy and identifiability (for dummies)

Post by ganghis » Wed Nov 30, 2011 12:17 pm

Hi Simone,
If disease is endemic in this population (i.e., disease prevalence more or less constant over time), this sounds like a reasonable assumption. If it's something more like an epidemic, you'll definitely want to consider models with time varying pi.

Paul
ganghis
 
Posts: 84
Joined: Tue Aug 10, 2004 2:05 pm

Re: Redundancy and identifiability (for dummies)

Post by cooch » Wed Nov 30, 2011 1:19 pm

ganghis wrote: Hi Simone,
If disease is endemic in this population (i.e., disease prevalence more or less constant over time), this sounds like a reasonable assumption. If it's something more like an epidemic, you'll definitely want to consider models with time varying pi.

Paul


Technically speaking, an endemic disease is simply one for which there is a non-zero equilibrium. An epidemic is one characterized by prevalence -> 0 before all susceptible individuals are infected. The endemic equilibrium is not always 'constant' (although it can be). It is easy enough to demonstrate an oscillatory endemic equilibrium, both mathematically* (basic analysis of simple SIRS model), and by biological example (measles).

So, I'd submit that there isn't necessarily a strong a priori expectation for a 'dot model' for endemic diseases.
cooch
 
Posts: 1628
Joined: Thu May 15, 2003 4:11 pm
Location: Cornell University

Re: Redundancy and identifiability (for dummies)

Post by simone77 » Thu Dec 01, 2011 8:53 am

Thank you all for your answers, very useful as always.

My model selection process confirms that IS should be modeled with time variation, as these models perform considerably better (in terms of AIC) than those that do not.
My question:
simone77 wrote: Given that I am not interested in the estimate of the Initial State parameter, I have set up an umbrella model by keeping it constant and letting the other biological parameters vary with (i) time and/or (ii) gender and/or (iii) departure state (in my case arrival state doesn't add variability, as I am dealing with just two states and one hidden - dead - state).
Does this affect the estimates of the other biological parameters I am interested in?

was rather unfortunate, given that it is obvious (I say it a posteriori!) that the structure of IS affects the estimates of the other biological parameters, as I could see by comparing the estimates from each model (with and without time variation in IS).

