Another RMark question: Factors as individual covariates

posts related to the RMark library, which may not be of general interest to users of 'classic' MARK

Another RMark question: Factors as individual covariates

Postby jlaake » Wed Oct 29, 2008 5:17 pm

Question:

I am attempting to build Huggins models that will model capture probabilities by sex. There are 2 ways to do this in MARK: coding sex as a group covariate or as an individual covariate. The first way derives 2 estimates of N-hat (1 for each group) with associated S.E.'s and the latter derives a single estimate and S.E. I need a single estimate of N-hat and S.E to compute CV's. However, as far as I can tell, RMark does not allow using factor variables (i.e., sex) as an individual covariate, I've tried. I figure I can add the group specific N-hat estimates to obtain the single point estimate I need, but how do I handle the separate S.E.'s? I would rather not go this route, because I will have to do this for approx. 2000 resamples of my data set. Will factor variables as individual covariates ever be an option? This sex variation in p's and c's is an important part of my analysis so any help would be greatly appreciated.

Response:
Factor variables can be turned into individual covariates by creating k-1 individual covariates when there are k levels for the factor. In this case, k=2 for sex of M or F so you only need to create 1 variable. Choose one of the values as the intercept (say M in this case) and then create a dummy variable (0/1) for the other levels. In this case, create a variable named sexf which is 1 for data for females and 0 for males. Then you can use ~sexf as a formula. An example of this is given with the dipper data in RMark. This could become a big nuisance if you have a factor variable with lots of levels but you can use model.matrix to solve that. For example, lets say your data are in a dataframe x, and you have a variable named fac in x which is a factor variable. Then you could add all of the needed covariates as follows:

xx=as.data.frame(model.matrix(~fac,x))
x=cbind(x,xx[,-1,drop=FALSE])
rm(xx)


The above code creates a dataframe xx with the columns to code all of the dummy variables. But you don't want the first column which is the intercept. So the second line, removes the first column but appends all of other columns onto the dataframe. If they happened to be named fac1,fac2,fac3 (where fac0 was the intercept). Then you would use
~fac1+fac2+fac3 for the formula. You can think of these as simply added columns of the design matrix.

I'll look into possibly automating these factor variables in the formula. Currently factor variables are allowed but they must be defined in the groups argument which as noted, creates a separate N for each group for the Huggins model and that was not wanted. Note that for models with N in the likelihood (not Huggins) you would want a separate N for each group and should not do what is prescribed above.

To answer the other question, it is possible to extract the N-hats and their v-c matrix and in this case the se for the total would be the sqrt of the total of all the elements in the 2 by 2 v-c matrix.
jlaake
 
Posts: 1417
Joined: Fri May 12, 2006 12:50 pm
Location: Escondido, CA

Simple solution

Postby jlaufenb » Thu Oct 30, 2008 2:24 pm

First off, thanks to Jeff for posting my question to the listserv. As often happens, the solution to the problem was simple. When using the import.chdata function you must be sure to specify the field type for the dummy variable(s) as "n" for numerical. I had originally specified the type as "f" for factor when I was first learning the function. This is what was preventing RMark from recognizing it as an individual covariate.
jlaufenb
 
Posts: 49
Joined: Tue Aug 05, 2008 2:12 pm
Location: Anchorage, AK

Postby bacollier » Fri Oct 31, 2008 1:51 pm

All:

Along this vein: Working with factors as a predictor variable under an Occupancy model I hit a snag using groups= command in process.data:

variable is FWSRecReg, factor with 8 levels.

> str(swa_data$FWSRecReg)
Factor w/ 8 levels "1","2","3","4",..: 8 8 8 8 8 8 8 8 8 8 ...

when I run a constant model on my data (Psi.dot, p.dot) using this process data/make.design.data code:

swa_process<-process.data(swa_data, model="Occupancy")
swa_ddl<-make.design.data(swa_process)

everything works great and I get (expected)

> swa.results
model npar AICc DeltaAICc weight Deviance
1 p(~1)Psi(~1) 2 203.9713 0 1 0
>

but, when I include the groups= statement
swa_process<-process.data(swa_data, model="Occupancy", groups=c("FWSRecReg"))
swa_ddl<-make.design.data(swa_process)

and run the same model (Psi.dot, p.dot), I get:

> swa.results<-swa.models()
Error in derived.vcv[i, ] = value :
number of items to replace is not a multiple of replacement length


********Following model failed to run : p(~1)Psi(~1) **********

Model 1: model.Psi.dot.p.dot has not been run.
Error in result.table[order(result.table$DeltaAICc), ] :
incorrect number of dimensions
In addition: Warning message:
In min(result.table$AICc) : no non-missing arguments to min; returning Inf

I tried to debug this error but I could not tell which function call it was in?

Thanks,
Bret
bacollier
 
Posts: 230
Joined: Fri Nov 26, 2004 10:33 am
Location: Louisiana State University

Postby jlaake » Fri Oct 31, 2008 2:02 pm

What version of mark.exe are you using? If you are using the 1 October version, there was a problem that was introduced when Gary modified the code to fix a reported problem with the binary file where the derived parameters are kept. To date, I 've only found that the problem occurs when groups are used with the Occupancy and Nest Survival models, although it may also be dropping derived parameters from other models. Gary and I should have this fixed some time next week. If you need an older version of mark.exe email me off list and I'll send it to you.

--jeff
jlaake
 
Posts: 1417
Joined: Fri May 12, 2006 12:50 pm
Location: Escondido, CA

Postby bacollier » Fri Oct 31, 2008 2:28 pm

got it Jeff, I had my MarkPath permanently configured to the updated mark.exe Dr. White sent me offline on 1 Oct for dealing with a huge occupancy dataset I am messing with, which must be the same one you are working on for the binary file. I reverted to an older mark.exe I had tucked away and it ran fine. I hope the issue Dr. White fixed for me was not the reported problem with the binary file that is causing all the other RMark problems.

Bret
bacollier
 
Posts: 230
Joined: Fri Nov 26, 2004 10:33 am
Location: Louisiana State University

Re: Another RMark question: Factors as individual covariates

Postby krh1985 » Wed Aug 07, 2019 4:21 pm

Hi Jeff,
I wanted to revisit this factors as covariates vs. group designations. I'm working in RMark and have 2 sites, 8-10 plots per site, across 17 years. Is there a way to designate site, plot, and year as groups without creating a million dummy variables? I tried to include them as covariates and it didn't like that they're factors. My end goal is N estimates by plot by year. Any advice is appreciated. Thanks! -Katie
krh1985
 
Posts: 6
Joined: Mon Apr 15, 2019 10:55 am

Re: Another RMark question: Factors as individual covariates

Postby jlaake » Thu Aug 08, 2019 9:57 pm

You didn't tell me which model you are using but I'll assume that it will provide the estimate of N. In that case, if you specify each variable - site,plot,year in the groups argument of process.data then it will create a group for each combination that is present in the data and the model should produce an estimate of N for each group which should give you what you want. There is no reason to create dummy variables for factors unless you want to use it as an individual covariate for some reason. For estimating abundance it will be typically be easier to use factor variables in groups.

Note that there is a section in the workshop notes that discusses this issue.

--jeff
jlaake
 
Posts: 1417
Joined: Fri May 12, 2006 12:50 pm
Location: Escondido, CA


Return to RMark

Who is online

Users browsing this forum: No registered users and 11 guests