Question:
I am attempting to build Huggins models that will model capture probabilities by sex. There are 2 ways to do this in MARK: coding sex as a group covariate or as an individual covariate. The first way derives 2 estimates of N-hat (1 for each group) with associated S.E.'s and the latter derives a single estimate and S.E. I need a single estimate of N-hat and S.E to compute CV's. However, as far as I can tell, RMark does not allow using factor variables (i.e., sex) as an individual covariate, I've tried. I figure I can add the group specific N-hat estimates to obtain the single point estimate I need, but how do I handle the separate S.E.'s? I would rather not go this route, because I will have to do this for approx. 2000 resamples of my data set. Will factor variables as individual covariates ever be an option? This sex variation in p's and c's is an important part of my analysis so any help would be greatly appreciated.
Response:
Factor variables can be turned into individual covariates by creating k-1 individual covariates when there are k levels for the factor. In this case, k=2 for sex of M or F so you only need to create 1 variable. Choose one of the values as the intercept (say M in this case) and then create a dummy variable (0/1) for the other levels. In this case, create a variable named sexf which is 1 for data for females and 0 for males. Then you can use ~sexf as a formula. An example of this is given with the dipper data in RMark. This could become a big nuisance if you have a factor variable with lots of levels but you can use model.matrix to solve that. For example, lets say your data are in a dataframe x, and you have a variable named fac in x which is a factor variable. Then you could add all of the needed covariates as follows:
xx=as.data.frame(model.matrix(~fac,x))
x=cbind(x,xx[,-1,drop=FALSE])
rm(xx)
The above code creates a dataframe xx with the columns to code all of the dummy variables. But you don't want the first column which is the intercept. So the second line, removes the first column but appends all of other columns onto the dataframe. If they happened to be named fac1,fac2,fac3 (where fac0 was the intercept). Then you would use
~fac1+fac2+fac3 for the formula. You can think of these as simply added columns of the design matrix.
I'll look into possibly automating these factor variables in the formula. Currently factor variables are allowed but they must be defined in the groups argument which as noted, creates a separate N for each group for the Huggins model and that was not wanted. Note that for models with N in the likelihood (not Huggins) you would want a separate N for each group and should not do what is prescribed above.
To answer the other question, it is possible to extract the N-hats and their v-c matrix and in this case the se for the total would be the sqrt of the total of all the elements in the 2 by 2 v-c matrix.