You can get an estimate of N from a simple, open CJS model in MARK by tricking/tweaking the .INP into RD format. You do this by adding a 'dot' after every occasion in the encounter history. So
1101 would become 1.1.0.1.
Then set up as a RD with (using the above as an example) 4 primary samples, 2 secondary per primary. Fit closed model M0 to the primaries. MARK will look at the .INP file, generate an estimate of M(t+1) for each primary (which is actually written out in the 'full output', if you look), estimate p* for each primary (which amounts to the single p estimate, given there is only one real occasion -- this p is the same p you'd get if you'd used the original .INP file, in a CJS analysis with p(t) for encounter prob), and then dump out N as a derived parameter (this happens automatically using the RD approach). This is
way simpler than deriving it by hand (since you also get the variances and covariances). Using the RD in MARK give you the two pieces you need for the canonical estimate of N as M(t+1)/p*, and all it requires is a tweak to the .INP file...
[Note added:] It also requires the constraint that
- see below}
I'm not sure what you mean by 'average population size', but if you really mean 'average' in the sense that dumb people like me would use it, you can get there from here by taking the derived estimates of N from each primary, and calculating an average. To get the 'stats right' (meaning, the correct estimate of the average and the variance of same), but avoiding messy Delta methods, you can do more or less what you might do in JAGS using the MCMC capabilities in MARK. You simply take the posterior chains for each N in the MCMC bin file, and (say, in R) create a derived parm which is the mean of these chains, and then generate various moments of the posterior (mode for 'mean', HPD for credible interval, and so on...).
This is more or less what generating a derived parameter involves in JAGS, except that JAGS calculates the derived parameter with each iteration of the chain(s), whereas doing this in MARK requires taking the chains and doing the derivation of the parameter after the fact. I submit that the 'JAGS way' is more efficient (computaitonally), but you can use MARK to get the exact same answer (I've tested this sort of thing any number of times, and have yet to find the exception -- MARK isn't nearly as flexible as JAGS, since it isn't a 'language', obviously, but if doing things with 'derived parameters' is what you're after, using the MCMC capabilities in MARK can often work quite well.)
A demonstration of the basic idea (albeit in a different context) starts on p. 41 of the 'Delta method' appendix of the 'MARK book'. The example presented there is using MCMC to generate a product of survival estimates as the derived parameter, but if you understand what its doing, it should be fairly clear how to modify it for other derived parameters.
Now, all of the preceding is avoiding any discussion over whether or not M(t+1)/p* is a meaningful estimate of N when p* is not estimated using a fully fleshed out model. This has been discussed in various places. Some of us are not big fans of basically taking M(t+1)/p*, if p* is not estimated on a model that accounts for things like capture effect, or individual heterogeneity, and the like (i.e., the majority of the advanced model types in Chapter 14). Simply taking 'count/detection' might be relatively robust in some cases (say, high p), but I'd always be suspicious. I'm not a big fan of using these sorts of uncalibrated open pop estimates, because you're making heroic (and, depending on the sampling scheme, often untestable) assumptions about p. IMO.