www.phidot.org

by **Eldar** » Thu Mar 24, 2011 5:01 pm

The problem I am trying to solve: our computers have many cores (8 on an i7 with hyper-threading), but Mark uses only one of them. It would be nice to use all of them.

Through a series of experiments I have worked out some ways of using multiple cores/threads under Windows (they may work on other systems, but I did not try, as I am a lamer, not a professional).

Since writing the final functions will take some time, I would like to know first:
a) Is this really needed by anybody else?
b) Are there any working solutions already written?

I am still working on a huge multistrata dataset using simulated annealing. Since I have a lot of different factors that may or may not affect survival and/or transitions, the number of models to test is huge.

1. I modified Jeff's mark.wrapper function, adding the ability to extract Mark output from Mark files without knowing the relationship between model.name and xxx in the markxxx files (this is needed for parallel runs).
2. I found a nice R package, "batch", that can run several copies of a script in several R processes (from the command line). Each R process can run a Mark process through RMark.
3. The next step was to add a couple of locks and logical switches.
Now the whole thing works fine, except that I have not yet put it into functional form.

If you are interested in the idea of parallel runs, let's discuss a way of making a function from this logic (I can see several types of final functions, intended for end users with different levels of R experience).

I think that if we make progress here, we can move on to clusters, and that will speed up the process a lot!
Eldar

by **jlaake** » Thu Mar 24, 2011 5:18 pm

Eldar-

With the proliferation of multicore machines, I think this could be potentially very useful for most users. It has been on my list of things to explore with RMark but I've just not found the time. Have you looked at any of the multicore packages that are out there? One has a function mclapply (possibly mcapply), which is a multicore version of lapply that would work fairly directly in mark.wrapper. Last time I checked (12-18 months ago), though, the package did not work with 64-bit Windows. I'd encourage you to turn it into a function that could be called like mark.wrapper and I'll include it in the package for distribution.

--jeff

by **gwhite** » Thu Mar 24, 2011 6:04 pm

Eldar:
I've been trying to get the MARK numerical code to use multiple processors with OpenMP over the last year or so. The main way that MARK could improve execution time with multiple processors is in computing the probabilities for each encounter history. Multiple encounter histories can be processed simultaneously with multiple processors. Unfortunately, I've had no luck getting this to work so far. However, I'm currently working with the GNU gfortran compiler, making another try. Maybe eventually...

Gary

by **jlaake** » Thu Mar 24, 2011 6:45 pm

Gary-

I can see that would be an ambitious task. How hard would it be to implement in the MARK interface what Eldar is suggesting? A user could build, say, 2-8 models and save the structures, then highlight them and run them, and MARK would send them off to different processors. I believe that is what Eldar is suggesting that he has done using the RMark interface to mark.exe.

--jeff

by **gwhite** » Thu Mar 24, 2011 7:39 pm

Jeff:
I think it would be easier to do this within RMark, where you send each numerical analysis to a different processor. Actually, if you have multiple processors available, then the operating system will run the various MARK jobs on different processors. This is what happens in the MARK interface when you create more runs as others are running. Of course, on the typical PC with only 2 processors, you really don't notice this effect. I've never had a quad-core system to check this with.
Gary

by **jlaake** » Thu Mar 24, 2011 8:05 pm

I had never tried it with the MARK interface so I wasn't sure. RMark handles the models sequentially, and Eldar is simply putting into RMark what the MARK interface already does.

--jeff

by **Eldar** » Fri Mar 25, 2011 1:43 pm

Jeff and Gary, thanks a lot for your replies.
I have never tried to run several models from Mark in parallel. I will try it when the computer is free of the RMark->Mark processes.
I am not a programmer, so I can't imagine working with Fortran; I am only trying to make it easier to load models from a list of models into Mark.

The function mclapply mentioned by Jeff still works only with Windows 2xxx... In the description of the latest update, they wrote that Vista and 7 are not supported and are unlikely ever to be:
" package multicore: 2011-02-11 - added (experimental) support for Windows. Note: it (sort of) works on Windows 2k and XP only. Vista and Windows 7 is not supported due to changes to the kernel. Since Vista it becomes increasingly unlikely that multicore will be possible on Windows in general."
I found a nice and easy-to-read description of the "batch" package in the latest issue of the Journal of Statistical Software (http://www.jstatsoft.org/v39/c01/paper). It contains a good introduction to parallel computing in R.
It seems to me that "batch" is simple and ready to use. We only need to write an R script that can be run by parallel R windows ...

I also found an issue that may cause problems for apply-like parallel runs: when two processes start Mark at the same moment, the numbers get mixed up between them while the input is being read. I am working around this for now by adding a temporary lock.

So for now I see the logic as follows:

R-main:
1. Creates the model list: cml()
2. Saves it, with its environment, as an RData object (so as not to regenerate it on each run)... Jeff, I need some help here: do you know a simple way of selecting only the objects we will need later in mark.wrapper()?
3. Saves the script for the child processes. We need to save a script for future runs; it can be done with writeLines(). Here I am not sure, as for now I am saving this script by hand. The script contains the code that can be run in parallel.
4. Saves the list of models (txt) with a status field
5. Runs batch through system()
6. Collects the models in one object...

Batch - manages child R windows

R-child - initiated by Batch; each of them will run the script that was made by R-main
1. Loads the RData with cml.
2. Reads the list of models (txt) and selects some models to run. Locks these models (writing "Running" into the Status field).
3. Creates its own folder and setwd() to it.
4. Runs mark.wrapper()
5* Optional: we can collect all models from all runs after each turn, so as not to wait until the whole process is finished. I have already written a function that does this.
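
The locking in step 2 of the child list can be sketched roughly as follows. This is only an illustration in Python rather than R, and it makes several assumptions that are not in the post: the model list is a tab-separated text file with one `name<TAB>status` row per model, unclaimed models carry the status "Pending", and the names `file_lock` and `claim_next_model` are hypothetical. The lock is a small extra file created with `os.O_EXCL`, so only one child at a time can read and rewrite the list.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def file_lock(lock_path, poll_seconds=0.05, timeout=30.0):
    """Hold an exclusive lock by creating lock_path.

    os.O_CREAT | os.O_EXCL fails if the file already exists, so only one
    process can create it; the others poll until it disappears.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll_seconds)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def claim_next_model(list_path):
    """Mark the first 'Pending' model as 'Running' and return its name.

    Returns None when no unclaimed model is left. The file layout and the
    'Pending' status are assumptions made for this sketch.
    """
    list_path = str(list_path)
    with file_lock(list_path + ".lock"):
        with open(list_path) as handle:
            rows = [line.rstrip("\n").split("\t") for line in handle if line.strip()]
        claimed = None
        for row in rows:
            if row[1] == "Pending":
                row[1] = "Running"
                claimed = row[0]
                break
        if claimed is not None:
            # Rewrite the whole list while the lock is still held.
            with open(list_path, "w") as handle:
                for row in rows:
                    handle.write("\t".join(row) + "\n")
        return claimed
```

Each child would call `claim_next_model("models.txt")` in a loop until it returns `None`. The same lock could also be held while Mark starts up, which is one possible way to avoid two processes reading input at the same moment.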

Please write to me if you can see any improvements to the logic.
- Eldar

by **cooch** » Fri Mar 25, 2011 2:25 pm

Actually, there is a relatively simple way to do this. Imagine a model set with multiple models. You build each model, and save (but don't run) the model structure. For each model, this generates a .tmp file, which is a simple ASCII file containing the control language for that model (which, ultimately, is what MARK interprets when you submit/run the model). So, I tried a little experiment:

1\ generated a candidate set of approximately 8 models for the dipper data (what else?)

2\ saved each model, then renamed each of the temp files model1.txt, model2.txt, etc. So, 8 in total.

3\ wrote a short script that took each one, and sent each model off to a separate core on an 8-core machine. If I wanted to spend more than 60 seconds on this, I could probably code up a 'smart' script, use regex (say, within bash, if I'm using a bash shell script) to parse the list of models and count how many there were (which is why I used a common syntax for naming the models: model1.txt, model2.txt -- I can wildcard the model number), and then queue them up in sets of 'N' where the machine has 'N' cores, submitting each new job in the queue whenever a preceding one is finished.

All of this is trivial using the GNU/Linux version of MARK (which I acknowledge has not been kept up to date with Gary's latest and greatest source code). And since you can submit jobs directly to mark.exe under Windows using a command line approach (mark i=input.file o=output.file), I'm guessing there is a way to do this under whatever version of Windows you have (although I don't really know how Windows allocates cores -- GNU/Linux does this by default: 'another job, send it to whatever core is most idle').

I'm sure there are more elegant ways to do all this (witness the apparent desire to achieve such elegance within an R construct in this thread), and I don't dispute there is some real value to figuring out how to make MARK (and related things) more 'core-aware' in a more automated fashion, but in the short run, there is a fairly straightforward option which works, if you simply want to 'get it done'.
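
The 'smart' queue described in step 3 might look something like the sketch below. It is written in Python rather than bash so that it runs the same way under Windows and GNU/Linux, and it is not the script used in the experiment. The command form `mark i=... o=...` is the one quoted above; the `.out` output suffix, the default executable name, and the function names `find_models`, `mark_command` and `run_queue` are assumptions made for illustration.

```python
import os
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Matches the common naming scheme from the post: model1.txt, model2.txt, ...
MODEL_PATTERN = re.compile(r"^model(\d+)\.txt$")


def find_models(directory):
    """Return the model input files in a directory, sorted by model number."""
    numbered = []
    for path in Path(directory).iterdir():
        match = MODEL_PATTERN.match(path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    return [path for _, path in sorted(numbered)]


def mark_command(input_path, executable="mark"):
    """Build the command line 'mark i=input o=output' for one model file.

    The '.out' suffix for the output file is an assumption of this sketch.
    """
    output_path = Path(input_path).with_suffix(".out")
    return [executable, f"i={input_path}", f"o={output_path}"]


def run_queue(commands, workers=None):
    """Run commands with at most `workers` running at once.

    A new job starts as soon as a preceding one finishes. The threads only
    wait on the external processes; the operating system decides which core
    each process runs on. Returns the exit codes in the order submitted.
    """
    workers = workers or os.cpu_count() or 1

    def run_one(command):
        return subprocess.run(command).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, commands))
```

A run over the current folder would then be `run_queue([mark_command(p) for p in find_models(".")], workers=8)`, with `workers` set to the number of cores you want to keep busy.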

by **pmm** » Sat Mar 26, 2011 11:26 am

The foreach package from Revolution, now publicly available, can do this.

Eldar, what were your experiences with scale?

by **Eldar** » Sun Apr 03, 2011 5:37 pm

Evan, I agree that a simple bash script may do the same as a somewhat more complicated R script (and it may also be faster). But for the general user it would be more useful if the code were incorporated in a function with some arguments, e.g. mark.wrapper.

pmm, thanks for the idea of foreach. I am not experienced in the field of parallel computing and huge survival datasets. Do you know what the difference is between the foreach and snowfall packages? I have now switched from batch to foreach, as it makes the code shorter and more stable (hopefully). Will post results soon,
~ Eldar

Mark in parallel under windows on multiple cores
