
Support for multiple dispynode per hostname?

Posted: Tue Mar 29, 2022 3:13 am
by varga
The compute farm that I have to use has machines with many slots per host.
When I dispatch dispynode jobs to the farm, it is likely that multiple jobs will be dispatched to the same host.
Is this scenario supported by dispy?
If so, how would I specify the nodes list to dispy.JobCluster?

dispy.JobCluster(nodes=['hostname', 'hostname', ...])

To get access to the farm at all, I mostly have to limit each dispynode to 4 CPUs, i.e.:

dispynode.py --cpus 4

Therefore, it is very likely that some hosts may end up with multiple dispynode jobs.

Re: Support for multiple dispynode per hostname?

Posted: Tue Mar 29, 2022 5:02 am
by Giri
I am a bit confused by the terms used. I assume by slots you mean CPUs? If so, dispy's job scheduler will submit as many jobs as necessary to use all CPUs on all nodes.

You can list all hosts with 'nodes', but it may be easier to control this programmatically with the 'NodeAllocate' class (see the 'node_setup.py' example). You can, for example, use at most one CPU per node by overriding the 'allocate' method to 'return 1' (the number of CPUs to use on that node), as sketched below.
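
A minimal sketch of that approach, assuming the 'allocate' signature from dispy's documentation (the class name, 'compute' function and job count here are illustrative, not from your setup):

import dispy

class OneCpuPerNode(dispy.NodeAllocate):
    # 'allocate' returns the number of CPUs the scheduler may use on this node
    def allocate(self, cluster, ip_addr, name, cpus, avail_info=None, platform=''):
        return 1  # use at most one CPU per node

def compute(n):
    import time
    time.sleep(n)
    return n

if __name__ == '__main__':
    # '*' matches any discovered node; replace with explicit hostnames if needed
    cluster = dispy.JobCluster(compute, nodes=[OneCpuPerNode('*')])
    jobs = [cluster.submit(i) for i in range(4)]
    for job in jobs:
        print(job())  # job() waits for the job to finish and returns its result
    cluster.close()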

If the above doesn't address your question, post more details and perhaps an example scenario to illustrate your question.

Re: Support for multiple dispynode per hostname?

Posted: Tue Mar 29, 2022 9:17 pm
by varga
Maybe I should've used cores instead of slots. Slots is the term used by Sun Grid Engine.
Let's say that I dispatch 4 "dispynode.py --cpus 1" jobs to the compute farm.
It is very possible that 2 of the jobs may start on the same host since each host may have 32 cores (slots).
In this case, I'd need to specify nodes=['hostname1', 'hostname1', 'hostname2', 'hostname3']
My question is ... would this work?
Thanks.

Re: Support for multiple dispynode per hostname?

Posted: Tue Mar 29, 2022 11:24 pm
by Giri
If you start dispynode with "--cpus 1", then dispynode will not run more than one job at a time even if the node has 32 cores. See the "cpus" option in dispynode's documentation.

When you submit jobs to the cluster, dispy's job scheduler will dispatch them to all available cores on all nodes as it finds them. If a node has n cores (that dispynode can use), then up to n jobs can run on that node. I am not sure if this is the question you have. You may want to try the 'sample.py' program in the distribution, and maybe modify it, to understand the behavior.

Note also that by default the dispy scheduler detects nodes automatically with UDP broadcast, so there is no need to list them all with the 'nodes' parameter. If a firewall blocks UDP or the network is lossy, you may need to list them explicitly; both forms are sketched below.
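
For illustration (the hostnames are placeholders, not from this thread):

import dispy

def compute(n):
    return n * n

# default: nodes are discovered with UDP broadcast, no 'nodes' argument needed
cluster = dispy.JobCluster(compute)

# if broadcast is blocked, list the hosts explicitly
# cluster = dispy.JobCluster(compute, nodes=['host1', 'host2', 'host3'])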

Re: Support for multiple dispynode per hostname?

Posted: Wed Mar 30, 2022 5:32 pm
by varga
I have to dispatch dispynode jobs to our farm one at a time.
If I dispatch 2 of them and the second job starts on the same host as the first, the second dispynode dies with:

> dispynode.py --cpus 4
2022-03-30 10:23:00 dispynode - version: 4.15.1 (Python 3.9.7), PID: 24508
2022-03-30 10:23:00 dispynode - Files will be saved under "/tmp/3877835.1.h/dispy/node"
2022-03-30 10:23:00 pycos - version 4.12.1 (Python 3.9.7) with epoll I/O notifier

Another dispynode server seems to be running with PID 24498;
terminate that process and rerun with "clean" option

Since each host has 64 'cpus', it's quite possible to end up with many dispynode jobs attempting to run on the same host.
I can't just ask for all 64 'cpus' on a host for a --cpus 64 run, as my job would have to wait forever for a single host to be empty.
In our env, the best option for getting available resources is to limit each job to 4 'cpus' ... which may or may not end up on the same host.

I guess it's looking like dispy won't work for my situation.

Re: Support for multiple dispynode per hostname?

Posted: Thu Mar 31, 2022 6:00 am
by Giri
I don't know if dispy works for your case, as I still don't understand the problem you describe.

Let me clarify a bit, as the description of the problem seems unclear: only one dispynode program runs on a node. When multiple jobs are submitted to the job scheduler (with 'cluster.submit'), the scheduler picks a suitable node that has available processors and sends each job to that node. The dispynode program on that node runs the job in a new process. If by 'multiple dispynodes' on a node you mean the several 'dispynode' entries you see in the process list, that is an artifact of how multiprocessing works: each job runs in a child process that carries the same name. So, if your question is 'does dispy support executing multiple jobs, one per available processor, on a node', then yes, that is the main feature of dispy. You can have nodes with different system configurations, different numbers of available processors etc., and the job scheduler will try to use as many processors (on all available nodes) as necessary.

If you start with something simple, such as the 'sample.py' program in the examples, it may be easier to understand. Run dispynode with the debug option (i.e., 'dispynode.py -d') on a node that you can observe (to see the output from dispynode), then run 'sample.py'. You may want to include 'loglevel=dispy.logger.DEBUG' in 'JobCluster' (i.e., 'cluster = dispy.JobCluster(compute, loglevel=dispy.logger.DEBUG)'). This will show log messages as jobs are scheduled, finished etc.
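
For reference, a minimal sketch along the lines of 'sample.py' with debug logging enabled (the 'compute' function, sleep times and job count are illustrative):

import dispy, random

def compute(n):
    import time, socket
    time.sleep(n)
    return (socket.gethostname(), n)

if __name__ == '__main__':
    cluster = dispy.JobCluster(compute, loglevel=dispy.logger.DEBUG)
    jobs = []
    for i in range(10):
        job = cluster.submit(random.randint(1, 5))
        job.id = i  # associate an id with each job for bookkeeping
        jobs.append(job)
    for job in jobs:
        host, n = job()  # waits for the job and returns its result
        print('%s executed job %s with %s' % (host, job.id, n))
    cluster.print_status()
    cluster.close()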

As mentioned before, you can either use the '--cpus=16' option to dispynode to run at most 16 jobs on a node (as each of your jobs uses 4 cores, for a total of 64 cores in the example you mention), or let dispynode see all 64 cores but direct the job scheduler to run at most 16 jobs by overriding 'NodeAllocate.allocate' as mentioned before; a rough sketch of the latter, combined with the 'httpd' monitor, is below. But first get 'sample.py' working, then modify it a bit to run many jobs (as it is, it runs 10 jobs) and add the 'httpd' server (see the other examples) to monitor which nodes are running which jobs etc.
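
A rough sketch of that combination, assuming the 'NodeAllocate' interface and the 'DispyHTTPServer' class from dispy's 'httpd' examples (the class name, 'compute' function and job count are illustrative):

import dispy, dispy.httpd

class CapSixteen(dispy.NodeAllocate):
    # let the scheduler use at most 16 of a node's processors
    def allocate(self, cluster, ip_addr, name, cpus, avail_info=None, platform=''):
        return min(cpus, 16)

def compute(n):
    import time
    time.sleep(n)
    return n

if __name__ == '__main__':
    cluster = dispy.JobCluster(compute, nodes=[CapSixteen('*')])
    http_server = dispy.httpd.DispyHTTPServer(cluster)  # monitor at http://localhost:8181 by default
    jobs = [cluster.submit(i % 5 + 1) for i in range(40)]
    for job in jobs:
        job()  # wait for each job to finish
    cluster.print_status()
    http_server.shutdown()
    cluster.close()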