Possible race condition between Cluster definition and job submission?
Posted: Mon Jun 07, 2021 7:25 pm
I have a piece of code that creates a JobCluster instance with a fixed set of nodes (all IP address strings at the moment), and when I attempt to submit a job to a specific node, I get None back instead of a DispyJob. If I put a small sleep between the JobCluster definition and the job = cluster.submit_node(...) call, as follows, the returned job is legitimate:
c = JobCluster("myscript.py", depends=[...], nodes=['192.168.1.1', '192.168.1.2'])
time.sleep(1)  # Removing this sleep causes the following submission to fail with an 'invalid node' message
j = c.submit_node('192.168.1.1', 'myargs')
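In the meantime, a slightly more robust caller-side workaround than a fixed sleep might be to retry the submission briefly until it returns a job. This is only a sketch; submit_with_retry is a hypothetical helper of my own, not part of dispy's API:

```python
import time

def submit_with_retry(submit, *args, retries=10, delay=0.1):
    """Call submit(*args) until it returns a non-None job, or give up.

    Returns the job on success, or None if all retries return None.
    """
    for _ in range(retries):
        job = submit(*args)
        if job is not None:
            return job
        time.sleep(delay)  # give the scheduler a moment to register the node
    return None

# With the snippet above this would be used as (untested):
#   j = submit_with_retry(c.submit_node, '192.168.1.1', 'myargs')
```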
Looking at the code, it appears this could be related to this line:
https://github.com/pgiri/dispy/blob/857 ... _.py#L2876
where a Task is created and _job.job is supposed to be the DispyJob that gets returned. Perhaps the Task needs a little time to spin up and populate the _job.job attribute with the DispyJob reference; maybe a small loop there could wait for _job.job to become non-None?
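If that suspicion is right, the fix I'm imagining would be a short bounded wait for _job.job to be populated. Purely a sketch against assumed internals (the _job object and its job attribute are taken from my reading of the linked line; the timeout values are arbitrary):

```python
import time

def wait_for_job(_job, timeout=1.0, interval=0.01):
    """Poll until the Task has populated _job.job, or give up after timeout.

    Returns _job.job, which may still be None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while _job.job is None and time.monotonic() < deadline:
        time.sleep(interval)
    return _job.job
```

Note that dispy's scheduler runs on pycos tasks rather than threads, so a blocking sleep inside the scheduler itself probably isn't the right shape for a real patch; this just illustrates the bounded-wait idea.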
Anyhow, I'm not sure whether any of this is even reproducible, but I'm posting it in case anyone else runs into it.