I faced with a couple of errors. In dispyscheduler.py:
Code: Select all
cluster = self._clusters[compute.id]
if node.ip_addr in cluster._dispy_nodes:
continue
dispy_node = cluster._dispy_nodes.get(node.ip_addr, None)
if not dispy_node:
dispy_node = DispyNode(node.ip_addr, node.name, node.cpus)
cluster._dispy_nodes[node.ip_addr] = dispy_node
dispy_node.tx = node.tx
dispy_node.rx = node.rx
# ...
And the second case: say dispynode was killed, at the same time client was stopped and then started again, and then dispynode was started. The node now has a dead jobs for old computation than not in _clusters:
Code: Select all
def reschedule_jobs(self, dead_jobs):
if not dead_jobs:
return
for _job in dead_jobs:
cluster = self._clusters[_job.compute_id]
del self._sched_jobs[_job.uid]
Code: Select all
def reschedule_jobs(self, dead_jobs):
if not dead_jobs:
return
for _job in dead_jobs:
cluster = self._clusters.get(_job.compute_id, None)
if cluster is None:
continue
# ...