Yes, but only because you asked so nicely 😚
@<1545216070686609408:profile|EnthusiasticCow4> I believe you are correct. Can you try another optimization method while we look into this?
From the logs it looks like the HPO application finds a worker from the queue, attempts to serialize the config sent to the worker, and crashes because of the version conflict with Pyro4. But I don't think we control any of that. I might be misunderstanding something. 🙃
I just checked the clearml.conf and I'm not specifying any version of python for the agents.
Hi @<1523701435869433856:profile|SmugDolphin23>
I'm a bit confused by your suggestion. To be clear, these are the logs from the HPO application instance that's spun up when you start the HPO process. I don't think we have any control over which Python version or Pyro version is used in the application instance. I think this error occurs before any code on our end is run.
Hi @<1780043419314294784:profile|LargeHamster21> ! Looks like you are using python3.11 (agent.default_python=3.11), while Pyro4 is incompatible with this python version: None
I would suggest trying to downgrade the python version or migrate to Pyro5
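If it helps to confirm which interpreter and package versions are actually in play, here is a minimal sketch that prints them. It only uses the standard library; the package list is an assumption based on the names that show up in the traceback in this thread, and it has to be run inside the environment you want to inspect, which may not be possible for a hosted application instance.

```python
import sys
from importlib import metadata


def report_versions(packages=("Pyro4", "Pyro5", "serpent", "hpbandster", "numpy")):
    """Return the interpreter version plus the installed version of each package.

    Packages that are not installed are reported as None.
    The default package list is an assumption taken from the traceback in this thread.
    """
    report = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing this output between the agent environment and the application instance should show whether the two really differ in Python or Pyro version.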
Also, it does not kill the instance, it continues to send progress reports every 2 minutes. I added the full log in the attachment.
Thanks a lot for looking into this. Unfortunately the HPO is still not working. Although the instance is starting now we are receiving the error below in the console of the application instance. When manually enqueueing the same experiment it does work, leading us to believe that this is not an environment issue. It also happens for cloned runs that worked in the past. Could this be related to the previous issue?
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/serpent.py", line 506, in ser_default_class
value = dict(vars(obj)) # make sure we can serialize anything that resembles a dict
^^^^^^^^^
TypeError: vars() argument must have __dict__ attribute
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.11/threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.11/site-packages/hpbandster/core/dispatcher.py", line 296, in job_runner
worker.proxy.start_computation(self, job.id, **job.kwargs)
File "/usr/local/lib/python3.11/site-packages/Pyro4/core.py", line 185, in __call__
return self.__send(self.__name, args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/Pyro4/core.py", line 437, in _pyroInvoke
data, compressed = serializer.serializeCall(objectId, methodname, vargs, kwargs, compress=config.COMPRESSION)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/Pyro4/util.py", line 176, in serializeCall
data = self.dumpsCall(obj, method, vargs, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/Pyro4/util.py", line 602, in dumpsCall
return serpent.dumps((obj, method, vargs, kwargs), module_in_classname=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/serpent.py", line 69, in dumps
return Serializer(indent, module_in_classname, bytes_repr).serialize(obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/serpent.py", line 229, in serialize
self._serialize(obj, out, 0)
File "/usr/local/lib/python3.11/site-packages/serpent.py", line 255, in _serialize
return self.dispatch[t](self, obj, out, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/serpent.py", line 319, in ser_builtins_tuple
serialize(elt, out, level + 1)
File "/usr/local/lib/python3.11/site-packages/serpent.py", line 255, in _serialize
return self.dispatch[t](self, obj, out, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/serpent.py", line 392, in ser_builtins_dict
serialize(value, out, level + 1)
File "/usr/local/lib/python3.11/site-packages/serpent.py", line 255, in _serialize
return self.dispatch[t](self, obj, out, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/serpent.py", line 392, in ser_builtins_dict
serialize(value, out, level + 1)
File "/usr/local/lib/python3.11/site-packages/serpent.py", line 274, in _serialize
func(self, obj, out, level)
File "/usr/local/lib/python3.11/site-packages/serpent.py", line 516, in ser_default_class
raise TypeError("don't know how to serialize class " +
TypeError: don't know how to serialize class <class 'numpy.int64'>. Give it vars() or an appropriate __getstate__
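For reference, the last line of the traceback says serpent (the serializer Pyro4 uses) received a `numpy.int64` inside the job kwargs and does not know how to serialize it. Below is a minimal sketch of the usual workaround, assuming you control the code that builds the configuration before it is handed to the dispatcher, which may not be the case inside the hosted HPO application. The helper name `to_native` is hypothetical; it relies only on the `tolist()` method that NumPy scalars and arrays expose, so it does not import NumPy itself.

```python
def to_native(value):
    """Recursively replace NumPy-style values with plain Python objects.

    Anything exposing a callable ``tolist()`` (NumPy scalars and arrays do)
    is converted through it; dicts, lists and tuples are walked recursively.
    Hypothetical helper for illustration, not part of ClearML or hpbandster.
    """
    if isinstance(value, dict):
        return {to_native(k): to_native(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_native(v) for v in value]
    if isinstance(value, tuple):
        return tuple(to_native(v) for v in value)
    tolist = getattr(value, "tolist", None)
    if callable(tolist):
        return tolist()
    return value
```

Passing the hyperparameter dict through such a helper before it is sent over Pyro4 leaves only built-in types (int, float, str, dict, list, tuple) for serpent to handle.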
I think it was a minor misconfiguration on one of the servers running the applications: an incorrect limit was set.
Hey John, I'm Nathan's colleague, thanks for looking into this. It seems the application is starting now. What was the issue in the end?
@<1545216070686609408:profile|EnthusiasticCow4> , can you check again please?
Interesting, checking on my account as well O:
As far as I can tell there's nothing else running that isn't running on our hardware. Is there some way to see what application instances are active?
I'm rather confused as to what else would be running.
Hi @<1545216070686609408:profile|EnthusiasticCow4> , in the PRO plan you are limited to a maximum number of parallel application instances. If you kill some running applications, your HPO application will start running