Hi All

Hi all 😄

The hyperparameter tuner functionality has just stopped working. When I try to launch an instance of the tuner, it gets stuck at "This instance has been scheduled for launch. This may take a few moments." There are available agents in the queue, and nothing is added to the queue, so as far as I can see it can't really be a problem on our end. I've cloned and tested older tunings that previously worked, and they have the same issue. It's been this way for a few days now.

I'm using None with the PRO plan. I'm launching the instances from the web client.

LargeHamster21

Posted 10 days ago

Answers 19


I'm rather confused as to what else would be running.


Hi SmugDolphin23

I'm a bit confused by your suggestion. To be clear, these are the logs from the HPO application instance that is spun up when you start the HPO process. I don't think we have any control over which Python or Pyro version is started in the application instance. I think this error occurs before any of our code runs.


As far as I can tell, nothing else is running apart from what runs on our own hardware. Is there some way to see which application instances are active?


EnthusiasticCow4, can you check again, please?


Interesting, checking on my account as well O:


Thanks a lot for looking into this. Unfortunately, the HPO is still not working. Although the instance starts now, we receive the error below in the console of the application instance. When we manually enqueue the same experiment, it works, which leads us to believe that this is not an environment issue. It also happens for cloned runs that worked in the past. Could this be related to the previous issue?

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/serpent.py", line 506, in ser_default_class
    value = dict(vars(obj))  # make sure we can serialize anything that resembles a dict
                 ^^^^^^^^^
TypeError: vars() argument must have __dict__ attribute
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.11/site-packages/hpbandster/core/dispatcher.py", line 296, in job_runner
    worker.proxy.start_computation(self, job.id, **job.kwargs)
  File "/usr/local/lib/python3.11/site-packages/Pyro4/core.py", line 185, in __call__
    return self.__send(self.__name, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/Pyro4/core.py", line 437, in _pyroInvoke
    data, compressed = serializer.serializeCall(objectId, methodname, vargs, kwargs, compress=config.COMPRESSION)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/Pyro4/util.py", line 176, in serializeCall
    data = self.dumpsCall(obj, method, vargs, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/Pyro4/util.py", line 602, in dumpsCall
    return serpent.dumps((obj, method, vargs, kwargs), module_in_classname=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/serpent.py", line 69, in dumps
    return Serializer(indent, module_in_classname, bytes_repr).serialize(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/serpent.py", line 229, in serialize
    self._serialize(obj, out, 0)
  File "/usr/local/lib/python3.11/site-packages/serpent.py", line 255, in _serialize
    return self.dispatch[t](self, obj, out, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/serpent.py", line 319, in ser_builtins_tuple
    serialize(elt, out, level + 1)
  File "/usr/local/lib/python3.11/site-packages/serpent.py", line 255, in _serialize
    return self.dispatch[t](self, obj, out, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/serpent.py", line 392, in ser_builtins_dict
    serialize(value, out, level + 1)
  File "/usr/local/lib/python3.11/site-packages/serpent.py", line 255, in _serialize
    return self.dispatch[t](self, obj, out, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/serpent.py", line 392, in ser_builtins_dict
    serialize(value, out, level + 1)
  File "/usr/local/lib/python3.11/site-packages/serpent.py", line 274, in _serialize
    func(self, obj, out, level)
  File "/usr/local/lib/python3.11/site-packages/serpent.py", line 516, in ser_default_class
    raise TypeError("don't know how to serialize class " +
TypeError: don't know how to serialize class <class 'numpy.int64'>. Give it vars() or an appropriate __getstate__
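The last line of the traceback is the core failure: serpent, which Pyro4 uses to serialize the call, falls back to `vars()` for classes it does not know, and `numpy.int64` has no `__dict__`. A minimal sketch of the usual workaround, assuming you control the code that builds the config dict, is to recursively convert numeric scalars to plain `int`/`float` before they reach Pyro4. The helper name `to_builtin` is hypothetical, and it relies on numpy registering its scalar types with Python's `numbers` ABCs, so numpy itself is not imported.

```python
import numbers


def to_builtin(value):
    """Recursively replace number-like scalars with plain int/float.

    Hypothetical helper. It relies on numpy registering numpy.integer with
    numbers.Integral and numpy.floating with numbers.Real, so numpy.int64
    is caught without importing numpy.
    """
    # Exact builtin types are already serializable; leave them untouched.
    if value is None or type(value) in (bool, int, float, str):
        return value
    if isinstance(value, numbers.Integral):
        return int(value)  # e.g. numpy.int64 -> int
    if isinstance(value, numbers.Real):
        # e.g. numpy.float64 -> float (also turns a Fraction into a lossy float)
        return float(value)
    if isinstance(value, dict):
        return {to_builtin(k): to_builtin(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_builtin(v) for v in value]
    if isinstance(value, tuple):
        return tuple(to_builtin(v) for v in value)
    # Anything else is passed through unchanged.
    return value
```

With numpy available, `numpy.int64(3).item()` gives the same plain `int`; the `numbers`-based check just avoids the numpy dependency. In this thread the config is built inside the hosted HPO application, so this would only help where that code can actually be changed.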

Yes, but only because you asked so nicely 😚


Hi EnthusiasticCow4, in the PRO plan you are limited to a maximum number of parallel application instances. If you kill some running applications, your HPO application will start running.


Multiple instances of the autoscaler maybe?


Not using it


I'm curious as well


Hi LargeHamster21! It looks like you are using Python 3.11 (agent.default_python=3.11), while Pyro4 is incompatible with this Python version: None
I would suggest downgrading the Python version or migrating to Pyro5.
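The suggestion above boils down to a version check. A minimal sketch of a fail-fast guard that could run before starting a Pyro4-based optimizer follows. The function name `check_pyro4_python` is hypothetical, and the 3.10 cutoff is an assumption drawn from this reply, not from Pyro4's own documentation.

```python
import sys

# Assumption taken from the reply above, not from Pyro4's documentation:
# treat Python 3.11 and newer as unsupported by Pyro4.
PYRO4_MAX_PYTHON = (3, 10)


def check_pyro4_python(version_info=sys.version_info):
    """Return None if the interpreter is within the assumed range, else a hint."""
    major, minor = version_info[0], version_info[1]
    if (major, minor) <= PYRO4_MAX_PYTHON:
        return None
    return (
        f"Python {major}.{minor} detected: downgrade to Python "
        f"{PYRO4_MAX_PYTHON[0]}.{PYRO4_MAX_PYTHON[1]} or earlier, or migrate to Pyro5"
    )


if __name__ == "__main__":
    hint = check_pyro4_python()
    print(hint or "Python version is within the assumed Pyro4 range")
```

In the setup discussed here the interpreter is picked on the agent side (the reply cites `agent.default_python`), so a guard like this only surfaces the problem earlier; the actual fix is still the downgrade or the Pyro5 migration.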


EnthusiasticCow4, I believe you are correct. Can you try another optimization method while we look into this?


Thank you 😊


I think it was a minor misconfiguration on one of the servers running the applications: an incorrect limit was set.


From the logs, it looks like the HPO application picks up a worker from the queue, attempts to serialize the config sent to that worker, and crashes because of the version conflict with Pyro4. But I don't think we control any of that. I might be misunderstanding something. 🙃


I just checked the clearml.conf, and I'm not specifying any Python version for the agents.


Hey John, I'm Nathan's colleague, thanks for looking into this. It seems the application is starting now. What was the issue in the end?


Also, it does not kill the instance; it keeps sending progress reports every 2 minutes. I added the full log as an attachment.
