Answered
I executed a hyperparameter optimization task, but only a single experiment was performed. The task wasn't terminated automatically, yet the next experiments never start. What's wrong here? How can I fix this?

I executed a hyperparameter optimization task, but only a single experiment was performed and then nothing else happened. The task wasn't terminated automatically, yet it is not starting the next experiments.
What's wrong here? How can I fix this?

  
  
Posted 4 months ago

Answers 10


FYI,
I set the options for HyperParameterOptimizer() as follows:

  • compute_time_limit=None,
  • total_max_jobs=100,
  • min_iteration_per_job=None,
  • max_iteration_per_job=None,
  • max_number_of_concurrent_tasks=1
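
For readers following along, here is a rough sketch of how the options listed above might be passed to HyperParameterOptimizer. The base task ID, project and task names, metric title/series, queue name, and the "General/lr" parameter are all hypothetical placeholders, not values from this thread. RandomSearch is chosen only for illustration, and it is assumed here that it accepts (or ignores) min_iteration_per_job. The ClearML import sits inside the function so the settings helper can be inspected without a ClearML server.

```python
def optimizer_settings(base_task_id, queue="default"):
    """Keyword arguments mirroring the options listed in the post above."""
    return dict(
        base_task_id=base_task_id,            # hypothetical: ID of the template experiment
        objective_metric_title="validation",  # hypothetical metric title
        objective_metric_series="loss",       # hypothetical metric series
        objective_metric_sign="min",
        execution_queue=queue,                # hypothetical queue name
        max_number_of_concurrent_tasks=1,
        total_max_jobs=100,
        compute_time_limit=None,
        min_iteration_per_job=None,
        max_iteration_per_job=None,
    )


def run_optimization(base_task_id):
    # Imported lazily: requires the clearml package and a reachable ClearML server.
    from clearml import Task
    from clearml.automation import (
        HyperParameterOptimizer,
        RandomSearch,
        UniformParameterRange,
    )

    Task.init(
        project_name="HPO demo",              # hypothetical
        task_name="optimizer controller",     # hypothetical
        task_type=Task.TaskTypes.optimizer,
    )
    optimizer = HyperParameterOptimizer(
        hyper_parameters=[
            # hypothetical search space
            UniformParameterRange("General/lr", min_value=1e-4, max_value=1e-1),
        ],
        optimizer_class=RandomSearch,
        **optimizer_settings(base_task_id),
    )
    optimizer.start()  # begins launching experiments in the background
    optimizer.wait()   # blocks until the optimization finishes
    optimizer.stop()   # ends the optimizer itself; task.close() is not used for this
```

With max_number_of_concurrent_tasks=1, each new experiment is launched only after the previous one has finished, so anything that keeps the first experiment from completing cleanly (for example a hanging upload) can stall the whole run.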
  
  
Posted 4 months ago

@<1664079296102141952:profile|DangerousStarfish38> Yep, you are right: according to the docs, optimizer.stop() should be used, not task.close(). Sorry for the confusion.
I suspect the issue is a connectivity/auth problem between ClearML components, since there are many timeout messages in the log. I see similar messages for the fileserver container, which I have not resolved yet.

  
  
Posted 4 months ago

@<1722061354531033088:profile|TroubledCamel37> No, I didn't add "task.close()" to the code. This link is what I followed.

Even after one experiment completes, neither the console nor the UI shows the task as terminated.

  
  
Posted 4 months ago

@<1722061354531033088:profile|TroubledCamel37> But I think task.close() would terminate the optimization task, not the single experiment. Am I misunderstanding something? 😭

  
  
Posted 4 months ago

@<1664079296102141952:profile|DangerousStarfish38>, can you provide logs, please?

  
  
Posted 4 months ago

Yeah, the problem was the fileserver connection, like you said!
I was running the experiment on a remote server, and I solved the issue by opening the port for the fileserver! Thanks!
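
For anyone hitting the same symptom, a quick reachability check like the sketch below may help confirm whether the server ports are open from the machine running the experiments. It uses only the Python standard library. The host name is a hypothetical placeholder, and the ports are the ClearML server defaults (8080 web, 8008 API, 8081 fileserver), assumed here because the thread does not say which port was opened; adjust them if your deployment differs.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_clearml_ports(host, ports=None):
    """Check each named port and return a {name: reachable} mapping."""
    if ports is None:
        # Assumed ClearML server defaults; not taken from this thread.
        ports = {"web": 8080, "api": 8008, "files": 8081}
    return {name: port_is_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    # "clearml.example.com" is a placeholder for your server address.
    for name, reachable in check_clearml_ports("clearml.example.com").items():
        print(f"{name}: {'reachable' if reachable else 'NOT reachable'}")
```

This only tests that the TCP port accepts connections; it does not verify credentials, so auth problems would need to be checked separately.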

  
  
Posted 4 months ago

Plus, the first experiment terminated with early stopping.

  
  
Posted 4 months ago

Was "task.close()" called for the early-stopped task?
What is the experiment status in the Web UI?

  
  
Posted 4 months ago

@<1722061354531033088:profile|TroubledCamel37> Thanks! I'll look into the connectivity issue you mentioned.

  
  
Posted 4 months ago

@<1523701070390366208:profile|CostlyOstrich36> Here it is!

  
  
Posted 4 months ago