fyi,
I set the following options for HyperParameterOptimizer():
- compute_time_limit=None,
- total_max_jobs=100,
- min_iteration_per_job=None,
- max_iteration_per_job=None,
- max_number_of_concurrent_tasks=1
Also, the first experiment terminated via early stopping.
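For reference, here is a minimal sketch of how the options listed above might be passed to `HyperParameterOptimizer`. Only the five option values come from this thread. The metric title/series, the `RandomSearch` strategy, the queue name, and the helper names `build_optimizer_kwargs` and `make_optimizer` are assumptions for illustration, and the sketch has not been run against a live server.

```python
def build_optimizer_kwargs():
    """Return the option values quoted in this thread."""
    return dict(
        compute_time_limit=None,           # no overall compute budget
        total_max_jobs=100,                # at most 100 experiments
        min_iteration_per_job=None,        # no lower bound on iterations
        max_iteration_per_job=None,        # no upper bound on iterations
        max_number_of_concurrent_tasks=1,  # one experiment at a time
    )


def make_optimizer(base_task_id, hyper_parameters, execution_queue="default"):
    """Build an optimizer from the options above (hypothetical helper).

    clearml is imported lazily so the options can be inspected without it.
    """
    from clearml.automation import HyperParameterOptimizer, RandomSearch

    return HyperParameterOptimizer(
        base_task_id=base_task_id,
        hyper_parameters=hyper_parameters,
        objective_metric_title="validation",  # assumed metric title
        objective_metric_series="loss",       # assumed metric series
        objective_metric_sign="min",
        optimizer_class=RandomSearch,         # assumed search strategy
        execution_queue=execution_queue,      # assumed queue name
        **build_optimizer_kwargs(),
    )
```

With `max_number_of_concurrent_tasks=1`, the next experiment is only launched once the previous one is seen as finished, so a job that never reports completion can stall the whole run.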
Was "task.close()" called for the early-stopped task?
What is the experiment status in the Web UI?
DangerousStarfish38 , can you provide logs please?
TroubledCamel37 No, I didn't add "task.close()" in the code. This link is what I followed.
Even after completing one experiment, the console and UI don't seem to terminate the task.
TroubledCamel37 But I guess task.close() would terminate the optimization task, not the single experiment. Am I misunderstanding something? 😭
DangerousStarfish38 Yep, you are right. According to the docs, optimizer.stop() should be used, not task.close(). Sorry for the confusion.
I guess the issue is a connectivity/auth problem between ClearML components: there are many timeout messages in the log. I see similar messages for my fileserver container, which I haven't resolved yet.
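To illustrate the `optimizer.stop()` point, here is a rough sketch of the start / wait / stop sequence. `run_and_stop` and `wait_minutes` are hypothetical names. The object passed in is assumed to be a `HyperParameterOptimizer` instance, whose `start()`, `wait(timeout=...)` (timeout in minutes) and `stop()` methods are the only calls used.

```python
def run_and_stop(optimizer, wait_minutes=None):
    """Start the optimization, wait for it, and always stop it afterwards.

    `optimizer` is assumed to expose start(), wait(timeout=...) and stop(),
    as clearml.automation.HyperParameterOptimizer does.
    """
    optimizer.start()
    try:
        # Block until the optimization finishes or the timeout (minutes) expires.
        optimizer.wait(timeout=wait_minutes)
    finally:
        # Stop the optimizer itself; task.close() is not what ends the
        # optimization process.
        optimizer.stop()
```

The `try`/`finally` is a design choice for the sketch: it makes sure `stop()` is still called if `wait()` raises or is interrupted.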
TroubledCamel37 Thanks! I'll look into the connectivity issue you mentioned.
Yeah, the problem was the fileserver connection, like you said!
I was running the experiment on a remote server, and solved the issue by opening the port for the fileserver! Thanks!
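For anyone hitting the same thing, a quick way to check whether the server ports are reachable from the machine running the experiment is a plain TCP connection test. This is only a sketch: it assumes a default ClearML server setup (web 8080, API 8008, fileserver 8081), and `port_is_open` and `check_clearml_ports` are hypothetical helper names. Adjust the ports if your deployment differs.

```python
import socket

# Default ports of a stock ClearML server deployment (assumed here).
DEFAULT_PORTS = {"webserver": 8080, "apiserver": 8008, "fileserver": 8081}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_clearml_ports(host, ports=None):
    """Map each service name to whether its port is reachable on host."""
    ports = DEFAULT_PORTS if ports is None else ports
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

Calling `check_clearml_ports("<your-server-host>")` from the remote machine returns a dict per service. A `False` for `fileserver` would match the symptom in this thread, where the port was blocked.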