another stupid question - what is the proper way to delete a worker? so far I've been using pgrep to find the relevant PID 😃
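for reference, the pgrep workaround looks roughly like the sketch below. it only shows the workaround, not the proper way to stop a worker, and the process pattern you pass in is an assumption about how the worker process is named on your machine:

```python
import os
import signal
import subprocess
from typing import List


def parse_pgrep_output(output: str) -> List[int]:
    """Turn pgrep's stdout (one PID per line) into a list of ints."""
    return [int(token) for token in output.split() if token.isdigit()]


def find_pids(pattern: str) -> List[int]:
    """Run `pgrep -f <pattern>` and return the matching PIDs.

    pgrep exits with a non-zero code when nothing matches, so the
    exit code is not checked and an empty list is returned instead.
    """
    result = subprocess.run(
        ["pgrep", "-f", pattern],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_pgrep_output(result.stdout)


def terminate(pids: List[int]) -> None:
    """Send SIGTERM to every PID in the list."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
```

nothing runs until you call `find_pids(...)` with your own pattern and pass the result to `terminate(...)`; check the PID list before killing anything.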
if you click on the experiment name here, you get a 404 because the link looks like this:
https://DOMAIN/projects/PROJECT_ID/EXPERIMENT_ID
when it should look like this:
https://DOMAIN/projects/PROJECT_ID/experiments/EXPERIMENT_ID
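to make the difference between the two links concrete, here is a small sketch that rewrites the broken form into the expected one. `fix_experiment_url` is a hypothetical helper written for illustration, not part of the web app or the SDK, and it assumes the broken path always has exactly three segments:

```python
from urllib.parse import urlsplit, urlunsplit


def fix_experiment_url(url: str) -> str:
    """Insert the missing 'experiments' segment into a broken link.

    Broken form:   /projects/<project_id>/<experiment_id>
    Expected form: /projects/<project_id>/experiments/<experiment_id>
    URLs that do not match the broken form are returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if len(segments) == 3 and segments[0] == "projects":
        segments.insert(2, "experiments")
    return urlunsplit(parts._replace(path="/" + "/".join(segments)))
```

calling it on the first link above should give the second one, and an already correct link passes through untouched.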
does this mean that setting initial iteration to 0 should help?
task = Task.get_task(task_id=args.task_id)
task.mark_started()
task.set_parameters_as_dict(
    {
        "General": {
            "checkpoint_file": model.url,
            "restart_optimizer": False,
        }
    }
)
task.set_initial_iteration(0)
task.mark_stopped()
Task.enqueue(task=task, queue_name=task.data.execution.queue)
nope, the only changes to config that we made are adding web-auth and non-responsive tasks watchdog
just in case, this warning disappeared after I followed https://stackoverflow.com/questions/49638699/docker-compose-restart-connection-pool-full
ValueError: Task has no hyperparams section defined
perhaps I need to do task.set_initial_iteration(0)?
I decided to restart the containers one more time, this is what I got.
I had to restart Docker service to remove the containers
fantastic, everything is working perfectly
thanks guys
I'll get back to you with the logs when the problem occurs again
parents and children. maybe tags, maybe separate tab or section, idk. I wonder if anyone else is interested in this functionality, for us this is a very common case
tags are somewhat fine for this, I guess, but there will be too many of them eventually, and they do not reflect the sequential nature of the experiments
the weird part is that the old job continues running when I recreate the worker and enqueue the new job
that's right, I have 4 GPUs and 4 workers. but what if I want to run two jobs simultaneously on the same GPU?
nice! exactly what I need, thank you!