Go to https://hub.gke2.mybinder.org/user/jupyterlab-jupyterlab-demo-0570jy0h/lab/tree/demo and run the jupyter notebook called "main.ipynb". I've ran the only cell in it for several times and now its stuck.
You can see the corresponding task at https://demoapp.trains.allegro.ai/projects/98cdb5ace38946a690daec8efd668a76/experiments/db0088e1207440319d80acc3ac89aafb/execution?columns=selected&columns=type&columns=name&columns=tags&columns=status&columns=project.name&columns=users&columns=star...
It does work about 50% of the times, so running the script several times may reveal the problem.
I tried running it in a fresh virtualenv with only clearml installed, and I see the same issue.
Yes, that's what I observe on one machine. On another machine it hangs with a lower probability (1/6), but it still happens.
Hi CostlyOstrich36 , thanks for the quick reply.
I'm running that on a Nvidia DGX-A100 computer with Ubuntu 20.04.3 LTS installed, Python 3.8.10, and clearml=1.1.4.
The ClearML server is not of the latest version though, I used "docker-compose.yml" version 3.6 to launch it.
Yes, that seems to solve the issue. Thanks.
Multiprocessing Bug
Hi guys, I'm having a similar issue with clearml 1.1.4, and I have written a small script reproducing it.
There are two scripts attached, one invoking the clearml.Task
and the other one is a module containing the multiprocessing code. The issue only happens when the multiprocessing code is in another file (module).
The behavior I observe is that the local execution is being stuck with the following messages and the task in the clearml server is being stuck in a "Run...
On another machine it gets stuck once every 6 runs on average.