 
Hello, yes, it's like typos: for instance, I want to compare some experiments that were created by different versions of a script, and the metric names changed, so I can't compare them in the ClearML UI.
I'm not using clearml-agent here; I use clearml.Task.init.
The exit(1) (or raised exception) is from a subprocess.
clearml==1.1.3
torch==1.9.0+cu111, torchvision==0.10, lightning not installed
python3.8
debian 10
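Here is a minimal stdlib sketch of how that failure looks from the launching process. It uses plain `subprocess`, not detectron2's or torch's spawning code. A child that calls `exit(1)` or raises an uncaught exception terminates with return code 1, and the parent only learns about it by inspecting that code. The helper name `run_child` is hypothetical.

```python
import subprocess
import sys


def run_child(code: str) -> int:
    """Run `code` in a fresh Python interpreter and return its exit status."""
    # stdout/stderr are captured so the child's traceback does not clutter the parent's console.
    result = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)
    return result.returncode


if __name__ == "__main__":
    print(run_child("import sys; sys.exit(1)"))     # explicit exit(1) -> 1
    print(run_child("raise RuntimeError('boom')"))  # uncaught exception -> 1
    print(run_child("pass"))                         # clean exit -> 0
```

The parent keeps running in all three cases. Whether anything in the parent notices the failure therefore depends on whether it checks or re-raises that status.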
I will try reproducing it with a smaller code sample. It was a training run with detectron2, which uses torch.multiprocessing.spawn and torch.distributed.init_process_group:
https://github.com/facebookresearch/detectron2/blob/c47167e4ac236a36895c294735a908b75f659f96/tools/train_net.py#L163
https...
Is there a way to check how ClearML gets the installed packages of the current environment?
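This thread doesn't describe ClearML's own detection logic. As a point of comparison, the sketch below lists what the interpreter itself reports, roughly what `pip freeze` shows. You can diff that against the INSTALLED PACKAGES section. It is a minimal sketch using `importlib.metadata`, which is in the standard library from Python 3.8, and `installed_packages` is a hypothetical helper name.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_packages() -> dict:
    """Map each installed distribution name to its version, roughly like `pip freeze`."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions whose metadata is missing or broken
            packages[name] = dist.version
    return packages


if __name__ == "__main__":
    for name, version in sorted(installed_packages().items(), key=lambda kv: kv[0].lower()):
        print(f"{name}=={version}")
```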
However, I have another problem: my git repo is installed with `pip install -e .` and I import it in my script, but in a task executed by clearml-agent the module appears not to be installed?
And an example of the missing comparison:
1. the two experiments
2. plot on the first one
3. plot on the second one
4. the comparison plot only shows the other plots (only the confusion matrices)
Made a PR to help a bit with loading console logs.
Logs can be huge, but they are currently loaded 7 kB at a time.
100+ parameters is quite a lot indeed, but it's reached very quickly with frameworks like detectron2, where you configure the model in the configuration (plus dataloader, datasets, evaluators, augmentation, optimizer, lr_scheduling). Anyway, the search breaks as soon as a line you're searching for is not currently visible, so already with 20+ ...
Yes, unfortunately the setup.py imports torch: https://github.com/mapillary/inplace_abn/blob/master/setup.py
Hmm, it's both better and worse: it does detect pyfunctional now (it's in INSTALLED PACKAGES and I can see it being installed in the console logs), but it fails on `import torch` with `ModuleNotFoundError: No module named 'torch'`. In the logs:
Found PyTorch version torch==1.7.1 matching CUDA version 110
2021-04-21 15:15:11
Found PyTorch version torchvision==0.8.2 matching CUDA version 110
Collecting torch==1.7.1+cu110
File was already downloaded /home/ubuntu/.clearml/pip-download-cache/cu110/torch-1.7.1+cu110...
OK, btw I used https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_agent_install_configure.html, which was not up to date, so I didn't know that priority_packages and post_packages existed.
Hmm, apparently if I launch the script from the root of the repo (CWD: myrepo, `python train/classif-custom/train.py`) it works, but from its own directory it doesn't (CWD: myrepo/train/classif-custom, `python train.py`).
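The thread doesn't say what actually breaks. One common cause of this kind of CWD dependence (an assumption here) is code that resolves paths relative to the current working directory instead of the script's location. The minimal sketch below anchors on a given directory and walks up to the git root. `find_repo_root` is a hypothetical helper, not a ClearML function.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path) -> Optional[Path]:
    """Return the nearest ancestor of `start` (inclusive) that contains a `.git` entry."""
    start = start.resolve()  # absolute path, independent of the current working directory
    for candidate in (start, *start.parents):
        if (candidate / ".git").exists():
            return candidate
    return None  # not inside a git checkout
```

In `train.py` you could call `find_repo_root(Path(__file__).parent)` and build data or config paths from the returned directory. That gives the same result whether you start from `myrepo` or from `myrepo/train/classif-custom`.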
quick video of the search not working
Oookay, so we found that on Kubernetes, if we allow only TLS v1.3 on the ingress controller, clearml-init breaks with `2022-03-04 10:32:02,814 - clearml.session - WARNING - SSLError Retrying HTTPSConnectionPool(host=' http://api.clear-ml.dev.monk.ai ', port=443): Max retries exceeded with url: /auth.login (Caused by SSLError(SSLError(1, '[SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:1129)')))` or sometimes just `could not verify credentials`.
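A `TLSV1_ALERT_PROTOCOL_VERSION` alert indicates that the peer (here, the ingress) rejected the protocol version the client offered. The minimal diagnostic sketch below uses only Python's `ssl` module. It checks whether the interpreter running clearml-init supports TLS 1.3, and what a TLS-1.3-only handshake with the ingress negotiates. It does not show which TLS settings ClearML's HTTP client uses, and the helper names are hypothetical.

```python
import socket
import ssl


def tls13_only_context() -> ssl.SSLContext:
    """Client context that refuses anything older than TLS 1.3."""
    if not ssl.HAS_TLSv1_3:
        raise RuntimeError(f"{ssl.OPENSSL_VERSION} has no TLS 1.3 support")
    ctx = ssl.create_default_context()  # certificate and hostname verification stay enabled
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def negotiated_version(host: str, port: int = 443) -> str:
    """Handshake with host:port using TLS 1.3 only and return the negotiated protocol."""
    with socket.create_connection((host, port), timeout=10) as sock:
        with tls13_only_context().wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()  # "TLSv1.3" if the handshake succeeds


if __name__ == "__main__":
    print(ssl.OPENSSL_VERSION, "| TLS 1.3 supported:", ssl.HAS_TLSv1_3)
```

Pass the bare API hostname to `negotiated_version`, without an `http://` or `https://` scheme. If the handshake succeeds here while clearml-init still fails, the local OpenSSL build is probably not the culprit.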
Not exactly: I want to launch the script (create a new experiment, not clone an existing one in the UI). How can I do that?
I suppose the images are in db.task but I can't find them
Is there a way to store relative URLs in ClearML? We can't connect to our server with a public address; it only works with the internal DNS from GCE.
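This assumes links are stored as absolute URLs. In that case, switching from a public to an internal address amounts to rewriting the host part of each stored URL. The minimal sketch below does only that string transformation, with `urllib.parse`. The hostnames are hypothetical, and it does not cover where ClearML keeps those URLs or how to update them there.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_host(url: str, old_netloc: str, new_netloc: str) -> str:
    """Replace the host[:port] of `url` when it equals `old_netloc`; other URLs pass through."""
    parts = urlsplit(url)
    if parts.netloc != old_netloc:
        return url
    return urlunsplit(parts._replace(netloc=new_netloc))


if __name__ == "__main__":
    # Hypothetical public and internal addresses for a files server.
    print(rewrite_host("https://files.example.com/proj/debug/img.png?x=1",
                       "files.example.com", "files.internal:8081"))
    # prints https://files.internal:8081/proj/debug/img.png?x=1
```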
I think I found the problem: if the file is untracked by git, it is not saved by ClearML.
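That is consistent with how `git diff` behaves, since untracked files never appear in it. Assuming ClearML records local changes as a diff against the repo (not confirmed in this thread), a quick pre-launch check is to list untracked files. The minimal sketch below is based on `git status --porcelain` (format v1), and the helper names are hypothetical.

```python
import subprocess
from typing import List


def untracked_files(porcelain: str) -> List[str]:
    """Paths marked '??' (untracked) in `git status --porcelain` output."""
    # Each porcelain v1 line is "XY PATH"; untracked entries use the status "??".
    # Paths containing special characters are quoted by git and returned as-is here.
    return [line[3:] for line in porcelain.splitlines() if line.startswith("?? ")]


def git_untracked(repo_dir: str = ".") -> List[str]:
    """Run git in `repo_dir` and list every untracked file (not just untracked directories)."""
    out = subprocess.run(
        ["git", "status", "--porcelain", "--untracked-files=all"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return untracked_files(out)
```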
Does clearml-agent install the repo with `pip install -e .` when that's needed? (i.e. my local repo is installed with `pip install -e .` in the environment where I launch my script, which calls `Task.init` and `.execute_remotely()`.)
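On the agent side, you can check whether the package is importable at all, and from where, with the minimal stdlib sketch below. It uses `importlib.util.find_spec`. An editable install normally resolves into the source checkout rather than `site-packages`. `module_origin` and the package name `mypackage` are hypothetical.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Where top-level module `name` would be imported from, or None if it isn't importable."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Namespace packages have no single file, so their origin can be None.
    return spec.origin or "(namespace package)"


if __name__ == "__main__":
    print(module_origin("mypackage"))  # hypothetical name of the repo's package
```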
Hello AgitatedDove14, it does not throw an exception, but in the UI the link is broken, so the image does not show.
With matplotlib I only get the suptitle
Oh OK, I thought it would be relative to the server. How do I run this migration?
We still don't know what was happening with the VM + docker compose + load balancers.
And Ctrl-F (the browser's find) doesn't work, because the lines below aren't loaded (even when you scroll, it removes the other lines that aren't visible, so you can't Ctrl-F them).
Thanks! I think `.execute_remotely()` is exactly what I need.
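The toy model below shows why that happens. It is plain Python, not the web UI's code. A virtualized viewer renders only a window of rows, so the browser's find can only match what is inside that window, while a search over the full data finds everything. The function names and sample lines are hypothetical.

```python
from typing import List


def visible_window(lines: List[str], first: int, height: int) -> List[str]:
    """Rows a virtualized viewer actually renders: only the slice on screen."""
    return lines[first:first + height]


def find(lines: List[str], query: str) -> List[int]:
    """Indices of lines containing `query`, case-insensitive."""
    q = query.lower()
    return [i for i, line in enumerate(lines) if q in line.lower()]


if __name__ == "__main__":
    params = [f"param_{i}: {i}" for i in range(120)] + ["solver.lr_scheduler: WarmupMultiStepLR"]
    print(find(visible_window(params, 0, 40), "lr_scheduler"))  # [] (not rendered, so not found)
    print(find(params, "lr_scheduler"))                         # [120]
```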