And ctrl-f (of the browser) doesn’t work, as the lines below are not loaded (even when you scroll, it removes the lines that are no longer visible, so you can’t ctrl-f them)
Is there a way to check how clearml gets the installed packages of the current env ?
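For comparison with what shows up under INSTALLED PACKAGES, here is a minimal sketch that lists what the current env itself reports as installed. This is not how clearml detects packages (I don't know that part), it only reads the standard package metadata; `installed_packages` and `as_requirements` are names I made up.

```python
from importlib import metadata
from typing import Dict, List


def installed_packages() -> Dict[str, str]:
    """Map each distribution name found in the current env to its version."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        version = dist.version
        # Skip entries with broken or missing metadata.
        if name and version:
            packages[name] = version
    return packages


def as_requirements(packages: Dict[str, str]) -> List[str]:
    """Format the mapping as sorted 'name==version' lines."""
    ordered = sorted(packages.items(), key=lambda item: item[0].lower())
    return [f"{name}=={version}" for name, version in ordered]
```

Printing `"\n".join(as_requirements(installed_packages()))` gives a list that can be diffed against the one stored on the task.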
I'm not using clearml-agent here, I use clearml.Task.init.
The exit(1) (or raised exception) is from a subprocess.
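To make clear what I mean by "from a subprocess", a minimal sketch (plain `subprocess`, not the actual training code) of a child that either calls exit(1) or dies on an uncaught exception. The helper name `run_child` is made up for the example.

```python
import subprocess
import sys


def run_child(code: str) -> int:
    """Run a Python snippet in a child process and return its exit code."""
    result = subprocess.run([sys.executable, "-c", code])
    return result.returncode


def child_failed(code: str) -> bool:
    """True if the child ended with a non-zero exit code."""
    return run_child(code) != 0
```

In both failure cases the parent only sees a non-zero return code, so it is up to the parent to decide whether to fail as well.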
clearml==1.1.3
torch==1.9.0+cu111, torchvision==0.10, lightning not installed
python3.8
debian 10
I will try reproducing with smaller code; it was a training run with detectron2, which uses torch.multiprocessing.spawn and torch.distributed.init_process_group
https://github.com/facebookresearch/detectron2/blob/c47167e4ac236a36895c294735a908b75f659f96/tools/train_net.py#L163
https...
so if anybody needs this someday (migrating your hostname which is saved inside your experiments (debug images and plots with images)) you need this https://github.com/allegroai/clearml-server/issues/83
but it's slow; you can restrict the query to the items that actually need updating, with:
` # on index events-training_debug_image-yourid
OLDHOST/ should be something like or
NEWHOST/ same
"script": {
"source": "ctx._source.url = ctx._source.url.replace('OLDHOST/', 'NEWHO...
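For anyone who prefers to build the request body in code, a rough sketch of the same idea in Python. Assumptions on my side: the `url` field is mapped as a keyword so a `prefix` query can match it, the hosts are passed as script params instead of being written into the script source, and the function name `build_update_by_query` is made up. The resulting body would be sent to the `_update_by_query` endpoint of the index.

```python
import json


def build_update_by_query(old_host: str, new_host: str, field: str = "url") -> dict:
    """Build an update-by-query body that only touches docs starting with old_host."""
    return {
        "script": {
            "lang": "painless",
            # Replace the old host prefix with the new one inside the stored url.
            "source": (
                f"ctx._source.{field} = "
                f"ctx._source.{field}.replace(params.old_host, params.new_host)"
            ),
            "params": {"old_host": old_host, "new_host": new_host},
        },
        # Restrict the update to documents that still point at the old host
        # (assumes the field is a keyword, otherwise prefix will not match).
        "query": {"prefix": {field: old_host}},
    }


def to_json(body: dict) -> str:
    """Serialize the body for use with curl or the Kibana dev console."""
    return json.dumps(body, indent=2)
```

Running it once per index keeps each request small; documents that were already migrated are skipped by the query.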
WebApp: 1.2.0-153 • Server: 1.2.0-153 • API: 2.16
we still don't know what was happening with the VM + docker compose + load balancers
I think I found the problem, if the file is untracked by git, it is not saved by clearml
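A quick way to check for this before launching, as a sketch: list the untracked files reported by `git status --porcelain`. The function names are mine, and paths with special characters come back quoted by git, which this does not handle.

```python
import subprocess
from typing import List


def parse_untracked(porcelain: str) -> List[str]:
    """Extract untracked paths ('?? path' lines) from porcelain output."""
    return [line[3:] for line in porcelain.splitlines() if line.startswith("?? ")]


def untracked_files(repo_dir: str = ".") -> List[str]:
    """Run git in repo_dir and return the files git does not track yet."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_untracked(result.stdout)
```

If the script you are about to run shows up in `untracked_files()`, a `git add` (and commit) first should avoid the problem.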
I think I didn't understand, if I'm not at the root of the repo, do I have to specify the working dir ?
Does clearml-agent install the repo with pip install -e . if it should be ? (i.e. my local repo is installed with pip install -e . where I launch my script which calls Task.init and .execute_remotely() ).
The script is inside a git repo (and it's the one I launch, I would get an importerror if it was something else missing)
Hmm apparently if I launch the script from the root of the repo (CWD: myrepo python train/classif-custom/train.py ) it works, but from its dir it doesn't work (CWD: myrepo/train/classif-custom python train.py )
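To spell out the two launch modes, a small sketch of how the same file can be described as a (working dir, entry point) pair relative to the repo root. This only illustrates the bookkeeping, it is not clearml's detection logic, and `/home/me/myrepo` is a made-up path.

```python
from pathlib import PurePosixPath
from typing import Tuple


def describe_launch(repo_root: str, cwd: str, script_arg: str) -> Tuple[str, str]:
    """Return (working dir relative to the repo root, entry point relative to cwd)."""
    root = PurePosixPath(repo_root)
    work = PurePosixPath(cwd)
    script = work / script_arg
    return str(work.relative_to(root)), str(script.relative_to(work))


def script_in_repo(repo_root: str, cwd: str, script_arg: str) -> str:
    """Return the script path relative to the repo root, whatever the cwd was."""
    root = PurePosixPath(repo_root)
    return str((PurePosixPath(cwd) / script_arg).relative_to(root))
```

From the root this gives `(".", "train/classif-custom/train.py")`, and from the script dir `("train/classif-custom", "train.py")`; both point at the same file only if the working dir is applied before the entry point is opened.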
Thanks ! I think .execute_remotely() is exactly what I need
The task is registered and started by the agent, and the env seems to install fine, but then it fails with /home/ubuntu/.clearml/venvs-builds/3.8/bin/python: can't open file 'fastai_classifier.py': [Errno 2] No such file or directory. Do you have an idea of what could be wrong ? Does the agent launch the script in the wrong working dir ? Is the repo not copied ? (This script is inside a private git repo, which clearml detects correctly).
I also tried launching the script from the root of th...
not exactly, I want to launch the script (create a new experiment, not clone an existing one in the UI), how can I do it ?
Also it would be awesome if the front-end integrated a small reverse-proxy to have everything on 1 address, I don't know if this is somewhere on the roadmap ? Or what are advantages of having 3 separate addresses ?
However I have another problem, my git repo is installed with pip install -e . and I import it in my script, but on a task executed by a clearml-agent the module appears not to be installed ?
I welcome the day clearml saves relative urls by default ^^ it is supported by browsers (i.e. fetching /someurl is relative to the current hostname) so maybe only the clearml client would need to be updated right ? to push images with a relative url instead of the clearml server url.
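What I mean by relative urls, as a sketch with the standard library: strip the host when storing, and let the page address supply it when fetching. The hostnames are placeholders and the helper names are made up; this says nothing about how clearml stores urls today.

```python
from urllib.parse import urljoin, urlsplit


def to_relative(url: str) -> str:
    """Drop scheme and host, keeping only the path (and query, if any)."""
    parts = urlsplit(url)
    relative = parts.path or "/"
    if parts.query:
        relative += "?" + parts.query
    return relative


def resolve(page_url: str, stored_url: str) -> str:
    """Resolve a stored url the way a browser would, relative to the current page."""
    return urljoin(page_url, stored_url)
```

With this, an image stored as `/proj/exp/img.png` keeps working after a hostname change, as long as the files are served from the same address as the web app.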
Hmm it's both better and worse, it does detect pyfunctional now (in INSTALLED PACKAGES and I can see it installed in the console logs) but it fails on import torch with ModuleNotFoundError: No module named 'torch'. In the logs:
` Found PyTorch version torch==1.7.1 matching CUDA version 110
2021-04-21 15:15:11
Found PyTorch version torchvision==0.8.2 matching CUDA version 110
Collecting torch==1.7.1+cu110
File was already downloaded /home/ubuntu/.clearml/pip-download-cache/cu110/torch-1.7.1+cu110...
Oh ok I thought it would be relative to the server, how do i run this migration ?
Yes the setup.py imports torch unfortunately https://github.com/mapillary/inplace_abn/blob/master/setup.py