
Hello AgitatedDove14, it does not throw an exception, but in the UI the link is broken, so the image does not show.
We have the same issue with hyperparameters, even with only ~100 keys: the UI lazy-loads and removes elements that are scrolled out of view, which breaks the browser's search, and the integrated search works only about 15% of the time…
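One workaround while the UI search is unreliable is to search the hyperparameters locally. A minimal sketch, assuming the parameters have already been fetched into a flat dict with "Section/name" keys (for example via `Task.get_parameters()`; treat that exact return format as an assumption). `search_params` is a hypothetical helper, not a ClearML API:

```python
from typing import Any, Dict


def search_params(params: Dict[str, Any], query: str) -> Dict[str, Any]:
    """Return the entries whose key or value contains `query` (case-insensitive).

    `params` is assumed to be a flat mapping such as {"Args/lr": "0.01"}.
    Hypothetical helper for illustration; not part of the ClearML API.
    """
    q = query.lower()
    return {
        key: value
        for key, value in params.items()
        if q in key.lower() or q in str(value).lower()
    }


if __name__ == "__main__":
    params = {"Args/lr": "0.01", "Args/batch_size": "32", "General/optimizer": "AdamW"}
    print(search_params(params, "adam"))
```

Because every key is kept in memory, nothing is lost to lazy loading, unlike in the browser.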
Yes, I think it needs PyTorch, but PyTorch failed to install previously?
OK, thanks. I will check for permission issues first.
Does clearml-agent install the repo with pip install -e . if it needs to be? (I.e., my local repo, where I launch my script that calls Task.init and .execute_remotely(), is installed with pip install -e .)
The task is registered and started by the agent, and the environment seems to be installed correctly, but then it fails with: /home/ubuntu/.clearml/venvs-builds/3.8/bin/python: can't open file 'fastai_classifier.py': [Errno 2] No such file or directory
Do you have any idea what could be wrong? Does the agent launch the script from the wrong working directory? Is the repo not copied? (This script is inside a private git repo, which ClearML detects correctly.)
I also tried launching the script from the root of th...
WebApp: 1.2.0-153 • Server: 1.2.0-153 • API: 2.16
OK, by the way, I used https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_agent_install_configure.html which had not been updated, so I didn't know there were priority_packages and post_packages options.
However, I have another problem: my git repo is installed with pip install -e . and I import it in my script, but in a task executed by clearml-agent the module appears not to be installed?
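To check whether the package is actually visible to the interpreter the task runs with, a small stdlib check can be run (or logged) at the top of the script. A minimal sketch: `myrepo_pkg` is a placeholder for whatever top-level package `pip install -e .` provides, and printing `sys.executable` shows which environment (for example the agent's venv under /home/ubuntu/.clearml/venvs-builds/) is in use:

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if the top-level module `module_name` can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "myrepo_pkg" is a hypothetical name; use your repo's top-level package.
    name = "myrepo_pkg"
    print(sys.executable, "can import", name, "->", is_importable(name))
```

Run locally and inside the remote task, the two outputs make it clear whether the editable install exists in the agent's environment at all.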
Thanks! I think .execute_remotely() is exactly what I need.
Hmm, apparently if I launch the script from the root of the repo (CWD: myrepo, python train/classif-custom/train.py) it works, but if I launch it from its own directory it doesn't (CWD: myrepo/train/classif-custom, python train.py).
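The two launches differ only in how the script path is split between the working directory and the entry point. A minimal sketch of that bookkeeping, using a hypothetical `describe_launch` helper (not ClearML's own logic): a remote runner that clones the repo has to `cd` into `working_dir` before running `entry_point`, otherwise Python fails with a "can't open file" error like the one above.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class Launch(NamedTuple):
    working_dir: str     # cwd, relative to the repo root
    entry_point: str     # script, relative to working_dir
    script_in_repo: str  # script, relative to the repo root


def describe_launch(repo_root: str, cwd: str, script: str) -> Launch:
    """Express `cd cwd && python script` relative to the repository root.

    Hypothetical helper for illustration. Paths are not normalized,
    so avoid `..` components.
    """
    root, work = PurePosixPath(repo_root), PurePosixPath(cwd)
    working_dir = work.relative_to(root)  # raises ValueError if cwd is outside the repo
    script_in_repo = working_dir / script
    return Launch(str(working_dir), script, str(script_in_repo))
```

Both launches resolve to the same file inside the repo (`train/classif-custom/train.py`); only the split between `working_dir` and `entry_point` changes, which is why the working directory matters when not launching from the repo root.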
I'm not using clearml-agent here; I use clearml.Task.init.
The exit(1) (or raised exception) is from a subprocess.
clearml==1.1.3
torch==1.9.0+cu111, torchvision==0.10, lightning not installed
python3.8
debian 10
I will try to reproduce it with smaller code. It was a training run with detectron2, which uses torch.multiprocessing.spawn and torch.distributed.init_process_group:
https://github.com/facebookresearch/detectron2/blob/c47167e4ac236a36895c294735a908b75f659f96/tools/train_net.py#L163
https...
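A stdlib-only sketch of the general mechanism (not detectron2's launcher, which uses torch.multiprocessing.spawn): a child's exit(1) or uncaught exception only surfaces as the child's exit code, and the parent exits non-zero only if it checks that code and propagates it. `run_child` and `propagate` are hypothetical helpers:

```python
import subprocess
import sys


def run_child(code: str) -> int:
    """Run `code` in a fresh Python interpreter and return its exit code."""
    completed = subprocess.run([sys.executable, "-c", code])
    return completed.returncode


def propagate(rc: int) -> None:
    """Exit the parent with the child's status if the child failed."""
    if rc != 0:
        sys.exit(rc)


if __name__ == "__main__":
    # An uncaught exception in the child also yields exit code 1.
    rc = run_child("import sys; sys.exit(1)")
    print("child exit code:", rc)
```

If the parent skips the `propagate` step, it finishes normally even though the child failed, which is worth ruling out when a run's final status doesn't match what happened in its workers.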
Hmm, it's both better and worse: it does detect pyfunctional now (it's in INSTALLED PACKAGES, and I can see it being installed in the console logs), but it fails on import torch with ModuleNotFoundError: No module named 'torch'
In the logs:
Found PyTorch version torch==1.7.1 matching CUDA version 110
2021-04-21 15:15:11
Found PyTorch version torchvision==0.8.2 matching CUDA version 110
Collecting torch==1.7.1+cu110
File was already downloaded /home/ubuntu/.clearml/pip-download-cache/cu110/torch-1.7.1+cu110...
Yes, unfortunately the setup.py imports torch: https://github.com/mapillary/inplace_abn/blob/master/setup.py
Not exactly. I want to launch the script (create a new experiment, not clone an existing one in the UI). How can I do that?
I previously used scripts like https://github.com/allegroai/clearml-server/issues/83 for images, but they don't migrate artifact URLs.
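The core of such a migration is a prefix rewrite applied to each stored URL. A minimal sketch, assuming the artifact URLs have already been read into a `{name: url}` mapping; the function names and example hosts are hypothetical, and this does not read from or write to the ClearML server's databases:

```python
from typing import Dict


def rewrite_prefix(url: str, old_prefix: str, new_prefix: str) -> str:
    """Replace `old_prefix` at the start of `url`; leave other URLs untouched."""
    if url.startswith(old_prefix):
        return new_prefix + url[len(old_prefix):]
    return url


def migrate_artifact_urls(artifacts: Dict[str, str], old_prefix: str, new_prefix: str) -> Dict[str, str]:
    """Apply rewrite_prefix to every artifact URL in a {name: url} mapping."""
    return {name: rewrite_prefix(url, old_prefix, new_prefix) for name, url in artifacts.items()}
```

Matching on the prefix only (rather than a substring replace) avoids accidentally rewriting URLs that merely contain the old host somewhere in their path.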
The script is inside a git repo (and it's the one I launch; if something else were missing, I would get an ImportError).
I think I didn't understand: if I'm not at the root of the repo, do I have to specify the working dir?
We tried docker-compose on a GCE VM with load balancers, and then Kubernetes, and we get the same error in both cases: clearml-init returns Error: could not verify credentials: key=241... secret=NhC...
It works with post_packages
Hello, yes, it's like typos: for instance, I want to compare experiments that were created by different versions of a script, and the metric names changed, so I can't compare them in the ClearML UI.
And the browser's Ctrl-F doesn't work, because the lines further down are not loaded (even when you scroll, the UI removes the lines that are no longer visible, so you can't Ctrl-F them).
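Since the renamed metrics are mostly typos, they can often be paired up automatically before comparing runs. A minimal stdlib sketch using `difflib.get_close_matches`; `match_metric_names` is a hypothetical helper, the 0.6 cutoff is difflib's default, and fetching the metric names from each experiment is not shown:

```python
import difflib
from typing import Dict, Iterable, Optional


def match_metric_names(
    old: Iterable[str], new: Iterable[str], cutoff: float = 0.6
) -> Dict[str, Optional[str]]:
    """Map each old metric name to its closest new name, or None if nothing is close enough.

    Uses difflib's similarity ratio, so it catches typos and small renames,
    not arbitrary ones.
    """
    candidates = list(new)
    mapping: Dict[str, Optional[str]] = {}
    for name in old:
        matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
        mapping[name] = matches[0] if matches else None
    return mapping
```

Names that map to `None` are the ones that still need a manual rename table.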