Reputation
Badges 1
56 × Eureka!made a PR to help a bit loading console logs None
logs can be huge but are loaded 7kB at a time currently
100+ parameters is quite a lot indeed but very quickly achieved when using frameworks like detectron2, where you configure the model in the configuration (+dataloader, datasets, evaluators, augmentation, optimizer, lr_scheduling). anyway the search is broken as soon as one line you search is not currently visible, so already with 20+ ...
Hmm apparently if I launch the script from the root of the repo (CWD: myrepo python train/classif-custom/train.py
) it works, but from its dir it doesn't work (CWD: myrepo/train/classif-custom python train.py
)
not exactly, I want to launch the script (create a new experiment, not clone an existing one in the UI), how can I do it ?
It works with post_packages
The task is registered and is started by the agent, the env seems to be installed well, but then it fails on /home/ubuntu/.clearml/venvs-builds/3.8/bin/python: can't open file 'fastai_classifier.py': [Errno 2] No such file or directory
Do you have an idea of what could be wrong ? The agent launch the script in the wrong working dir ? The repo is not copied ? (This script is inside a private git repo, that clearml detects correctly).
I also tried launching the script from the root of th...
However I have another problem, my git repo is installed with pip install -e .
and I import it in my script, but on a task executed by a clearml-agent the module appears not to be installed ?
Thanks ! I think .execute_remotely()
is exactly what I need
I'm not using clearml-agent here, I use clearml.Task.init.
The exit(1) (or raised exception) is from a subprocess.
clearml==1.1.3
torch==1.9.0+cu111, torchvision==0.10, lightning not installed
python3.8
debian 10
I will try reproducing with a smaller code, it was a training with detectron2 which uses torch.,multiprocessing.spawn and torch.distributed.init_process_group
https://github.com/facebookresearch/detectron2/blob/c47167e4ac236a36895c294735a908b75f659f96/tools/train_net.py#L163
https...
Ok thanks I will check first for permission issues
I welcome the day clearml saves relative urls by default ^^ it is supported by browsers (i.e. fetching /someurl is relative to the current hostname) so maybe only the clearml client would need to be updated right ? to push images with a relative url instead of the clearml server url.
We tried with a docker-compose on a GCE VM + load balancers, and then in kube, we get the same error: clearml-init
returns Error: could not verify credentials: key=241... secret=NhC...
Also it would be awesome if the front-end integrated a small reverse-proxy to have everything on 1 address, I don't know if this is somewhere on the roadmap ? Or what are advantages of having 3 separate addresses ?
Oh ok I thought it would be relative to the server, how do i run this migration ?
WebApp: 1.2.0-153 • Server: 1.2.0-153 • API: 2.16
Ok, btw I used https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_agent_install_configure.html which was not updated so I didn't know there was a priority_packages and post_packages
Hmm it's both better and worse, it does detect pyfunctional now (in INSTALLED PACKAGES and I can see it installed in the console logs) but it fails onimport torch ModuleNotFoundError: No module named 'torch'
In the logs:
` Found PyTorch version torch==1.7.1 matching CUDA version 110
2021-04-21 15:15:11
Found PyTorch version torchvision==0.8.2 matching CUDA version 110
Collecting torch==1.7.1+cu110
File was already downloaded /home/ubuntu/.clearml/pip-download-cache/cu110/torch-1.7.1+cu110...
Is there a way to check how clearml gets the installed packages of the current env ?
Does clearml-agent install the repo with pip install -e .
if it should be ? (i.e. my local repo is installed with pip install -e .
where I launch my script which calls Task.init
and .execute_remotely()
).
Yes I think it needs pytorch, but pytorch failed to install previously ?
With matplotlib I only get the suptitle
Yes the setup.py imports torch unfortunately https://github.com/mapillary/inplace_abn/blob/master/setup.py