
We managed to upgrade it, but the volume claim changed somehow and it created new disks. I will back up from the old disks and upload to the new ones to migrate, but the backup procedure is not detailed for Kubernetes. Do you have info for this?
Should I only do MongoDB?
Yes, I think it needs pytorch, but pytorch failed to install previously?
Yes the setup.py imports torch unfortunately https://github.com/mapillary/inplace_abn/blob/master/setup.py
Thanks! I think .execute_remotely() is exactly what I need
Managed a workaround thanks to the API doc, if someone encounters the same bug:
    from clearml import Task

    tasks = []
    page = 0
    while True:
        # fetch one page of up to 500 tasks; a shorter page means it was the last one
        page_tasks = Task._query_tasks(project_name=project, system_tags=[] if archived else ['-archived'], page=page, page_size=500)
        tasks += page_tasks
        page += 1
        if len(page_tasks) < 500:
            break
Also, it would be awesome if the front-end integrated a small reverse proxy to have everything on one address. I don't know if this is somewhere on the roadmap? Otherwise, what are the advantages of having 3 separate addresses?
Oh OK, I thought it would be relative to the server. How do I run this migration?
Is there a way to check how clearml gets the installed packages of the current env?
Ok thanks I will check first for permission issues
Hello, sorry the second is for models and not images
Hmm, apparently if I launch the script from the root of the repo (CWD: myrepo, command: python train/classif-custom/train.py) it works, but from its own dir it doesn't (CWD: myrepo/train/classif-custom, command: python train.py).
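To compare the two launches, a small debugging helper like the one below can print where the process is running and where the script sits relative to that directory. This is only a generic Python sketch: launch_info is a hypothetical name, the path used is the one from my example above, and it does not reproduce how clearml itself detects the repo or the working dir.

```python
from pathlib import Path


def launch_info(script_path: str) -> dict:
    """Report the working directory and the script location relative to it."""
    script = Path(script_path).resolve()
    cwd = Path.cwd().resolve()
    try:
        relative = script.relative_to(cwd)
    except ValueError:
        # The script is not located under the current working directory.
        relative = None
    return {
        "cwd": cwd,
        "script_dir": script.parent,
        "script_relative_to_cwd": relative,
    }


# Example with the relative path used when launching from the repo root.
info = launch_info("train/classif-custom/train.py")
for key, value in info.items():
    print(f"{key}: {value}")
```

In a real script you would pass sys.argv[0] or __file__ instead of the hard-coded path.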
I suppose the images are in db.task but I can't find them
The task is registered and is started by the agent, the env seems to be installed well, but then it fails on /home/ubuntu/.clearml/venvs-builds/3.8/bin/python: can't open file 'fastai_classifier.py': [Errno 2] No such file or directory
Do you have an idea of what could be wrong? Does the agent launch the script in the wrong working dir? Is the repo not copied? (This script is inside a private git repo, which clearml detects correctly.)
I also tried launching the script from the root of th...
And the comparison for the confusion matrices without the name of the experiments
It works with post_packages
Does clearml-agent install the repo with pip install -e . when it should be installed that way? (i.e. my local repo is installed with pip install -e . in the environment where I launch my script, which calls Task.init and .execute_remotely().)
And an example of the missing comparison:
1. the two experiments
2. plot on the first one
3. plot on the second
4. comparison plot only shows other plots (only the confusion matrices)
Not exactly, I want to launch the script (create a new experiment, not clone an existing one in the UI). How can I do it?
Hello AgitatedDove14, it does not throw an exception, but in the UI the link is broken so the image does not show
I'm not using clearml-agent here, I use clearml.Task.init.
The exit(1) (or raised exception) is from a subprocess.
clearml==1.1.3
torch==1.9.0+cu111, torchvision==0.10, lightning not installed
python3.8
debian 10
I will try reproducing with a smaller code sample, it was a training with detectron2 which uses torch.multiprocessing.spawn and torch.distributed.init_process_group
https://github.com/facebookresearch/detectron2/blob/c47167e4ac236a36895c294735a908b75f659f96/tools/train_net.py#L163
https...
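As a starting point for a smaller reproduction, here is a minimal sketch of a child process that fails with exit(1) or an uncaught exception, and a parent that reads the exit code. It uses only the standard library subprocess module, not torch.multiprocessing.spawn or clearml, so it is an assumption that the real setup behaves the same way; run_child is a hypothetical helper name.

```python
import subprocess
import sys


def run_child(code: str) -> int:
    """Run a Python snippet in a child process and return its exit code."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        stderr=subprocess.DEVNULL,  # hide the child's traceback to keep output short
    )
    return completed.returncode


ok = run_child("print('child finished')")
failed = run_child("import sys; sys.exit(1)")
raised = run_child("raise RuntimeError('boom')")

print(ok, failed, raised)

# The parent keeps running after a child failure unless it checks the code itself.
if failed != 0 or raised != 0:
    print("a child failed; the parent has to decide whether to exit non-zero too")
```

The point of the sketch is that a failure inside the child is only visible to the parent through the return code, so the parent's own exit status depends on whether it propagates that code.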
and ctrl-f (of the browser) doesn't work because the lines below are not loaded (even when you scroll, it removes the lines that are no longer visible, so you can't ctrl-f them)
The script is inside a git repo (and it's the one I launch, I would get an ImportError if something else was missing)