With matplotlib I only get the suptitle.
Hmm, it's both better and worse: it does detect pyfunctional now (in INSTALLED PACKAGES, and I can see it being installed in the console logs), but it fails on import torch with ModuleNotFoundError: No module named 'torch'
In the logs:
Found PyTorch version torch==1.7.1 matching CUDA version 110
2021-04-21 15:15:11
Found PyTorch version torchvision==0.8.2 matching CUDA version 110
Collecting torch==1.7.1+cu110
File was already downloaded /home/ubuntu/.clearml/pip-download-cache/cu110/torch-1.7.1+cu110...
Is there a way to check how ClearML detects the installed packages of the current environment?
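To compare what ClearML lists under INSTALLED PACKAGES with what the current interpreter actually sees, here is a minimal sketch using the standard-library importlib.metadata module (Python 3.8+). It reports roughly what pip freeze would. It is an assumption-free view of the environment only, not ClearML's own package-detection logic, and the function name installed_packages is hypothetical.

```python
from importlib import metadata


def installed_packages():
    """Return sorted (name, version) pairs for every distribution visible
    to the current interpreter, similar in spirit to `pip freeze`."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip broken distributions that have no name or version.
        if name and dist.version:
            packages[name.lower()] = dist.version
    return sorted(packages.items())


if __name__ == "__main__":
    for name, version in installed_packages():
        print(f"{name}=={version}")
```

Run it with the same python executable that runs Task.init; if torch is missing from this list, ClearML cannot be expected to report it either.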
Yes, I think it needs PyTorch, but PyTorch failed to install previously?
OK, thanks. I will check for permission issues first.
OK, so I reproduced it with the following; it happens when the output contains colors (I first got the error with an exception printed by stackprinter):
import sys

import colorama
from clearml import Task

Task.init(project_name="test", task_name="test", reuse_last_task_id=False)
print("this is a test <hello world> rest of the text")  # plain stdout
print("this is a test <hello world> rest of the text", file=sys.stderr)  # plain stderr
print(colorama.Fore.RED + "this is a test <hello world> rest of the text" + colorama.Style.RESET_ALL)  # colored stdout: triggers the error
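If the color codes are indeed the trigger, one possible workaround while the issue is investigated is to strip ANSI escape sequences before printing. A minimal sketch, assuming only CSI sequences matter (the kind colorama.Fore and colorama.Style emit, e.g. "\x1b[31m" for red and "\x1b[0m" for reset); the regex and the strip_ansi helper are hypothetical, not a ClearML feature.

```python
import re

# Matches ANSI CSI escape sequences such as "\x1b[31m" (red) or "\x1b[0m" (reset).
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def strip_ansi(text: str) -> str:
    """Remove ANSI color/style escape sequences from a string."""
    return ANSI_ESCAPE.sub("", text)


if __name__ == "__main__":
    colored = "\x1b[31m" + "this is a test <hello world> rest of the text" + "\x1b[0m"
    print(strip_ansi(colored))
```

This loses the colors in the local terminal too, so it is only a stopgap for confirming that the colors are what breaks the logging.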
We managed to upgrade it, but the persistent volume claims changed somehow and new disks were created. I will back up the old disks and restore them onto the new ones to migrate, but the backup procedure is not documented for Kubernetes. Do you have any information on this?
Should I only back up MongoDB?
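For the MongoDB part, one approach is to stream a mongodump archive out of the pod with kubectl exec. A minimal sketch, assuming MongoDB runs without authentication on its default port inside the pod; the pod and namespace names are placeholders for your own deployment, and the helper names are hypothetical. Note that, as far as I understand the ClearML server architecture, metrics, logs and plots live in Elasticsearch and artifacts on the fileserver, so MongoDB alone is probably not a complete backup.

```python
import subprocess
from pathlib import Path
from typing import List


def mongodump_command(pod: str, namespace: str) -> List[str]:
    """Build a kubectl command that runs mongodump inside the MongoDB pod.

    With --archive and no file name, mongodump writes the archive to stdout,
    so it can be streamed back to the machine running kubectl.
    `pod` and `namespace` are placeholders for your own deployment.
    """
    return ["kubectl", "exec", "-n", namespace, pod, "--", "mongodump", "--archive"]


def backup_mongo(pod: str, namespace: str, output: Path) -> None:
    """Stream the archive from the pod into a local file; raises on failure."""
    with output.open("wb") as archive:
        subprocess.run(mongodump_command(pod, namespace), stdout=archive, check=True)
```

For example, backup_mongo("clearml-mongodb-0", "clearml", Path("clearml-mongo.archive")) with hypothetical names. Restoring would go the other way: kubectl exec -i into the new pod running mongorestore --archive, with the archive file fed on stdin.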
And an example of the missing comparison:
1. the two experiments
2. plot on the first one
3. plot on the second
4. comparison plot only shows other plots (only the confusion matrices)
I'm not using clearml-agent here; I use clearml.Task.init.
The exit(1) (or raised exception) is from a subprocess.
clearml==1.1.3
torch==1.9.0+cu111, torchvision==0.10, lightning not installed
python3.8
debian 10
I will try reproducing it with smaller code. It was a detectron2 training run, which uses torch.multiprocessing.spawn and torch.distributed.init_process_group.
https://github.com/facebookresearch/detectron2/blob/c47167e4ac236a36895c294735a908b75f659f96/tools/train_net.py#L163
https...
And the comparison for the confusion matrices without the name of the experiments
Yes, unfortunately the setup.py imports torch: https://github.com/mapillary/inplace_abn/blob/master/setup.py
The task is registered and started by the agent, and the environment seems to be installed correctly, but then it fails with: /home/ubuntu/.clearml/venvs-builds/3.8/bin/python: can't open file 'fastai_classifier.py': [Errno 2] No such file or directory
Do you have any idea what could be wrong? Does the agent launch the script in the wrong working directory? Is the repo not copied? (The script is inside a private git repo, which ClearML detects correctly.)
I also tried launching the script from the root of th...
OK, by the way, I used https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_agent_install_configure.html, which was not up to date, so I didn't know about priority_packages and post_packages.
Not exactly: I want to launch the script myself (creating a new experiment, not cloning an existing one in the UI). How can I do that?
Thanks! I think .execute_remotely() is exactly what I need.
The script is inside a git repo (and it's the one I launch; if something else were missing, I would get an ImportError instead).
Hmm, apparently if I launch the script from the root of the repo (CWD: myrepo, python train/classif-custom/train.py) it works, but from its own directory it doesn't (CWD: myrepo/train/classif-custom, python train.py).
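To reason about why only the second launch fails, it helps to spell out the two values a remote runner needs: the working directory relative to the repository root, and the entry point relative to that directory. The sketch below is hypothetical (not ClearML code) and only does the path arithmetic; "/home/me/myrepo" is an assumed location. The hypothesis it illustrates: both launches point at the same file, but if the entry point were resolved from the repo root while the working directory was ignored, the second launch would look for train.py at the root and fail with "No such file or directory".

```python
from pathlib import PurePosixPath
from typing import Tuple


def working_dir_and_entry_point(repo_root: str, cwd: str, script: str) -> Tuple[str, str]:
    """Return (working_dir, entry_point) for a launch of `python <script>` from `cwd`.

    working_dir is relative to the repository root ("." for the root itself);
    entry_point is the script path as typed, relative to working_dir.
    """
    working_dir = PurePosixPath(cwd).relative_to(PurePosixPath(repo_root))
    entry_point = PurePosixPath(script)
    return str(working_dir), str(entry_point)


def script_in_repo(working_dir: str, entry_point: str) -> str:
    """Join the two values back into a repo-relative script path."""
    return str(PurePosixPath(working_dir) / entry_point)


if __name__ == "__main__":
    root = "/home/me/myrepo"  # hypothetical checkout location
    launches = [
        (root, "train/classif-custom/train.py"),
        (root + "/train/classif-custom", "train.py"),
    ]
    for cwd, script in launches:
        wd, ep = working_dir_and_entry_point(root, cwd, script)
        print(wd, ep, "->", script_in_repo(wd, ep))
```

Both launches map to train/classif-custom/train.py, so a runner that honors the recorded working directory should find the script either way.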