Yes, unfortunately the setup.py imports torch: https://github.com/mapillary/inplace_abn/blob/master/setup.py
Yes, I think it needs pytorch, but pytorch failed to install previously?
Ok, btw I used https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_agent_install_configure.html which was not up to date, so I didn't know about priority_packages and post_packages
Does clearml-agent install the repo with `pip install -e .` when it should be? (i.e. my local repo is installed with `pip install -e .` in the environment where I launch my script, which calls Task.init and .execute_remotely()).
WebApp: 1.2.0-153 • Server: 1.2.0-153 • API: 2.16
The task is registered and is started by the agent, the env seems to be installed well, but then it fails on /home/ubuntu/.clearml/venvs-builds/3.8/bin/python: can't open file 'fastai_classifier.py': [Errno 2] No such file or directory
Do you have an idea of what could be wrong? Is the agent launching the script in the wrong working dir? Is the repo not copied? (This script is inside a private git repo, which clearml detects correctly.)
I also tried launching the script from the root of th...
not exactly, I want to launch the script (create a new experiment, not clone an existing one in the UI), how can I do it ?
Hmm, apparently if I launch the script from the root of the repo (CWD: myrepo, command: python train/classif-custom/train.py) it works, but from its own dir it doesn't work (CWD: myrepo/train/classif-custom, command: python train.py)
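To make the difference between the two launches concrete, here is a minimal sketch. It assumes the agent needs a working directory relative to the repo root plus an entry point relative to that directory. `describe_launch` and the `/home/u/myrepo` paths are hypothetical, not part of the ClearML API.

```python
from pathlib import PurePosixPath


def describe_launch(repo_root: str, cwd: str, script: str) -> dict:
    """Describe a launch as (working dir relative to repo root, entry point)."""
    root = PurePosixPath(repo_root)
    here = PurePosixPath(cwd)
    # Raises ValueError if the current directory is outside the repository.
    working_dir = here.relative_to(root)
    return {
        "working_dir": working_dir.as_posix(),  # "." when launched from the root
        "entry_point": script,
        # Where the script sits relative to the repo root, whatever the CWD was.
        "script_from_root": (working_dir / script).as_posix(),
    }


from_root = describe_launch("/home/u/myrepo", "/home/u/myrepo",
                            "train/classif-custom/train.py")
from_dir = describe_launch("/home/u/myrepo", "/home/u/myrepo/train/classif-custom",
                           "train.py")
```

Both launches point at the same file from the repo root. Only the recorded working dir and entry point differ, so that pair is probably the thing to check in the task's execution details when the agent cannot find the script.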
Okay, so we found that on kubernetes, if we allow only TLS v1.3 on the ingress controller, clearml-inits breaks with:
2022-03-04 10:32:02,814 - clearml.session - WARNING - SSLError Retrying HTTPSConnectionPool(host='http://api.clear-ml.dev.monk.ai', port=443): Max retries exceeded with url: /auth.login (Caused by SSLError(SSLError(1, '[SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:1129)')))
or sometimes just "could not verify credentials"
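The TLSV1_ALERT_PROTOCOL_VERSION alert suggests the client and the ingress could not agree on a protocol version. A quick way to check the client side is sketched below using only Python's standard `ssl` module. The function names are made up for this example, and the host you pass in is up to you.

```python
import socket
import ssl
from typing import Optional


def client_supports_tls13() -> bool:
    """True if the OpenSSL build behind this Python can speak TLS 1.3."""
    return ssl.HAS_TLSv1_3


def tls13_only_context() -> ssl.SSLContext:
    """Client context that refuses anything older than TLS 1.3."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def negotiated_version(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Connect with default settings and report the protocol version agreed on."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Running `negotiated_version` with the API hostname from inside the failing pod or venv should show whether that interpreter can reach the TLS 1.3-only ingress at all.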
So if anybody needs this someday (migrating the hostname that is saved inside your experiments, i.e. in debug images and plots with images), you need this: https://github.com/allegroai/clearml-server/issues/83
But it's slow; you can restrict the query to the items that actually need updating, with:
` # on index events-training_debug_image-yourid
OLDHOST/ should be something like
or
NEWHOST/ same
"script": {
"source": "ctx._source.url = ctx._source.url.replace('OLDHOST/', 'NEWHO...
However, I have another problem: my git repo is installed with `pip install -e .` and I import it in my script, but in a task executed by a clearml-agent the module appears not to be installed?
I think I found the problem, if the file is untracked by git, it is not saved by clearml
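A quick pre-flight check for this, as a sketch: list the files git does not track before launching, since they are the ones that would be missing on the agent side. It relies on `git status --porcelain`, where untracked entries are prefixed with `??`. The helper names are made up for this example.

```python
import subprocess
from typing import List


def parse_untracked(porcelain: str) -> List[str]:
    """Extract untracked paths from `git status --porcelain` output."""
    # Untracked entries look like "?? path"; other lines are tracked changes.
    return [line[3:] for line in porcelain.splitlines() if line.startswith("?? ")]


def untracked_files(repo_dir: str = ".") -> List[str]:
    """Run git in repo_dir and return the paths it reports as untracked."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return parse_untracked(result.stdout)
```

Calling `untracked_files()` from inside the repo before `Task.init` and warning when the list is non-empty would have caught this case.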
I think I didn't understand: if I'm not at the root of the repo, do I have to specify the working dir?
Ok thanks I will check first for permission issues
hello, yes it’s like typos: I want to compare some experiments that were created by different versions of a script, for instance, and the metric names changed so I can’t compare them in the clearml UI
I used scripts like https://github.com/allegroai/clearml-server/issues/83 previously for images but it doesn't migrate artifacts urls
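For the image events, a restricted update-by-query body could be built along these lines. This is only a sketch under assumptions: it supposes the URL is stored in a keyword field named `url` (so a `prefix` query can match it), and `build_update_by_query` is a hypothetical helper, not something from the linked issue. It covers event documents only, not artifact URLs.

```python
import json


def build_update_by_query(old_host: str, new_host: str, field: str = "url") -> dict:
    """Body for an Elasticsearch _update_by_query call that rewrites a host prefix."""
    return {
        # Only touch documents whose URL starts with the old host (assumed keyword field).
        "query": {"prefix": {field: old_host}},
        "script": {
            "lang": "painless",
            # Hosts are passed as params instead of being spliced into the script source.
            "source": f"ctx._source.{field} = "
                      f"ctx._source.{field}.replace(params.old, params.new)",
            "params": {"old": old_host, "new": new_host},
        },
    }


body = build_update_by_query("OLDHOST/", "NEWHOST/")
payload = json.dumps(body)  # what would be sent as the request body
```

Restricting the query this way means the script only runs on documents that still carry the old host, which is where most of the speed-up would come from.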
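Managed a workaround thanks to the API doc, if someone encounters the same bug:

```python
from clearml import Task

# `project` (str) and `archived` (bool) are expected to be defined by the caller.
tasks = []
page = 0
while True:
    # Fetch one page of up to 500 tasks; skip archived ones unless requested.
    page_tasks = Task._query_tasks(
        project_name=project, system_tags=[] if archived else ['-archived'],
        page=page, page_size=500)
    tasks += page_tasks
    page += 1
    if len(page_tasks) < 500:  # a short page means it was the last one
        break
```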
With matplotlib I only get the suptitle
Thanks! I think .execute_remotely() is exactly what I need
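For anyone landing here with the same question, a minimal sketch of that pattern follows. The project name, task name and the `default` queue are placeholders chosen for this example, and `main` is just an illustrative wrapper.

```python
def main() -> None:
    # Imported here so this file can still be loaded where clearml is not installed.
    from clearml import Task

    # Placeholder names; use your own project and task names.
    task = Task.init(project_name="examples", task_name="remote run")

    # Run locally: the task is enqueued on the given queue and this process exits.
    # Run by a clearml-agent: the call does nothing and execution continues below.
    task.execute_remotely(queue_name="default", exit_process=True)

    print("training code runs here, on the agent")
```

Calling `main()` from the local machine creates a new experiment and hands it to the agent, without cloning anything in the UI.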
We have the same issue for hyperparameters even with only ~100 keys, where the UI likes to lazy load and remove scrolled elements so it breaks browser search, and integrated search works like 15% of the time…
I welcome the day clearml saves relative urls by default ^^ it is supported by browsers (i.e. fetching /someurl is relative to the current hostname) so maybe only the clearml client would need to be updated right ? to push images with a relative url instead of the clearml server url.
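To illustrate what "relative" would mean here, a small sketch with the standard library: drop the scheme and host from a stored URL when it points at the known server, so the browser resolves it against whatever hostname the UI is served from. `to_relative` and the example hostnames are hypothetical, and this is not how the clearml client currently stores URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def to_relative(url: str, server_host: str) -> str:
    """Strip scheme and host from url if it points at server_host."""
    parts = urlsplit(url)
    if parts.netloc != server_host:
        return url  # some other host (e.g. external storage): leave untouched
    # Keep path, query and fragment; an empty path becomes "/".
    return urlunsplit(("", "", parts.path or "/", parts.query, parts.fragment))


relative = to_relative("https://files.example.com/proj/img.png", "files.example.com")
external = to_relative("https://other.example.com/a.png", "files.example.com")
```

With URLs stored like `relative`, a hostname migration would not need any rewrite of the saved events.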