Seems to happen only while the cleanup_service is running!
[2021-05-07 10:53:00,566] [9] [WARNING] [elasticsearch] POST [status:N/A request:60.061s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib64/python3.6/http/client.py", lin...
Thank you very much! 😃
Oh, interesting!
So setting the pip version on a per-task basis makes sense ;D?
Btw: I think Task.init is more confusing than Task.create and I would rather rename the former.
Long story short, the Task requirements are collected asynchronously, so if one sets them after creating the object (at least in theory), it might be too late.
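For what it is worth, a minimal sketch of that ordering with the standard SDK calls (the package pin and project/task names are just placeholders):

```python
from clearml import Task

# Register extra requirements *before* Task.init, since the requirements
# are collected in a background thread once the task object exists.
Task.add_requirements("torch", ">=1.7")

task = Task.init(project_name="examples", task_name="requirements-demo")
```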
AgitatedDove14 Is there no await/synchronize method to wait for task update?
I am currently on the move, but it was something like "upstream server not found" in /etc/nginx/nginx.conf, and if I remember correctly on line 88.
Perfect, thank you 🙂
Yep, I will add this as an issue. Btw: Should I rather post the kind of questions I am asking as issues, or do they fit better here?
Thanks a lot. To summarize: To me clearml is a framework, but I would rather have it be a library.
Other than that I am very happy with clearml and it is probably my favorite machine learning related package of the last two years! 🙂 And thanks for taking so much time to talk to me!
clearml will register preinstalled conda packages as requirements.
Nvm. I forgot to start my agent with --docker. So here comes my follow-up question: It seems like there is no way to define that a Task requires docker support from an agent, right?
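For anyone finding this later, the agent side of it is just starting the daemon in docker mode (the queue name is a placeholder):

```
clearml-agent daemon --queue default --docker
```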
That I understand. But I think (old) pip versions will sometimes fail to resolve a package. Probably not the case the other way around.
To update task.requirements before the task is actually created (the requirements are created in a background thread)
Why can't it be updated after creation?
The one I posted up top: 22.03-py3
😄
Thank you very much!
I see, so it is actually not related to clearml 🎉
I just checked and my user is part of the docker group.
I see. Thank you very much. For my current problem, giving priority according to queue priority would kinda solve it. For experimentation I will sometimes enqueue a task and then later enqueue another one of a different kind, but what happens is that even though this could be trivially solved, I will have to wait for the first one to finish. I guess this is only a problem for people with small "clusters" where SLURM does not make sense, but no scheduling at all is also suboptimal.
However, I...
As in, if it was not empty it would work?
I think such an option can work, but actually if I had free wishes I would say that the clearml.Task code would need some refactoring (but I am not an experienced software engineer, so I could be totally wrong). It is not clear what Task.init does and how it does it, and the very long method declaration is confusing. I think there should be two ways to initialize tasks:
Specify a lot manually, e.g.
task = Task.create()
task.add_requirements(from_requirements_files(..))
task.add_entr...
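To make the contrast concrete, a rough sketch of the two styles; Task.init is the existing call, while the explicit builder-style methods are hypothetical and only illustrate the proposal above:

```python
from clearml import Task

# Existing one-call form: most behaviour is configured through keyword
# arguments and happens implicitly inside Task.init.
task = Task.init(project_name="examples", task_name="demo")

# Hypothetical explicit form sketched above (not part of the current clearml API):
# task = Task.create()
# task.add_requirements(...)
```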
Afaik, clearml-agent will use existing installed packages if they fit the requirements.txt. E.g. pytorch >= 1.7 will only install PyTorch if the environment does not already provide some version of PyTorch greater or equal to 1.7.
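As a small illustration, a requirements.txt along those lines (the package and pin are just an example):

```
# If the pre-installed environment already satisfies this constraint,
# the agent reuses it instead of reinstalling.
torch >= 1.7
```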
So it seems to be definitely a problem with docker and not with clearml. However, I do not get why it works for you but on none of my machines (all Ubuntu 20.04 with docker 20.10).
apiserver:
  command:
    - apiserver
  container_name: clearml-apiserver
  image: allegroai/clearml:latest
  restart: unless-stopped
  volumes:
    - /opt/clearml/logs:/var/log/clearml
    - /opt/clearml/config:/opt/clearml/config
    - /opt/clearml/data/fileserver:/mnt/fileserver
  depends_on:
    - redis
    - mongo
    - elasticsearch
    - fileserver
    - fileserver_datasets
  environment:
    CLEARML_ELASTIC_SERVICE_HOST: elasticsearch
    CLEARML_...
Sure, no problem!
Thanks for the answer. So currently the cleanup is done based on the number of experiments that are cached? If I have a few big experiments, this could make my agent's cache overflow?
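If it helps, this is roughly the agent-side cache section I had in mind; the keys are from memory and may differ between clearml-agent versions, so treat them as an assumption:

```
# clearml.conf (agent section), assumed keys
agent {
    venvs_cache: {
        # cleanup is based on the number of cached entries (max_entries)
        max_entries: 10
        # skip caching if free disk space drops below this threshold
        free_space_threshold_gb: 2.0
        path: ~/.clearml/venvs-cache
    }
}
```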