Hi ExasperatedCrocodile76,
When running in docker mode, the agent should handle all the points you raised above and just work 🙂
So I just define all the requirements for the docker in the default_docker part of clearml.conf?
What do you mean by requirements for the docker? You can set the default docker image in clearml.conf, but you can always specify a different docker image at the Task level, which will override it.
ExasperatedCrocodile76 the agent is responsible for installing itself, clearml and any other requirement you have on the task when starting the docker container. The agent also makes sure the same settings it uses (server address, credentials etc.) are passed to the task running inside the docker, so you just need to make sure the agent is configured properly
OK, I will try this feature and let you know if I see any problems. Thank you! 🙂
I'm at the point where the clearml-agent looks stuck. This is how I start the agent: clearml-agent daemon --queue "default" --gpus 0 --foreground --docker. After the last message, "Successfully installed: <dependencies>", nothing really happens. I'm attaching the logs from the experiment, and I'm also providing the config:
This is usually the part where the agent starts to run within the container... btw, what is "pippip" ?
I can see the container in docker ps, but it seems like it never gets to code execution. I have no idea where "pippip" comes from. It seems that somewhere it gets "pip" + "pip".
Are you sure the server is reachable from within the docker container using the provided URL?
So you are probably right. I ran nc -vz localhost 8080:
Output when run locally, not in docker: Connection to localhost (127.0.0.1) 8080 port [tcp/http-alt] succeeded!
Output when run inside the docker bash: localhost [127.0.0.1] 8080 (http-alt) : Connection refused
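If nc is not available inside the image, the same check can be done from Python. This is only a minimal sketch: the helper name can_connect is made up for illustration, and the ports are the ones from the config in this thread. Run it once on the host and once inside the container and compare the results.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Ports taken from the server URLs discussed in this thread.
    for name, port in (("web_server", 8080), ("api_server", 8008), ("files_server", 8081)):
        status = "reachable" if can_connect("localhost", port) else "NOT reachable"
        print(f"{name} localhost:{port} -> {status}")
```

If the ports are reachable on the host but not inside the container, the problem is the address the container uses, not the server itself.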
Is the apiserver running in a docker container on the same machine?
Yes, the API server is on the same machine, running in a container.
web_server: http://localhost:8080
api_server: http://localhost:8008
files_server: http://localhost:8081
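In that case, you probably can't use localhost, because inside the task's container localhost refers to the container itself, not to the host machine. You have to use the docker network to access the server.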
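A quick way to spot this kind of misconfiguration is to check which configured server URLs point at a loopback address. The snippet below is only an illustrative sketch: the function name loopback_urls and the config dict are hypothetical, and the URLs are copied from the config posted above rather than read from clearml.conf.

```python
from urllib.parse import urlparse

# Hosts that resolve to the container itself when used inside a container.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def loopback_urls(urls: dict) -> list:
    """Return the names of the entries whose URL host is a loopback address."""
    return [name for name, url in urls.items()
            if urlparse(url).hostname in LOOPBACK_HOSTS]


# URLs as posted in this thread (hard-coded here for illustration).
config = {
    "web_server": "http://localhost:8080",
    "api_server": "http://localhost:8008",
    "files_server": "http://localhost:8081",
}

print(loopback_urls(config))
```

Any entry it reports will not reach the server from inside a container unless the container shares the host's network.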
I just set up my server by following this guide: https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_linux_mac/
I know, I just meant most people don't install the server on the same machine they use for running experiments in docker mode 🙂
In any case, if you make sure the docker containers are using the same docker network, you could refer to the apiserver as http://apiserver:8008
RE: When people do not install the server on the same machine, how do they reach it then? I can't reach apiserver / clearml-apiserver.
After a fresh installation of clearml-agent and clearml I still have the same problem.
Example: I have a simple Python script and have defined default_docker in clearml.conf. When I clone this experiment and run it from the ClearML dashboard, my clearml-agent running in docker mode should execute this task in docker. However, it gets stuck after the dependencies in the docker have been successfully installed.
I tried to set the API addresses in clearml.conf based on the docker IP addresses (from docker inspect), but I am still stuck there.
Do I have to do some port forwarding or add extra parameters? Copy clearml.conf into the docker? Because it does not seem to be done automatically.
Well, if the machine you're installing on has a public name, you typically just use that.
Also, if you run the server on the same machine and the ports are exposed outside of the docker network, you can just reference localhost?
ExasperatedCrocodile76 hi, try passing "--network=host" in --docker_args
Example:
clearml-task --project project --name name --script run.py --queue queue --requirements requirements.txt --docker python:3.7.13-bullseye --docker_args "--cpus=8 --memory=16g --network=host"