I just deployed ClearML into a k8s cluster using the ClearML Helm package. When I ran a job, it gave this error in the ClearML web server (attached below). I SSHed into the pod running the clearml-agent. Upon typing clearml-agent init, I realised the clearml.con

Hi DeliciousBluewhale87
My theory is that the clearml-agent itself is configured correctly (which is why you can see it in the clearml-server). The issue, I think, is that the Task running inside the docker is missing the configuration. The agent passes the configuration into the Task's docker by mapping a temporary configuration file into it. If the agent is running bare-metal, this is straightforward. If the agent is running on k8s (or, more generally, inside a docker itself), then the agent needs:
1. Mapping of the docker socket
2. Mapping of a host folder into the agent's docker
(1) is used to actually execute docker run, while (2) is used to pass information (i.e. configuration files) from the agent's docker into the Task's docker.
The CLEARML_AGENT_DOCKER_HOST_MOUNT environment variable is the one that tells the agent how it can pass these config files:
You can see in the example here:
We also have to mount a host folder into the agent's docker, so that docker can then mount the config files from that folder into the Task's docker.
Notice that this is not actually a PVC, as there is no need for persistence; it is just a way to run a sibling docker.
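To make the two mappings concrete, here is a rough sketch in Python that builds the relevant fragment of a pod spec as a plain dict and prints it as JSON. The paths (/opt/clearml/agent, /root/.clearml), the container name, and the volume names are assumptions picked for illustration, not values taken from the Helm chart. The `<host path>:<path inside the agent's docker>` format for CLEARML_AGENT_DOCKER_HOST_MOUNT is my understanding of how the variable is used, so check it against your chart's values.

```python
import json

# Assumed paths, for illustration only.
HOST_DIR = "/opt/clearml/agent"       # folder on the k8s node (the host)
AGENT_DIR = "/root/.clearml"          # where that folder appears inside the agent's docker
DOCKER_SOCK = "/var/run/docker.sock"  # docker socket of the node


def agent_pod_fragment(host_dir: str, agent_dir: str) -> dict:
    """Build the parts of a pod spec needed to launch sibling dockers."""
    return {
        "containers": [{
            "name": "clearml-agent",  # hypothetical container name
            "env": [{
                # Tells the agent which host folder backs its own folder
                "name": "CLEARML_AGENT_DOCKER_HOST_MOUNT",
                "value": f"{host_dir}:{agent_dir}",
            }],
            "volumeMounts": [
                # (1) docker socket, so the agent can execute docker run
                {"name": "docker-sock", "mountPath": DOCKER_SOCK},
                # (2) shared folder, used to hand config files to the Task's docker
                {"name": "agent-shared", "mountPath": agent_dir},
            ],
        }],
        # hostPath volumes rather than a PVC: nothing here needs to persist
        "volumes": [
            {"name": "docker-sock", "hostPath": {"path": DOCKER_SOCK}},
            {"name": "agent-shared", "hostPath": {"path": host_dir}},
        ],
    }


fragment = agent_pod_fragment(HOST_DIR, AGENT_DIR)
print(json.dumps(fragment, indent=2))
```

The point of the sketch is that the same host folder shows up twice: once as a volume mounted into the agent's docker, and once in the environment variable, so the agent knows which host path to hand to docker run for the Task's docker.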

Make sense?

Posted 3 years ago