Hi, I don't think clearml-agent actually ran at that point in time. All I can see in the pod is
apt install of the libpthread-stubs, libx11, libxau and libxcb1 packages, followed by pip install of clearml-agent. After the above succeed, the pod just hangs there.
Is the Glue significant in initialising clearml-agent after the pod is spawned?
Nope - once the pod is spawned the glue only monitors it externally using
kubectl - the same way you would, and will only clean it up if the task was explicitly aborted by the user.
Some breakthrough. The problem arose because we had switched the web, api and files servers to use https (ssl) endpoints. I switched back to http endpoints to test this theory.
Although it's not printing an error, I suspect it's not able to connect because the self-signed cert is missing. Previously this wasn't an issue; not sure what changed in clearml_agent=1.1.0.
There's a secondary issue resulting from this; I will put it in a new thread.
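The self-signed cert suspicion above can be checked independently of clearml-agent. The following is a minimal sketch, not anything the agent itself runs: it attempts a plain TLS handshake from inside the pod and reports whether the failure is a certificate problem or a connectivity/timeout problem. The function name `check_tls` and its parameters are made up for illustration; only Python's standard `ssl` and `socket` modules are used.

```python
import socket
import ssl
from typing import Optional


def check_tls(host: str, port: int = 443, cafile: Optional[str] = None,
              timeout: float = 5.0) -> str:
    """Attempt a TLS handshake and report how it ended.

    cafile is an optional path to a PEM file holding the self-signed/CA cert;
    when omitted, the system's default trust store is used.
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
    except ssl.SSLCertVerificationError as exc:
        # The server answered, but its certificate is not trusted.
        return f"cert_error: {exc.verify_message}"
    except ssl.SSLError as exc:
        # Some other TLS-level failure (protocol mismatch, etc.).
        return f"tls_error: {exc}"
    except OSError as exc:
        # Connection refused, host unreachable, DNS failure or timeout.
        return f"unreachable: {exc}"
```

Calling it once with the API server's host name and no `cafile`, and once with `cafile` pointing at the self-signed cert, should show whether the missing cert is really the cause: a `cert_error` that turns into `ok` would support the theory, while `unreachable` points at networking instead.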
Nope, in the k8s glue, the config file is passed to the agent in the pod using a base64-encoded string - you can see it in the pod's command spec as one of the lines that looks something like
echo '...' | base64 --decode >> ~/clearml.conf - it's injected into the
~/clearml.conf file on startup (you can actually copy the base64-encoded string from the spec and decode it yourself if you want to see what's in there)
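As a rough sketch of the "decode it yourself" step: the snippet below pulls every base64 payload out of text copied from the pod's command spec and decodes it. It assumes the lines have the `echo '...' | base64 --decode` shape described above. The function name and the sample config are hypothetical stand-ins, not output from a real pod.

```python
import base64
import re
from typing import List

# Matches: echo '<base64>' | base64 --decode
# (the shape quoted in the thread; the exact spec layout is an assumption)
_PATTERN = re.compile(r"echo\s+'([A-Za-z0-9+/=\s]+)'\s*\|\s*base64\s+--decode")


def decode_embedded_configs(command_text: str) -> List[str]:
    """Return the decoded text of every base64 payload found in command_text."""
    decoded = []
    for match in _PATTERN.finditer(command_text):
        payload = "".join(match.group(1).split())  # drop any wrapped whitespace
        decoded.append(base64.b64decode(payload).decode("utf-8", errors="replace"))
    return decoded


if __name__ == "__main__":
    # Hypothetical sample standing in for text copied from the pod spec.
    sample_conf = "api { web_server: https://example.invalid }\n"
    encoded = base64.b64encode(sample_conf.encode()).decode()
    spec_line = f"echo '{encoded}' | base64 --decode >> ~/clearml.conf"
    for text in decode_embedded_configs(spec_line):
        print(text)
```

Looking at the decoded text is a quick way to confirm which server URLs (http or https) the agent in the pod is actually being given.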
OK. Any idea what can go on between setting up clearml-agent and initialising clearml-agent itself? Does clearml-agent try to communicate with any internet address? From another perspective, it looks like a long timeout issue. I happen to be deploying on a disconnected on-premise setup.