They don't have the same version. I do notice that if the client is using version 3.8, remote execution will try to use that same version even though the docker image doesn't have it installed.
Hi, I don't think clearml-agent actually ran at that point in time. All I can see in the pod is
an apt install of the libpthread-stubs, libx11, libxau and libxcb1 packages, followed by a pip install of clearml-agent. After those succeed, the pod just hangs there.
So I kept trying, but I'm stuck on this when I run python k8s_glue_example.py:
TypeError: __init__() got an unexpected keyword argument 'base_pod_num'
Does the bash script need clearml-agent to be able to communicate with the HTTPS clearml-server first? If so, there's a chicken-and-egg problem here.
I have since ruled out the apt and pypi repos. Both of them are installing properly on the pods.
I did notice that in the tmp folder, .clearml_agent.xxxxx.cfg does not exist.
For context, I've realised I'll need to catalogue all the dataset ids people create separately in a spreadsheet, and for each experiment I'll need to go into the code commit to see which id is being used. On the other hand, I thought I'd seen advertised use cases where the experiment can be directly linked to the dataset id being used. My brain's a bit rusty on how it was done.
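For what it's worth, a minimal sketch of how the dataset id could be recorded on the experiment itself instead of in a spreadsheet, assuming the clearml SDK's Task and Dataset classes; the project/task names and the dataset_id parameter name are just illustrative placeholders:
`
from clearml import Task, Dataset

# attach to (or create) the experiment task
task = Task.init(project_name="my_project", task_name="train")  # names are placeholders

# record the dataset id as a task parameter so it shows up on the experiment
dataset_id = "<your_dataset_id>"
task.connect({"dataset_id": dataset_id})

# fetch the dataset and get a local copy of its files
dataset = Dataset.get(dataset_id=dataset_id)
data_path = dataset.get_local_copy()
`
With something like this, the id used by each run is visible on the experiment itself rather than only in the code commit.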
I'm having the same problem. Are you using the latest clearml-agent? Is your docker image running as the root user by default?
I managed to find out why. The docker image I'm using is not set to run as the root user, hence the error. But I'm wondering why this is the case, since Docker best practices do indicate we should use a non-root user in production images.
Hi, so this means that if I want to use Kubernetes, I would have to 'manually' install multiple agents on all the worker nodes?
The first line is to make sure kubectl is connected to k8s.
Hi AgitatedDove14, I've got the same error. It would appear that the code references clearml_agent/helper/base.py,
which I believe is part of clearml-agent v0.17.1. Could that be the issue?
Hi CostlyOstrich36, that's correct.
Hi HelpfulDeer76, I'm facing similar issues. Would you mind describing in detail how you deploy clearml-agent? Is it running as a pod on k8s?
It's 0.17-63.
It doesn't appear in the profile page.
Hi. The upgrade seems to have gone well, but I'm seeing one weird output. When I run a task and look at the installed packages under the Execution tab, I still see clearml=0.17. Is this expected?
After some churning, this is the answer. Change it in the clearml.conf generated by clearml-agent init.
`
default_docker: {
    # default docker image to use when running in docker mode
    image: "nvidia/cuda:10.1-runtime-ubuntu18.04"
    # optional arguments to pass to docker image
    # arguments: ["--ipc=host", ]
    arguments: ["--env GIT_SSL_NO_VERIFY=true",]
}
`
Hi, this is what I got. No mention of the env variables.
` Current configuration (clearml_agent v0.17.2, location: /home/jax/clearml.conf):
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
ap...
Unfortunately, our security posture is so strict that we cannot have an agent git user with unfettered read access to all repos.
clearml-serving does not support spaCy models out of the box, among many others; Clearml-Serving only supports the following:
Machine Learning models (Scikit-Learn, XGBoost, LightGBM)
Deep Learning models (TensorFlow, PyTorch, ONNX)
An easy way to extend support to different models would be a boon.
I believe in such scenarios a custom engine would be required. I would like to know how difficult it is to create a custom engine with clearml-serving. For example, in this...
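For illustration only, here is a rough sketch of what a custom-engine preprocessing module for a spaCy model might look like, loosely following the Preprocess class convention used in the clearml-serving examples; the method names and signatures are an assumption and may differ between clearml-serving versions:
`
# preprocess.py -- hypothetical custom engine for a spaCy model (sketch, not the official API)
from typing import Any

import spacy


class Preprocess(object):
    def __init__(self):
        self._model = None

    def load(self, local_file_name: str) -> Any:
        # load the serialized spaCy pipeline from the downloaded model artifact
        self._model = spacy.load(local_file_name)
        return self._model

    def preprocess(self, body: dict, *args, **kwargs) -> Any:
        # pull the raw text out of the request payload
        return body.get("text", "")

    def process(self, data: Any, *args, **kwargs) -> Any:
        # run the spaCy pipeline and collect named entities
        doc = self._model(data)
        return [(ent.text, ent.label_) for ent in doc.ents]

    def postprocess(self, data: Any, *args, **kwargs) -> dict:
        # wrap the result in a JSON-serializable response
        return {"entities": data}
`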
Hi SuccessfulKoala55, I managed to install clearml-agent==1.0.1rc5. However, the same issues occur.
yeah, someone should call them out.
Hi SuccessfulKoala55, can I confirm the following comments in the docker-compose.yml?
And after that, can I run the following docker-compose commands without loss of data?
docker-compose down
docker-compose up
docker-compose.yml
`
version: "3.6"
services:
  apiserver:
    command:
    - apiserver
    container_name: clearml-apiserver
    image: allegroai/clearml:latest
    restart: unless-stopped
    volumes:
    - /opt/clearml/logs:/var/log/clearml
    - /opt/clearml/config:/opt/clearml/config
    #...
From a ClearML perspective, how would we enable this, considering we don't have direct control over the agents, or even their IPs?
Hi, I was reading this thread and wondered which versions of clearml-server and clearml-agent this took effect with?
Hi, this is the log. I didn't see any attempt from the agent to install virtualenv on the base image.
` 1618369068169 clearml-gpu-id-b926b4b809f544c49e99625380a1534b:gpuGPU-4ad68290-0daf-4634-6768-16fad73d47a3 DEBUG Current configuration (clearml_agent v0.17.2, location: /tmp/.clearml_agent.wgsmv2t9.cfg):
agent.worker_id = clearml-gpu-id-b926b4b809f544c49e99625380a1534b:gpuGPU-4ad68290-0daf-4634-6768-16fad73d47a3
agent.worker_name = clearml-gpu-id-b926b4b809f544c49e99625...