Ah ok, so if I see Jax's workspace on https://app.community.clear.ml/dashboard, then I'm on the right track? How often does this reset?
I also think it makes sense that when you do certain definitive CI actions like publish, it would support running some custom scripts.
Hi, I was reading this thread and wondered in which versions of clearml-server and clearml-agent this took effect?
Hi, Self-hosted using docker-compose.
Hi SuccessfulKoala55, thanks. Opened an issue on the ClearML-Agent GitHub at https://github.com/allegroai/clearml-agent/issues/67
ok, i'll wait till i get my hands on vault then. thanks.
I would say it's intermittent.
The apply.yaml template is not working (e.g. the arguments env is not passed to the container), which is why I tried the code approach instead.
Hi thanks.
So I suppose ClearML makes use of the information in the .git folder at the root of the script folder to gather that info.
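To make that guess concrete, here is a minimal sketch of how repository details could be read from a git checkout. It is not ClearML's actual implementation; the function names `collect_repo_info` and `_git` are hypothetical, and it assumes the `git` CLI is installed and the script folder is inside a repository with an `origin` remote.

```python
import subprocess
from typing import Callable, Dict, List


def _git(args: List[str], cwd: str) -> str:
    """Run a git command inside `cwd` and return its raw stdout."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout


def collect_repo_info(script_dir: str, git: Callable = _git) -> Dict[str, str]:
    """Gather what would be needed to check out the same code elsewhere.

    Hypothetical helper for illustration only.
    """
    return {
        "repository": git(["remote", "get-url", "origin"], script_dir).strip(),
        "branch": git(["rev-parse", "--abbrev-ref", "HEAD"], script_dir).strip(),
        "commit": git(["rev-parse", "HEAD"], script_dir).strip(),
        # Uncommitted changes relative to the last commit, kept as a patch
        # (not stripped, so it stays usable with `git apply`).
        "diff": git(["diff", "HEAD"], script_dir),
    }
```

Calling `collect_repo_info(".")` from inside a repository returns the remote URL, branch, commit hash and the uncommitted diff as strings.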
I have yet to go through ClearML agent thoroughly. TimelyPenguin76, so if I run a training with uncommitted changes and don't commit/push afterwards, then when I clone the task, won't ClearML agent be unable to pull that script from the git repo?
Hi. If we disable the API service, how will it affect the system? How do we disable it?
Ok thanks.
Hi CostlyOstrich36 , thanks. I will check with the Enterprise team then.
It would make sense on a very large resource cluster. Unfortunately we have fewer than 50 GPUs to share. A multi-tenant SaaS would cut the resources into even smaller clusters and would not help with efficiency. Or would you have a suggestion?
Any comments on using the global python libraries without the need to 'pip install' anything?
Hi, please correct me if I am wrong: to use the glue, I need the following.
- A k8s cluster
- A kubectl that is connected to the k8s cluster
- A pip install of clearml-agent 0.17.1
So I did all the above. I'm not sure what is meant by running the entire thing on my own machine.
Hi, clearml-agent==0.17.2rc3 did work. I'm on a 1.19 k8s cluster, and get this error when a task is pulled. Is the glue not compatible with 1.19?
` Pulling task 3a90802d1dfa4ec09fbccba0beffbaa8 launching on kubernetes cluster
Pushing task 3a90802d1dfa4ec09fbccba0beffbaa8 into temporary pending queue
Kubernetes scheduling task id=3a90802d1dfa4ec09fbccba0beffbaa8
kubectl output:
Flag --replicas has been deprecated, has no effect and will be removed in the future.
Flag --generator has been depre...
Unfortunately it's not. The problem previously encountered with the docker method surfaced again. In this case, the BASE DOCKER IMAGE setting `nvidia/cuda:10.1-runtime-ubuntu18.04 --env GIT_SSL_NO_VERIFY=true` is not taking effect with the k8s glue.
This would be solved if `--env GIT_SSL_NO_VERIFY=true` were passed to the k8s pod that's spawned to run the job. Currently it's not.
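As an illustration of what is being asked for, the sketch below splits a base docker image string into the image name and a list of env entries in the shape a Kubernetes container spec uses. This is only a sketch of the idea, not how the k8s glue behaves; `split_image_and_env` is a hypothetical helper and it only understands `--env`/`-e` arguments.

```python
import shlex
from typing import Dict, List, Tuple


def split_image_and_env(base_docker_image: str) -> Tuple[str, List[Dict[str, str]]]:
    """Split 'image --env K=V ...' into (image, [{'name': K, 'value': V}, ...]).

    Hypothetical helper for illustration; other docker arguments are ignored.
    """
    tokens = shlex.split(base_docker_image)
    if not tokens:
        raise ValueError("empty base docker image string")
    image, args = tokens[0], tokens[1:]

    env: List[Dict[str, str]] = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in ("--env", "-e") and i + 1 < len(args):
            pair = args[i + 1]
            i += 2
        elif arg.startswith("--env="):
            pair = arg[len("--env="):]
            i += 1
        else:
            # Not an env argument: skip it.
            i += 1
            continue
        name, _, value = pair.partition("=")
        # Kubernetes env values are strings, so the value is kept as text.
        env.append({"name": name, "value": value})
    return image, env
```

With the setting quoted above, `split_image_and_env("nvidia/cuda:10.1-runtime-ubuntu18.04 --env GIT_SSL_NO_VERIFY=true")` yields the image name plus one env entry for `GIT_SSL_NO_VERIFY`, which could then be merged into the pod's container spec.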
Sorry, in case I misunderstood you: are you referring to the `extra_docker_shell_script`?
Thanks SuccessfulKoala55 , how might I do this clean up? Does this increase with more use of ClearML? And to add, we save all artifacts onto a remote S3 server.
Thanks AgitatedDove14 , unfortunately it didn't take effect.
I passed it through the YAML as follows.
apiVersion: v1
kind: Pod
spec:
  containers:
  - image: clearml-agent:latest"
    env:
    - name: PIP_INDEX_URL
      value: ""
    - name: PIP_TRUSTED_HOST
      value: "192.168.56.253"
    - name: PIP_FIND_LINKS
      value: ""
    - name: GIT_SSL_NO_VERIFY
      value: true
    resources:
      requests:
        cpu: "2"
...
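Two details in a template like this are worth checking before blaming the glue: a Kubernetes env `value` must be a string, while YAML parses an unquoted `true` as a boolean, and a stray quote in `image` becomes part of the image name. The sketch below flags both on the dictionary a YAML loader would produce. `find_template_issues` is a hypothetical helper, and whether either point explains the behaviour here is an assumption.

```python
from typing import Any, Dict, List


def find_template_issues(pod_template: Dict[str, Any]) -> List[str]:
    """Return human-readable warnings about a parsed pod template.

    Hypothetical helper for illustration; it only checks image names and env values.
    """
    issues: List[str] = []
    containers = pod_template.get("spec", {}).get("containers", [])
    for index, container in enumerate(containers):
        image = container.get("image", "")
        if '"' in image or "'" in image:
            issues.append(
                f"containers[{index}].image contains a stray quote: {image!r}"
            )
        for entry in container.get("env", []):
            value = entry.get("value")
            # An unquoted true/false or number in YAML is not loaded as a string.
            if value is not None and not isinstance(value, str):
                issues.append(
                    f"env {entry.get('name')!r} has a {type(value).__name__} "
                    "value; quote it so it is a string"
                )
    return issues
```

Quoting the value, as in `value: "true"`, keeps it a string after parsing.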
So these (e.g. PIP_INDEX_URL) weren't used when ClearML started running pip.
Do you mean this?
Removing containers section: [{'image': 'clearml-agent:latest"', 'env': [{'name': 'PIP_INDEX_URL', 'value': ''},
What's the difference between template-yaml and --overrides-yaml? I used the latter to ensure the GPU is passed in.
I'm also noticing a lot of this while the k8s glue is running.
Ex: Expecting value: line 1 column 1 (char 0)
K8S Glue pods monitor: Failed parsing kubectl output:
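For what it's worth, that message matches what Python's `json` module raises when asked to parse an empty string, so one possible reading is that kubectl returned no output at that moment. The sketch below shows a defensive parse. `parse_kubectl_json` is a hypothetical helper, not part of the glue.

```python
import json
from typing import Any, Optional


def parse_kubectl_json(output: str) -> Optional[Any]:
    """Parse JSON text from kubectl, returning None when there is nothing usable.

    Hypothetical helper for illustration only.
    """
    text = output.strip()
    if not text:
        # Empty output would otherwise raise:
        # "Expecting value: line 1 column 1 (char 0)"
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Output was present but not valid JSON (e.g. a plain-text warning).
        return None
```

A monitor loop could treat a `None` result as "no data this round" and retry, rather than logging a parsing failure each time.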