No, I didn't indicate this particular issue in the git issue. Only the apply template.yml problem is in the issue.
Hi AgitatedDove14, I'm trying out passing env via the code instead:
task.set_base_docker("nvcr.io/nvidia/tensorflow:19.11-tf2-py3 --env TRAINS_AGENT_GIT_USER=git_username_here --env TRAINS_AGENT_GIT_PASS=git_password_here")
So the strange thing is when my k8sglue pulls a task, this happens.
` Pulling task xxxxxxxxxx launching on kubernetes cluster
Pushing task xxxxxxxxxx into temporary pending queue
Kubernetes scheduling task id=xxxxxxxxxxxx
skipping docker argument TRAINS_AGENT_GIT_USE...
Sorry, I don't quite understand this. The task itself was submitted when I ran the code on the client. I suppose the dependency requirements would be copied over when the experiment is cloned?
Running git diff in my terminal in this repo gave nothing. Nothing at all.
Ok that works. thanks.
Ok, that worked. So every time I change the code, I will have to rerun the experiment on my own machine, which doesn't have any GPUs?
That kind of defeats the purpose of using ClearML Agent.
Ok thanks, that worked.
Yes, of course, it's a long one.
Yes, as listed in the snippet. The torch library is torchvision.
Thanks. Going to try that out. But I hit another snag. Strangely, the Agent is not creating the right venv. This is what the Agent created.
` pip:
- asn1crypto==0.24.0
- attrs==20.3.0
- certifi==2020.12.5
- chardet==4.0.0
- cryptography==2.1.4
- Cython==0.29.22
- furl==2.1.0
- future==0.18.2
- humanfriendly==9.1
- idna==2.6
- importlib-metadata==3.7.0
- jsonschema==3.2.0
- keyring==10.6.0
- keyrings.alt==3.0
- orderedmultidict==1.0.1
- pathlib2==2.3.5
- psutil==5.8.0
- pycrypto==2.6.1
- pygobject...
Next step is to figure out if I can do all that in the Python code instead of the UI.
Hi AgitatedDove14, I was referring to:
task.set_base_docker("nvcr.io/nvidia/tensorflow:19.11-tf2-py3 --env TRAINS_AGENT_GIT_USER=git_username_here --env TRAINS_AGENT_GIT_PASS=git_password_here")
The above gives this error:
skipping docker argument TRAINS_AGENT_GIT_USER=git_username_here (only -e --env supported) TRAINS_AGENT_GIT_PASS=git_username_here (only -e --env supported)
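For reference, a minimal sketch of assembling the string passed to task.set_base_docker from a dict of environment variables. The helper name build_base_docker is hypothetical (it is not part of ClearML), it assumes values contain no spaces, and it only builds the string; it does not change how the k8s glue treats the arguments.

```python
def build_base_docker(image, env):
    """Return '<image> --env KEY=VALUE ...' for the given mapping.

    Hypothetical helper, not a ClearML API. Assumes keys and values
    contain no whitespace, since no quoting is applied.
    """
    parts = [image]
    for key, value in env.items():
        parts.extend(["--env", f"{key}={value}"])
    return " ".join(parts)


docker_cmd = build_base_docker(
    "nvcr.io/nvidia/tensorflow:19.11-tf2-py3",
    {
        "TRAINS_AGENT_GIT_USER": "git_username_here",
        "TRAINS_AGENT_GIT_PASS": "git_password_here",
    },
)
print(docker_cmd)
# The result would then be passed exactly as in the message above:
# task.set_base_docker(docker_cmd)
```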
Hi, just wondering if this 'feature: Passing env via the code' is in the works?
https://clearml.slack.com/archives/CTK20V944/p1616677400127900?thread_ts=1616585832.098200&cid=CTK20V944
Unfortunately, our security posture is so strict that we cannot have an agent git user that has unfettered read access to all repos.
Hi FriendlySquid61, AgitatedDove14, the issue and a possible fix are in the issue I raised: https://github.com/allegroai/clearml-agent/issues/51
The apply.yaml template is not working (e.g. the env arguments are not passed to the container), which is why I tried the code approach instead.
AgitatedDove14 , will these be fixed?
- Passing env via the code
- Passing env via template yaml
Hi, thanks. How about the Agent: does its docker mode or k8s mode require docker.sock to be exposed?
Thanks. Which brings me to the question. How does ClearML deal with all the CVEs? What is your process for response?
Hi, so this means that if I want to use Kubernetes, I would have to 'manually' install multiple agents on all the worker nodes?
The first line is to make sure kubectl is connected to k8s.
The default for base_pod_num is 1.
Ok, that seems clearer, thanks.
I would like to run ClearML agent on kubernetes. So basically I need to run the image on a pod, but there isn't any information on how the agent would communicate with the code, nor how it would spawn more pods to run the task.
This is probably the whole script.
kubectl get nodes
pip install clearml-agent
python k8s_glue_example.py
python k8s_glue_example.py --queue gpu --namespace default
Traceback (most recent call last):
  File "k8s_glue_example.py", line 86, in <module>
    main()
  File "k8s_glue_example.py", line 80, in main
    namespace=args.namespace,
  File "/home/administrator/clearml-agent-k8s/venv/lib/python3.6/site-packages/clearml_agent/helper/base.py", line 239, in __call__
    cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'base_pod...
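A TypeError like the one above usually means the installed class does not accept a keyword that the example script passes. A minimal sketch of checking this with the standard library follows; OldGlue and NewGlue are hypothetical stand-ins (not the real clearml-agent classes), and base_pod_num is used as the keyword only because it is the argument mentioned earlier in this thread.

```python
import inspect


def accepts_keyword(cls, name):
    """Return True if cls.__init__ accepts the keyword argument `name`.

    __init__ is inspected directly: with a Singleton-style metaclass whose
    __call__ takes (*args, **kwargs), as in the traceback, inspecting the
    class itself could hide the real constructor signature.
    """
    params = inspect.signature(cls.__init__).parameters
    if name in params:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


class OldGlue:  # hypothetical: an older constructor without the keyword
    def __init__(self, namespace="default"):
        self.namespace = namespace


class NewGlue:  # hypothetical: a newer constructor with the keyword
    def __init__(self, namespace="default", base_pod_num=1):
        self.namespace = namespace
        self.base_pod_num = base_pod_num


print(accepts_keyword(OldGlue, "base_pod_num"))
print(accepts_keyword(NewGlue, "base_pod_num"))
```

If the check fails for the installed class, the example script and the installed package are likely from different versions.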
The doc also mentioned preconfigured services with selectors in the form of
"ai.allegro.agent.serial=pod-<number>" and a targetPort of 10022.
Would you have any examples of how to do this?
Hi AgitatedDove14, I've got the same error. It would appear that the code references clearml_agent/helper/base.py, which I believe is part of clearml-agent v0.17.1. Could that be the issue?
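To confirm which version is actually installed in the venv, here is a small sketch using only the standard library. It assumes Python 3.8 or newer for importlib.metadata (the venv in the traceback is Python 3.6, where running pip show clearml-agent gives the same information), and installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("clearml-agent")
if found is None:
    print("clearml-agent is not installed in this environment")
else:
    print(f"clearml-agent {found}")
```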