Thanks AgitatedDove14 , unfortunately it didn't take effect.
Ok that worked. So every time i have changes in codes, i will have to rerun the experiment on my own machine that doesn't have any GPUs?
Kinda defeat the purpose of using ClearML Agent.
Hi CostlyOstrich36 , thanks. I will check with the Enterprise team then.
Having same issues. Looks like Google DNS can't resolve the DNS at all.
` %nslookup app.clear.ml - 8.8.8.8
Server: 8.8.8.8
Address: 8.8.8.8#53
** server can't find app.clear.ml: SERVFAIL `
Thanks. This appears to be solely for web UI and API, What if i want to orchestrate on K8S?
Hi, I was expecting to see the container rather then the actual physical machine. For example, in the file panel on the left of the jupyter panel, I see the file contents of the physical machine. I was expecting this to be the container.
Is there enterprise support for k8s glue on OpenShift?
Hi SuccessfulKoala55 , is there a channel here that posts version updates?
Hi, please correct me if i am wrong, to use the glue, i need the following.
A k8s cluster A kubectl that is connected to the k8s cluster A pip install of clearml-agent 0.17.1
So i did all the above, I'm not what it meant by running the entire thing on own machine.
Yeah that sounds good. But from user perspective, especially the untrained, they wouldn't know what to point to. Example, some may think it's an exe, some think it's a zip bundle, and others think it's any github repo with the word vscode.
I want to rule out the glue being the problem. Is the Glue significant in initialising clearml-agent after the pod is spawned?
Ok. I noted this is due to the venv_update setting. It needs to be disabled as it has a dependancy on the internet url. We can close this.
f you can directly access the machine running the agent, yes you could. If not reverse proxy is in the workingÂ
Hi AgitatedDove14 , i might have misunderstood your previous comment above. Do you mean that clearml-session can only work regardless of whether xforwarding is configured, if we have direct access to the Kubernetes worker when we run K8S glue?
We did some testing today and clearml-session tried to tunnel with a k8s cluster ip, and thus failed.
If we setup a ingress with Me...
Its 1.0.0. As printed on the top of the logs in ClearML Server UI.
Hi, the idea is to load the gituser and password into the --env by loading it via a env var so the client could access the resources without divulging the credentials in source code and it would be removed after completion since the container would be removed. Its actually doing well with ClearML except the part that the agent seems to print the content of docker_cmd on running the task.
I would like to note that this behaviour doesn't exist with the clearml-agent daemon though. It only exis...
Yes for both clearml and clearml-agent
yeah, someone should call them out.
Hi, this is what i got. No mention of the env variables.
` Current configuration (clearml_agent v0.17.2, location: /home/jax/clearml.conf):
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
ap...
the default for base_pod_num is 1.
I think a related question is, ClearML replies heavily on Triton (Good thing) but Triton only support a few frameworks out of the box. So this 'engine' need to make sure its can work with Triton and use all its wonderful features such as request batching, GPU reuse...etc.
unfortunately, our security posture is so strict that we cannot have an agent git user that have unfettered read access to all repos.
Can i dig into the mongodb or ES to pull these data?
Hi SuccessfulKoala55 , thanks, tested the patch and its working as expected now.
Hi, it make sense to automate this part just like how you automate the rest of the MLOps flow, especially when you already support Data Versioning/Lineage, Data Provenance (How it works with the experiment and as a model source) should be in too. Although i agree technically it's probably not possible to tell if the users actually used the indicated datasets after they do a datasets.get_copy() .
Hi HelpfulDeer76 , I'm facing similar issues. Would you mind describing in detail how you deploy clearml-agent? Is it running as a pod on k8s?
Hi AgitatedDove14 , what version i should change it to? I'm currently on v0.17.2rc3.
I would like to run ClearML agent on kubernetes. So basically I need to run the image on a pod, but there isn't any information on how the agent would communicate with the code, nor how it would spawn more pods to run the task.