Thanks SuccessfulKoala55. I can try my hand at a patch. But the pod spinning is handled by the k8s glue, which has no link to the client side. How should the client pass the key over to the k8s glue at runtime via the ClearML server?
Thanks SuccessfulKoala55 . Just pm'ed him.
Thanks, this would be a good alternative before the enterprise version comes in. How is this different from argparse, btw?
That didn't work as well...
Sorry AgitatedDove14, I missed your reply. So this means that in the community version, when I have an experiment using ClearML and it uses the ClearML Datasets SDK, the dataset id that was used will not be reflected on the ClearML experiment in any way, making it impossible to establish data lineage/provenance (e.g. linking the data used to the experiment). This feature is, however, available in the enterprise version as HyperDatasets. Am I correct?
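For what it's worth, a manual workaround sketch using the community SDK, just recording the dataset id on the task myself (the project/task names and the dataset id here are placeholders):
` from clearml import Task, Dataset

task = Task.init(project_name="examples", task_name="train")  # placeholder names
ds = Dataset.get(dataset_id="<dataset-id>")  # placeholder id
task.connect({"dataset_id": ds.id})  # store the id as a task parameter
local_path = ds.get_local_copy()  # fetch the data as usual `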
Code example.
` from clearml import Task, Logger
tas...
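For reference, a runnable sketch of that kind of snippet (a hypothetical completion, since the original is truncated above; the project/task names are placeholders):
` from clearml import Task, Logger

task = Task.init(project_name="examples", task_name="logging demo")  # placeholder names
logger = task.get_logger()
# report a scalar so something shows up under the task's Scalars tab
logger.report_scalar(title="loss", series="train", value=0.5, iteration=1) `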
Which clearml.conf is it referring to? I'm executing on my client, which is then remotely executed by the agent. Both of them have ~/clearml.conf.
Hi CostlyOstrich36, what you described is a task. I was referring to the pipeline controller.
Thanks. Which brings me to the question: how does ClearML deal with all the CVEs? What is your process for responding to them?
Having the same issue. It looks like Google DNS can't resolve the domain at all.
` % nslookup app.clear.ml 8.8.8.8
Server:    8.8.8.8
Address:   8.8.8.8#53

** server can't find app.clear.ml: SERVFAIL `
I'm having the same problem. Are you using the latest clearml-agent? Is your docker image running as the root user by default?
I also see this in my logs. Note that the config is read in, but it's still printing the supposedly hidden keys in the logs and UI:
` agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.extra_keys.0='TRAINS_AGENT_GIT_USER'
.....
docker_cmd=harbor.ai/public/detectron2:v3 --env TRAINS_AGENT_GIT_USER=gituser `
Nice, what are the names of the talks?
Hi, by deployment strategies I meant canary, blue-green, etc. I figured this should be handled by clearml-serving, and maybe Seldon as well.
This is strange then. Is it possible for ClearML logs to report successfully saving to S3 storage when it actually didn't? For example, I've seen certain S3 clients in the past save to a local folder named 's3:/' instead of putting the object on S3 storage itself.
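A quick sanity check, sketched with boto3 (the bucket and key are placeholders for wherever the artifact is supposed to land):
` import boto3

s3 = boto3.client("s3")
# raises a ClientError (404) if the object was never actually uploaded
s3.head_object(Bucket="my-bucket", Key="artifacts/model.pkl")  # placeholder bucket/key `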
I can't seem to find a fix for this. I ended up using an image that comes with torch preinstalled.
Ah ok. So it will be fixed on the ClearML server web UI as well? (See screenshots).
Yes, as listed in the snippet. The torch library is torchvision.
Hi AgitatedDove14, I've got the same error. It would appear that the code references clearml_agent/helper/base.py, which I believe is part of clearml-agent v0.17.1. Could that be the issue?
This is probably the whole script.
` kubectl get nodes
pip install clearml-agent
python k8s_glue_example.py `
The default for base_pod_num is 1.
I would like to run the ClearML agent on Kubernetes. Basically, I need to run the image on a pod, but there isn't any information on how the agent would communicate with the code, nor how it would spawn more pods to run the tasks.
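From what I can tell, the k8s glue covers this: one long-lived glue process pulls tasks from a queue and launches a pod per task. A minimal sketch, assuming the K8sIntegration class used by k8s_glue_example.py in the clearml-agent repo (the queue name is a placeholder):
` from clearml_agent.glue.k8s import K8sIntegration

# runs on a single machine/pod that has kubectl access to the cluster;
# every task pulled from the queue is launched as its own pod
k8s = K8sIntegration(ports_mode=False)
k8s.k8s_daemon("k8s_queue")  # placeholder queue name `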
After some churning, this is the answer. Change it in the clearml.conf generated by clearml-agent init.
` default_docker: {
    # default docker image to use when running in docker mode
    image: "nvidia/cuda:10.1-runtime-ubuntu18.04"
    # optional arguments to pass to docker image
    # arguments: ["--ipc=host", ]
    arguments: ["--env GIT_SSL_NO_VERIFY=true",]
} `
Hi, so this means that if I want to use Kubernetes, I would have to 'manually' install multiple agents on all the worker nodes?
Sorry, I take that back. I just realised that this argument only works when running the agent; when you enqueue a task into this agent, the argument is not passed on to the container that the agent spawns.
It's the same issue with the docker image. It reverts back to nvidia/cuda:10.1-runtime-ubuntu18.04 despite me setting something else.
Ok, that seems clearer, thanks.
` Executing task id [228caa5d25d94ac5aa10fa7e1d02f03c]:
repository = https://192.168.50.88:18443/tkahsion/pytorchmnist
branch = master
version_num = cfb833bcc70f3e10d3b6a96cfad3225ed682382b
tag =
docker_cmd = nvidia/cuda:10.1-runtime-ubuntu18.04
entry_point = pytorch_mnist.py
working_dir = .
Warning: could not locate requested Python version 3.9, reverting to version 3.6
Using base prefix '/usr'
New python executable in /root/.clearml/venvs-builds/3.6/bin/python3.6
Also creating executable i... `
Hi, I tried the k8s glue on my k8s setup and need some clarification on some of the arguments.
--queue. Does this only refer to default and services? How can I create a new queue for it to sync with the ClearML server? (see the sketch below this list)
--ports-mode. I'm not sure what ports mode does. The doc says "add a label to the pod which can be used as service". Which pod is it referring to in the first place?
All args pertaining to --ports-mode (e.g. base-pod-num, gateway-address, etc.)
--overrides-yaml. What is the ...
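On the queue question, the closest I've found is creating one programmatically; a sketch assuming the clearml APIClient (the queue name is a placeholder):
` from clearml.backend_api.session.client import APIClient

client = APIClient()
# creates a new queue on the ClearML server; the glue can then be pointed at it
queue = client.queues.create(name="k8s_glue_queue")  # placeholder queue name
print(queue.id) `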
The doc also mentioned preconfigured services with selectors in the form of "ai.allegro.agent.serial=pod-<number>" and a targetPort of 10022.
Would you have any examples of how to do this?
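My best guess at what that would look like, as an untested Kubernetes Service sketch (the name is a placeholder; only the selector and targetPort come from the doc):
` apiVersion: v1
kind: Service
metadata:
  name: clearml-agent-pod-0  # hypothetical name
spec:
  selector:
    ai.allegro.agent.serial: pod-0  # selector format from the doc
  ports:
    - protocol: TCP
      port: 10022
      targetPort: 10022  # targetPort from the doc `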