
The doc also mentioned preconfigured services with selectors in the form of
"ai.allegro.agent.serial=pod-<number>" and a targetPort of 10022.
Would you have any examples of how to do this?
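For illustration only, here is a minimal sketch of what such a Service could look like, built as a plain Python dict and dumped as JSON (which `kubectl apply -f` accepts). The selector label and the targetPort of 10022 come from the doc quoted above; the Service name and the exposed `port` are assumptions made up for this example.

```python
import json


def agent_pod_service(pod_number: int, port: int = 10022) -> dict:
    """Build a Service manifest that selects one agent pod by its serial label."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        # Hypothetical name, chosen only for this sketch.
        "metadata": {"name": f"clearml-agent-pod-{pod_number}"},
        "spec": {
            # Selector format taken from the doc: ai.allegro.agent.serial=pod-<number>
            "selector": {"ai.allegro.agent.serial": f"pod-{pod_number}"},
            # The exposed port is an assumption; targetPort 10022 is from the doc.
            "ports": [{"protocol": "TCP", "port": port, "targetPort": 10022}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(agent_pod_service(1), indent=2))
```

One Service per pod number would be needed, since each selector matches a single serial label.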
Then you pass the tolerations definition through a different pod template?
Yup.
Yup. In this case it wasn't root. Removing that USER and the -u
in pip solves the problem. However, in our production images, we are required to remove root access.
` FROM nvidia/cuda:10.1-cudnn7-devel
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y \
    python3-opencv ca-certificates python3-dev git wget sudo ninja-build
RUN ln -sv /usr/bin/python3 /usr/bin/python
# create a non-root user
ARG USER_ID=1000
RUN useradd -m --no-log-init --system --uid ${USER_ID} a...
Hi CostlyOstrich36 , thanks. I will check with the Enterprise team then.
Do you mean by this that you want to be able to seamlessly deploy models that were tracked using ClearML experiment manager with ClearML serving?
Ideally, that's best. Imagine that I used spaCy (among other frameworks) and I just need to add one or two lines of ClearML code to my Python scripts to track the experiments. Then when it comes to deployment, I don't have to worry about spaCy having a model format that Triton doesn't recognise.
Do you want clearml serving ...
This would be solved if --env GIT_SSL_NO_VERIFY=true is passed to the k8s pod that's spawned to run the job. Currently it's not.
I think a related question is, ClearML relies heavily on Triton (good thing) but Triton only supports a few frameworks out of the box. So this 'engine' needs to make sure it can work with Triton and use all its wonderful features such as request batching, GPU reuse, etc.
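As a rough sketch of the pod-template route, assuming the k8s glue is given a pod template to start from: the helper below adds an env var to every container in a template dict. The helper name `with_env` and the container name `clearml-task` are hypothetical, not part of ClearML.

```python
import json


def with_env(pod_template: dict, name: str, value: str) -> dict:
    """Return a copy of a pod template with an env var set on every container."""
    template = json.loads(json.dumps(pod_template))  # deep copy through JSON
    for container in template.get("spec", {}).get("containers", []):
        # Drop any existing entry with the same name, then append the new one.
        env = [e for e in container.get("env", []) if e.get("name") != name]
        env.append({"name": name, "value": value})
        container["env"] = env
    return template


# Hypothetical base template with a single container.
base = {"spec": {"containers": [{"name": "clearml-task", "env": []}]}}
patched = with_env(base, "GIT_SSL_NO_VERIFY", "true")
print(json.dumps(patched, indent=2))
```

The original template is left untouched, so the same base can be reused for queues that should not skip SSL verification.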
Hi.
We tried as advised above and it still didn't work.
Host: http://ecs.ai:443
output_uri = S3://ecs.ai:443/bucketname
This time round the client gave this error.
botocore.exceptions.ConnectionClosedError: Connection was closed before we received a valid response from endpoint URL: 'http://ecs.ai/bucketname/.clearml.test'.
It's quite apparent that whatever ClearML passed to boto3 ends up as an HTTP call instead of HTTPS, which is wrong.
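To make the symptom concrete, here is a small sketch of how an endpoint URL could be derived from an `s3://host:port/bucket` URI. This is not ClearML's actual code; it only assumes that the scheme is chosen by a separate secure-style flag rather than by the port number, which would explain why port 443 alone still yields an `http://` call.

```python
from urllib.parse import urlsplit


def endpoint_url(output_uri: str, secure: bool) -> str:
    """Derive the HTTP(S) endpoint for a custom S3 host from an s3:// URI."""
    parts = urlsplit(output_uri)
    if parts.scheme.lower() != "s3":
        raise ValueError(f"expected an s3:// URI, got {output_uri!r}")
    # Assumption: the scheme follows the flag, never the port.
    scheme = "https" if secure else "http"
    return f"{scheme}://{parts.netloc}"


print(endpoint_url("s3://ecs.ai:443/bucketname", secure=False))
print(endpoint_url("s3://ecs.ai:443/bucketname", secure=True))
```

Under that assumption, the fix would be on the configuration side (marking the host as secure) rather than in the URI itself.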
Having same issues. Looks like Google DNS can't resolve the DNS at all.
` %nslookup app.clear.ml - 8.8.8.8
Server: 8.8.8.8
Address: 8.8.8.8#53
** server can't find app.clear.ml: SERVFAIL `
Ok thanks. That explains a lot. We have been doing this wrongly the whole time, thinking that the clearml.conf on the client side would be acknowledged by the remote agent execution. In reality, only the API section is utilised.
Hi, so you meant I need to install virtualenv in my base image?
AgitatedDove14 , will these be fixed?
Passing env via the code
Passing env via template yaml
Yeah that'll cover the first two points, but I don't see how it'll end up as a dataset catalogue as advertised.
The server is running only the ClearML components. Could you advise on the ELB part, how should we optimise it?
Thanks SuccessfulKoala55 . I can try my hand on a patch. But the pod spinning is handled by the k8s glue, which has no link to the client side. How should the client pass the key over to k8s glue during runtime via clearml server?
Ah ok, so if I see Jax's workspace on https://app.community.clear.ml/dashboard , then I'm on the right track? How regularly does this reset then?
I also see this in my logs. The config is read in, but it's still printing the supposedly hidden keys in the logs and UI.
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.extra_keys.0='TRAINS_AGENT_GIT_USER'
.....
docker_cmd=harbor.ai/public/detectron2:v3 --env TRAINS_AGENT_GIT_USER=gituser
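For reference, the behaviour I expected looks roughly like the sketch below: any `--env KEY=value` pair whose key is in the hidden list gets its value masked before the command is logged. This is only an illustration of the expectation, not the agent's implementation; `mask_env_values` and the extra `FOO=bar` variable are made up for the example.

```python
import re

# Matches "--env KEY=value" pairs inside a docker command string.
_ENV_PATTERN = re.compile(r"(--env\s+)([A-Za-z_][A-Za-z0-9_]*)=(\S+)")


def mask_env_values(docker_cmd: str, hidden_keys: set) -> str:
    """Replace the value of every hidden env var with asterisks."""

    def _mask(match):
        prefix, key, _value = match.groups()
        if key in hidden_keys:
            return f"{prefix}{key}=********"
        return match.group(0)  # leave non-hidden variables untouched

    return _ENV_PATTERN.sub(_mask, docker_cmd)


cmd = "harbor.ai/public/detectron2:v3 --env TRAINS_AGENT_GIT_USER=gituser --env FOO=bar"
print(mask_env_values(cmd, {"TRAINS_AGENT_GIT_USER"}))
```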
Ok, that seems clearer, thanks.
Hi, by deployment strategies I meant canary, blue-green, etc. I figured this should be done by clearml-serving and maybe Seldon as well.
Thanks. Which brings me to the question. How does ClearML deal with all the CVEs? What is your process for response?
For example, it would be useful to integrate https://github.com/whylabs/whylogs#features into ClearML as part of data and model monitoring. WhyLogs would have their own static page that would preferably be displayed as a new custom tab (besides logs, scalars and plots).
Hi, I don't think clearml agent actually ran at that point in time. All I can see in the pod is:
apt install of libpthread-stubs, libx11, libxau and libxcb1 packages
pip install of clearml-agent
After the above are successful, the pod just hangs there.
This is a env var?
CLEARML_CONFIG_FILE
ok thanks. this would mean that increasing the disk space for my ClearML is the only option as we are not at liberty to delete.
Hi. If we disable the API service, how will it affect the system? How do we disable?
Thanks. Have a better understanding now.
Hi AgitatedDove14 , that's what I am trying to figure out as well. The task has nothing to do with torch, and the requirements.txt doesn't have any torch packages either.