Hi SuccessfulKoala55, would they need to route the files server to MinIO then? For example:
This ensures that everything produced by clearml-data, as well as models, is saved to the S3 object store.
api {
    files_server: "s3://ecs.ai:80/clearml-data/default"
}
sdk {
    aws {
        s3 {
            credentials: [
                {
                    host: "ecs.ai:80"
                    ## Insert the IAM credentials (key / secret) provided by your SAs here.
                    secure: false
                }
            ]
        }
    }
}
But if a user forgets to do the above, these will be saved on the ClearML server instead. If I switch off f...
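To check which backend a given files_server value actually points at, here is a small stdlib sketch. It assumes the custom-endpoint form used above, s3://host:port/bucket/prefix (for plain AWS URLs the first component would be the bucket instead). describe_files_server is a hypothetical helper, not part of ClearML.

```python
from urllib.parse import urlparse


def describe_files_server(files_server: str) -> dict:
    """Hypothetical helper: report where a files_server value points.

    An s3:// URL is split into endpoint, bucket and key prefix
    (assuming the s3://host:port/bucket/prefix form); anything else,
    such as an http:// URL, is treated as the ClearML fileserver.
    """
    parsed = urlparse(files_server)
    if parsed.scheme == "s3":
        # First path segment is the bucket, the rest is the key prefix
        bucket, _, prefix = parsed.path.lstrip("/").partition("/")
        return {"backend": "s3", "endpoint": parsed.netloc,
                "bucket": bucket, "prefix": prefix}
    return {"backend": "fileserver", "endpoint": parsed.netloc}


print(describe_files_server("s3://ecs.ai:80/clearml-data/default"))
print(describe_files_server("http://localhost:8081"))
```

A check like this could run on the client side to warn users whose clearml.conf still points at the default fileserver.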
Hi, I don't think clearml-agent actually ran at that point. All I can see in the pod is an apt install of the libpthread-stubs, libx11, libxau and libxcb1 packages, followed by a pip install of clearml-agent. After both succeed, the pod just hangs there.
I want to rule out the glue as the problem. Does the glue play any role in initialising clearml-agent after the pod is spawned?
I have since ruled out the apt and PyPI repos: packages from both install properly on the pods.
Hi, is http://app.community.clear.ml currently not working? I noticed that the ClearML UI is cached in the browser, so if the backend is not running, it's not clear to the user that something is wrong (except for broken pages).
Clearing the cache entirely works. Thanks.
Thanks CostlyOstrich36, how do I know how the parts are indexed in the first place? Or rather, how are chunks and parts defined? Say, in the context of images, videos, text documents, etc.
Although I think you can also pull specific chunks of a dataset
How do you do that with clearml-data?
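My understanding of the clearml Dataset SDK (worth verifying against the docs for your version) is that chunks are size-based: Dataset.upload(chunk_size=...) splits the compressed files into artifacts of roughly chunk_size MB, and Dataset.get_local_copy(part=..., num_parts=...) downloads only a subset of those chunks. If that holds, chunking does not depend on whether the files are images, videos or text. The stdlib sketch below only illustrates the idea. Greedy packing by size and a contiguous chunk-to-part split are assumptions, and both helpers are hypothetical rather than ClearML's implementation.

```python
from typing import Dict, List


def pack_into_chunks(file_sizes: Dict[str, int], max_chunk_size: int) -> List[List[str]]:
    """Hypothetical: greedily group files into chunks no bigger than max_chunk_size.

    A single file larger than the limit still gets a chunk of its own.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in file_sizes.items():
        # Close the current chunk if adding this file would exceed the limit
        if current and current_size + size > max_chunk_size:
            chunks.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


def chunks_for_part(num_chunks: int, part: int, num_parts: int) -> List[int]:
    """Hypothetical: chunk indices for a 0-based part, using a contiguous, near-even split."""
    if not 0 <= part < num_parts:
        raise ValueError("part must be in [0, num_parts)")
    base, extra = divmod(num_chunks, num_parts)
    # The first `extra` parts each take one additional chunk
    start = part * base + min(part, extra)
    end = start + base + (1 if part < extra else 0)
    return list(range(start, end))


# Toy sizes in arbitrary units; file type plays no role, only size does
sizes = {"img_000.jpg": 300, "img_001.jpg": 300, "clip.mp4": 900, "notes.txt": 10}
chunks = pack_into_chunks(sizes, max_chunk_size=700)
for part in range(2):
    print(part, [chunks[i] for i in chunks_for_part(len(chunks), part, num_parts=2)])
```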
Thanks SuccessfulKoala55, how might I do this cleanup? Does this grow with more use of ClearML? To add, we save all artifacts to a remote S3 server.
Hi AgitatedDove14, what version should I change it to? I'm currently on v0.17.2rc3.
AgitatedDove14, could you elaborate on this resolution process?
Is there any way to see an error log from that?
Hi, I was expecting to see the container rather than the actual physical machine. For example, the file panel on the left of the Jupyter panel shows the file contents of the physical machine. I was expecting it to show the container.
Hi, it was missing --docker on the agent. Thanks! Is the dynamic GPU option only available with the Enterprise version?
I managed to find out why. The Docker image I'm using does not run as the root user, hence the error. But I'm wondering why root is required, since Docker best practices recommend running production images as a non-root user.
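One common way to avoid a pip permission error as a non-root user is pip's --user flag, which installs into the user's home directory (on Linux, under ~/.local) instead of the system site-packages. The sketch below is a hypothetical helper, not what clearml-agent does internally: it adds the flag only when the process is not root, where on POSIX os.geteuid() == 0 means root.

```python
import os
import sys
from typing import List


def pip_install_cmd(packages: List[str], is_root: bool) -> List[str]:
    """Hypothetical: build a pip install command, adding --user for non-root users."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if not is_root:
        # A non-root user usually cannot write to the system site-packages
        cmd.append("--user")
    return cmd + list(packages)


# os.geteuid() is POSIX-only, so guard the call
running_as_root = hasattr(os, "geteuid") and os.geteuid() == 0
print(pip_install_cmd(["clearml-agent"], is_root=running_as_root))
```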
It's actually in your documentation. Apparently it was removed in 0.17:
https://allegro.ai/clearml/docs/docs/release_notes/ver_0_17.html#clearml-agent-0-17-2
And these are my logs: it tried to install something and hit a permission-denied error. It wouldn't have if it had obeyed force_repo_requirements_txt.
1620664917916 Kahs-MacBook-Pro.local info ClearML Task: created new task id=024a421c0e174650a1c7ff64af756c26 ClearML results page:
`
1620664920359 Kahs-MacBook-Pro.local info ClearML Mon...
Hi, just wondering if this 'feature: Passing env via the code' is in the works?
https://clearml.slack.com/archives/CTK20V944/p1616677400127900?thread_ts=1616585832.098200&cid=CTK20V944
The apply.yaml template is not working (e.g. the env arguments are not passed to the container), which is why I tried the code approach instead.
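For the template route, Kubernetes expects container environment variables as a list of name/value objects under each container's env key. The sketch below builds such a pod fragment and prints it as JSON (which YAML parsers also accept). Whether the k8s glue merges env from its template this way is an assumption. The container name and the pod_template_with_env helper are hypothetical; the image and variable names are the ones used in this thread.

```python
import json
from typing import Dict


def pod_template_with_env(image: str, env: Dict[str, str]) -> dict:
    """Hypothetical: minimal pod spec with env vars in the standard Kubernetes form."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "spec": {
            "containers": [
                {
                    "name": "clearml-task",  # hypothetical container name
                    "image": image,
                    # Kubernetes EnvVar objects: one {"name", "value"} per variable
                    "env": [{"name": k, "value": v} for k, v in env.items()],
                }
            ]
        },
    }


template = pod_template_with_env(
    "nvcr.io/nvidia/tensorflow:19.11-tf2-py3",
    {"TRAINS_AGENT_GIT_USER": "git_username_here",
     "TRAINS_AGENT_GIT_PASS": "git_password_here"},
)
print(json.dumps(template, indent=2))
```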
So the clearml-agent daemon needs higher privileges?
Hi FriendlySquid61, AgitatedDove14, the issue and a possible fix are described in this GitHub issue: https://github.com/allegroai/clearml-agent/issues/51
No, I didn't mention this particular problem in the GitHub issue. Only the apply template.yml problem is covered there.
AgitatedDove14 , will these be fixed?
1. Passing env via the code
2. Passing env via the template YAML
Yup, in this case it wasn't root. Removing that USER line and the -u flag in pip solves the problem. However, our production images are required to have root access removed.
` FROM nvidia/cuda:10.1-cudnn7-devel
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y \
    python3-opencv ca-certificates python3-dev git wget sudo ninja-build
RUN ln -sv /usr/bin/python3 /usr/bin/python
# create a non-root user
ARG USER_ID=1000
RUN useradd -m --no-log-init --system --uid ${USER_ID} a...
Hi AgitatedDove14. I'm trying out passing env via the code instead:
task.set_base_docker("nvcr.io/nvidia/tensorflow:19.11-tf2-py3 --env TRAINS_AGENT_GIT_USER=git_username_here --env TRAINS_AGENT_GIT_PASS=git_password_here")
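The code approach packs the image and the --env flags into a single docker command string. A small stdlib helper (hypothetical, not part of ClearML) can build that string from a dict and quote each KEY=value pair with shlex.quote, so values containing spaces survive being split back into arguments. For the values above it reproduces the exact string passed to set_base_docker.

```python
import shlex
from typing import Dict


def build_docker_cmd(image: str, env: Dict[str, str]) -> str:
    """Hypothetical: join an image name and --env flags into one docker command string."""
    parts = [image]
    for key, value in env.items():
        # shlex.quote protects values containing spaces or shell metacharacters
        parts += ["--env", shlex.quote(f"{key}={value}")]
    return " ".join(parts)


cmd = build_docker_cmd(
    "nvcr.io/nvidia/tensorflow:19.11-tf2-py3",
    {"TRAINS_AGENT_GIT_USER": "git_username_here",
     "TRAINS_AGENT_GIT_PASS": "git_password_here"},
)
print(cmd)
# Splitting it again recovers the individual arguments
print(shlex.split(cmd))
```

Building the string correctly may not be enough on its own: the glue log that follows reports skipping this docker argument.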
The strange thing is that when my k8s glue pulls a task, this happens:
` Pulling task xxxxxxxxxx launching on kubernetes cluster
Pushing task xxxxxxxxxx into temporary pending queue
Kubernetes scheduling task id=xxxxxxxxxxxx
skipping docker argument TRAINS_AGENT_GIT_USE...
Which feature on the paid roadmap are you referring to? I am indeed in contact with Noem about paid features.
Hi, any advice on this? thanks.
Hi, the scenario is as follows:
1. client.py runs task.execute_remotely(queue='myqueue', exit_process=True).
2. The api section of clearml.conf on the client side is read in.
3. The client calls the ClearML server and inserts the task into the queue.
4. The K8s glue retrieves the task from the queue and spawns a K8s pod.
5. The K8s pod performs a git clone, which fails with an error: ssh keys not found.
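A quick diagnostic that could run inside the pod to confirm the "ssh keys not found" failure: it checks a directory (by default ~/.ssh) for common OpenSSH default private-key names. The name list is only a subset of OpenSSH's defaults, keys referenced via IdentityFile in ~/.ssh/config would not be detected, and find_ssh_identities is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Subset of the default identity file names OpenSSH tries without an IdentityFile setting
DEFAULT_IDENTITY_FILES = ("id_rsa", "id_ecdsa", "id_ed25519", "id_dsa")


def find_ssh_identities(ssh_dir: Path) -> List[str]:
    """Hypothetical: return the default private-key files present in ssh_dir."""
    if not ssh_dir.is_dir():
        return []
    return [name for name in DEFAULT_IDENTITY_FILES if (ssh_dir / name).is_file()]


keys = find_ssh_identities(Path.home() / ".ssh")
if not keys:
    print("No default SSH identity found; git clone over SSH will likely fail")
else:
    print("Found SSH identities:", keys)
```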
Each individual has their own key in their GitLab profile, and GitLab is configured to work only via SSH.
We can't place the key in the image as this is as good as ...