
Long story short, not any longer (in previous versions of k8s it was possible, but after the runtime container change it is not supported)
You mean in the Enterprise version?
Enterprise, with the smarter GPU scheduler. This is an inherent problem of sharing resources; there is no perfect solution. You either have fairness, but then you get idle GPUs, or you have races, where you can get starvation.
Hi GrittyCormorant73
When I archive the pipeline and go into the archive and delete the pipeline, the artifacts are not deleted.
Which clearml-server version are you using? The artifact delete was only recently added
Hi PlainSquid19
Any model stored by TF/Keras/PyTorch/Joblib will automatically appear in the artifact/models tab.
Are you asking how to add one manually?
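If it is the latter, here is a minimal sketch using OutputModel to register a local weights file manually (the task setup and file name below are just placeholders):
```
from clearml import Task, OutputModel

# assuming a task is already running / initialized (placeholder names)
task = Task.init(project_name="examples", task_name="manual model registration")

# register an existing local weights file as an output model of this task
output_model = OutputModel(task=task, framework="PyTorch")
output_model.update_weights(weights_filename="my_model.pt")  # placeholder file name
```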
Hmm, that is odd. Can you send an email to support@clear.ml?
Is there something else in the conf that I should change?
I'm assuming the Google credentials?
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/docs/clearml.conf#L113
Hi AstonishingRabbit13
Now I'm training YOLOv5 and I want to save all the info (model and metrics) with ClearML to my bucket...
The easiest thing (assuming you are running YOLOv5 with python train.py) is to add the following env variable when launching:
CLEARML_DEFAULT_OUTPUT_URI="gs://<your-bucket>" python train.py
Notice that you need to pass your GS credentials here:
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/docs/clearml.conf#L113
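As a rough sketch of the SDK equivalent (assuming you control the Task.init call yourself; the project, task name and bucket path are placeholders):
```
from clearml import Task

# everything this task stores (models, artifacts) will be uploaded to the bucket
task = Task.init(
    project_name="yolov5",               # placeholder project name
    task_name="train",                   # placeholder task name
    output_uri="gs://my-bucket/clearml", # placeholder GS bucket path
)
```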
These are the prerequisites for the docker service installed on the host machine (where the agent is running):
Basically follow: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
https://docs.docker.com/compose/gpu-support/
Hi DizzyPelican17
I'd like to configure the requirements file, docker image, and docker command for my pipeline controller, but it seems I cannot set it up. Am I missing something?
The decorator itself accepts those as arguments:
https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller#pipelinedecoratorcomponent
https://github.com/allegroai/clearml/blob/90f30e8d9a5ca9a1afa6b2e5ffccb96b0afe9c78/examples/pipeline/pipeline_from_decorator.py#L8
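For example, a hedged sketch of a component with a docker image, packages, and docker arguments specified (the image, package list, and arguments are placeholders):
```
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(
    return_values=["processed"],
    packages=["pandas==1.5.*"],                       # placeholder requirements
    docker="nvidia/cuda:11.8.0-runtime-ubuntu22.04",  # placeholder docker image
    docker_args="--ipc=host",                         # placeholder docker arguments
)
def preprocess(data):
    # imports inside the component run on the remote worker
    import pandas as pd
    return pd.DataFrame(data)
```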
I'd like to set up...
ImmensePenguin78
I think the latest RC adds it; it should be released later today.
But I believe it should have worked with 0.14.1 as well.
Correct
Hi WearyLeopard29
Yes, this is exactly how it should work.
What is the specific use case, updating a file on existing dataset and creating a new version?
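If that is the use case, a minimal sketch of creating a new version on top of an existing dataset (project, dataset names, and the file path are placeholders):
```
from clearml import Dataset

# get the latest version of the existing dataset (placeholder names)
parent = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")

# create a child version, add/overwrite the updated file, and close it
child = Dataset.create(
    dataset_project="my_project",
    dataset_name="my_dataset",
    parent_datasets=[parent.id],
)
child.add_files("path/to/updated_file.csv")  # placeholder path
child.upload()
child.finalize()
```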
Hi @<1661904968040321024:profile|SpotlessOwl43>
My problem is that when the AWS virtual machine is killed, my Pipelines and Scheduling stop working because of the killed ClearML agent,
are you using the ClearML AWS autoscaler to spin that machine ? or are you spinning it manually ?
Yes, exactly like a Task (a pipeline is a type of task):
```
from clearml import Task

# clone the existing pipeline (pipeline_uid_here is the pipeline task ID)
cloned_pipeline = Task.clone(source_task=pipeline_uid_here)
Task.enqueue(cloned_pipeline, queue_name="services")  # queue name is just an example
```
CostlyOstrich36 did you manage to reproduce it?
I tried conda w/ python 3.9 on a clean Windows VM, and it worked as expected...
LovelyHamster1 Now I see... Interesting credentials ability. Specifically all the S3 access on trains is derived from the ~/clearml.conf
credentials section:
https://github.com/allegroai/clearml/blob/ebc0733357ac9ead044d0ed32d41447763f5797e/docs/clearml.conf#L73
( or the AWS S3 environment variables )
I'm not sure how this AWS feature works; I suspect it is changing the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY variables on the EC2 instance. If this is the case, it should work out of...
- ...that file and the logs of the agent service always say the same thing as before:
Oh, in that case you need to fill in your credentials here:
https://github.com/allegroai/clearml-server/blob/5de7c120621c2831730e01a864cc892c1702099a/docker/docker-compose.yml#L137
Basically CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY will let the agent running inside the docker talk to the server itself. Just put your own credentials there as a start, it should solve the issue
ScantMoth28 where are you seeing this warning ?
So far my local and remote GitLab repositories are synchronized. I suspect that the
Failed applying git diff, see diff above
error is caused by a cached repository from which clearml tries to run the process. I've cleaned the cache, but it hasn't helped.
Hmm, can you test with empty "uncommitted changes"?
Just making sure: when you say it still doesn't work, you are not trying to run the Task with the git diff that includes the binary data, right?
Yes, but where I can fi...
LOL, thanks!
Is the agent itself registered on the clearml-server (i.e., can you see it in the UI)?
Hi @<1687653458951278592:profile|StrangeStork48>
I have good news, v1.0 is out with hashed passwords support.
Hi @<1552101458927685632:profile|FreshGoldfish34>
Self-hosted, you mean the open source version? If so, then yes, totally free.
That said, I would recommend having the server inside your VPN, just in case, from a security perspective.
okay, let me know if it works
Based on the log you have shared:
OSError: [Errno 28] No space left on device
I would increase the storage:
https://github.community/t/github-actions-failing-with-errno-28-no-space-left-on-device/18164/10
https://stackoverflow.com/questions/70175977/multiprocessing-no-space-left-on-device
https://groups.google.com/g/ansible-project/c/4U6MyvyvthQ
I would start by increasing the size of the TMPDIR folder
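A small sketch of redirecting TMPDIR to a larger mount from python, in case it helps (the path is a placeholder):
```
import os
import tempfile

# point the temp directory at a larger mount point (placeholder path)
os.makedirs("/mnt/large_disk/tmp", exist_ok=True)
os.environ["TMPDIR"] = "/mnt/large_disk/tmp"

tempfile.tempdir = None        # drop the cached value so gettempdir() re-reads TMPDIR
print(tempfile.gettempdir())   # should now resolve to the larger location
```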
Just run this once (from your python console / PyCharm etc.):
https://github.com/allegroai/clearml/blob/master/examples/automation/toy_base_task.py
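In case the link is not handy, the gist of that example is roughly the following (a sketch; the project/task names and the parameter are just illustrative):
```
from clearml import Task

# register a "template" task once; it can later be cloned and enqueued for remote execution
task = Task.init(project_name="examples", task_name="toy base task")
task.set_parameter("General/example_param", 1)  # illustrative parameter
print("Base task registered:", task.id)
```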