Martin, thank you very much for your time and dedication, I really appreciate it
My pleasure 🙂
Yes, I have the latest 1.0.5 version now and it gives the same result in the UI as the previous version I used
Hmm, are you saying the auto Hydra connection doesn't work? Is it the folder structure?
When is Task.init called?
See example here:
https://github.com/allegroai/clearml/blob/master/examples/frameworks/hydra/hydra_example.py
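Roughly, the linked example has this shape (a minimal sketch; the project/config names here are placeholders, not the exact file contents):

import hydra
from clearml import Task

@hydra.main(config_path=".", config_name="config")
def my_app(cfg):
    # calling Task.init inside the hydra entry point lets clearml
    # auto-connect the hydra configuration to the task
    task = Task.init(project_name="examples", task_name="hydra example")
    print(cfg)

if __name__ == "__main__":
    my_app()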
MortifiedDove27 did you update to the latest clearml python package?
Would love to just cap it at a fixed amount for a month for API calls.
Try the timeout configuration, I think this should solve all your issues, and it will be fairly easy to set for everyone
Hi PompousBeetle71
I remember it was an issue, but it was solved a while ago. Which Trains version are you using?
Thanks MortifiedDove27 ! Let me see if I can reproduce it, if I understand the difference, it's the Task.init in a nested function, is that it?
BTW what's the hydra version? Python, and OS?
LOL that's the spirit, making your team happy is key to success in adoption 🙂
Hmm what do you mean? Isn't it under installed packages?
We do upload the final model manually.
wait you said upload manually, and now you are saying "saved automatically", I'm confused.
Hi SmallDeer34
Generally, any torch.save(...) call is logged/uploaded by clearml automatically. Specifically in your case, I think the only missing one is trainer_state.json, which I assume is a general json file, and I imagine is part of the huggingface framework. You can easily upload it as an additional artifact with Task.upload_artifact
wdyt?
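Something like this (a minimal sketch; the file path is an assumption based on your setup):

from clearml import Task

task = Task.init(project_name="examples", task_name="hf training")
# upload the extra json file produced by the trainer as a named artifact
task.upload_artifact(name="trainer_state", artifact_object="trainer_state.json")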
The difference is that I want a single persistent machine, with a single persistent python script, that can pull, execute, and report multiple tasks
So basically instead of using the agent, simply spin a subprocess?
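A hedged sketch of that idea, using the queues API and the clearml-agent execute CLI (the exact response fields here are an assumption on my side):

import subprocess
from clearml.backend_api.session.client import APIClient

client = APIClient()
# "my_queue" is a placeholder queue name
queue = client.queues.get_all(name="my_queue")[0]

while True:
    # pop the next pending task from the queue (if any)
    result = client.queues.get_next_task(queue=queue.id)
    entry = getattr(result, "entry", None)
    if not entry:
        break  # queue drained
    # execute the pulled task in a subprocess on this machine
    subprocess.run(["clearml-agent", "execute", "--id", entry.task])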
I think the limit is a few GB, I'm not sure, I'll have to check
And yes the oldest experiments will be deleted first (with the exception of published experiments, they will be deleted last)
Oh my bad, post 0.17.5 😞
RC will be out soon, in the meantime you can install directly from GitHub: pip install git+
Okay this seems correct...
Can you share both yaml files (server & serving) and env file?
Hi CheekyElephant36
First you need to run it once on your machine; once this is done (only a few steps is enough), you can clone it and enqueue it. Then, to actually connect the AWS autoscaler (the part that spins up machines and runs tasks), go to Applications and select the AWS autoscaler.
BTW I think the next video will be about YOLO + autoscaler
Do you think this is better? (the API documentation is coming directly from the python doc-string, so the code will always have the latest documentation)
https://github.com/allegroai/clearml/blob/c58e8a4c6a1294f8acec6ed9cba81c3b91aa2abd/clearml/datasets/dataset.py#L633
Will this still be considered as global site-packages?
This is a pip setting; I "think" it inherits from the local user's installation, but I would actually install with "sudo pip" as that will definitely be "inherited"
Exactly !
it seems like each task is set up to run on a single pod/node based on attributes like gpu memory, os, num of cores, worker
BoredHedgehog47 of course you can scale on multiple nodes.
The way to do that is to create a k8s YAML with replicas; each pod is actually running the exact same code with the exact same setup. Notice that inside the code itself the DL frameworks need to be able to communicate with one another and b...
Actually this is the default for any multi-node training framework, torch DDP / OpenMPI etc.
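For example, with torch DDP each replica would typically bootstrap like this (a sketch, assuming the pod spec injects the standard RANK / WORLD_SIZE / MASTER_ADDR / MASTER_PORT env vars):

import os
import torch.distributed as dist

def init_distributed():
    # every k8s replica runs this exact code; the injected env vars give each
    # pod its rank, and MASTER_ADDR / MASTER_PORT let the pods find each other
    dist.init_process_group(
        backend="nccl",
        rank=int(os.environ["RANK"]),
        world_size=int(os.environ["WORLD_SIZE"]),
    )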
OutrageousSheep60
I found the task in the UI - and in the UNCOMMITTED CHANGES execution section there is No changes logged
This is the issue.
and then run the session via docker
clearml-session --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 \
  --packages "clearml" "tensorflow>=2.2" "keras" \
  --queue MY_QUEUE \
  --verbose
Are you running the "clearml-session" from your machine? (i.e. not from inside a docker)?...
Hi JuicyFox94 ,
Actually we just added that 🙂 (still on GitHub, RC soon)
https://github.com/allegroai/clearml/blob/400c6ec103d9f2193694c54d7491bb1a74bbe8e8/clearml/automation/controller.py#L696
Thanks ShallowCat10 !
I'll make sure we fix it 🙂
Hi @<1556812486840160256:profile|SuccessfulRaven86>
Please notice that clearml-serving is not designed for public exposure; it lacks a security layer and is designed for easy internal deployment. If you feel you need the extra security layer, I suggest either adding external JWT-like authentication, or talking to the clearml people, their paid tiers include enterprise-grade security on top
Hi @<1704304350400090112:profile|UpsetOctopus60>
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_kubernetes_helm
Just use the helm charts. It's the easiest
from clearml import TaskTypes
That will only work if you are using the latest from GitHub; I guess the example code was modified before a stable release...
Hi OddShrimp85
If you pass output_uri=True to Task.init, it will upload the model automatically, or as you said, manually with the OutputModel class
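i.e. something like (a minimal sketch; the names are placeholders):

from clearml import Task

# output_uri=True uploads model snapshots to the default files server;
# a storage URI (e.g. "s3://bucket/folder") works as well
task = Task.init(
    project_name="examples",
    task_name="model upload",
    output_uri=True,
)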