VictoriousPenguin97 , Hi 🙂
Can you provide a snippet of how you tried to download the file? Also, what version of clearml are you using, and can you please give an example of a filename you have on S3?
How did the tasks fail?
WackyRabbit7 , please skim over here 🙂
https://clear.ml/docs/latest/docs/references/api/tasks#post-tasksget_all
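For reference, a minimal sketch of calling that endpoint from Python through the APIClient (the project id and fields below are just placeholders):
```python
from clearml.backend_api.session.client import APIClient

# sketch: query tasks.get_all via the ClearML APIClient
client = APIClient()
tasks = client.tasks.get_all(
    project=["<project-id>"],              # placeholder project id(s)
    status=["completed"],                  # filter by task status
    only_fields=["id", "name", "status"],  # limit the returned fields
)
for t in tasks:
    print(t.id, t.name, t.status)
```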
Hi @<1582179661935284224:profile|AbruptJellyfish92> , looks like a bug. Can you please open a GitHub issue to follow up on this?
Not one that I know of. Also, it's good practice to implement (think of automation) 🙂
Since the "grand" dataset will inherit from the child versions you wouldn't need to have data duplications
Hi FierceHamster54 , you have docker_args in https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller#pipelinedecoratorcomponent
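Roughly like this (the image and arguments are just examples):
```python
from clearml.automation.controller import PipelineDecorator

# sketch: pass extra "docker run" arguments to a single pipeline step
@PipelineDecorator.component(
    return_values=["result"],
    docker="python:3.9",
    docker_args="--shm-size=8g --env MY_VAR=1",  # example arguments only
)
def preprocess(data_path):
    # the step body runs inside the container started with the arguments above
    return data_path
```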
Hi @<1639799308809146368:profile|TritePigeon86> , if I understand you correctly, you're basically looking for a switch in pipelines (per step) to say "even if step failed, continue the pipeline"?
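If so, something along these lines might be what you're after - a minimal sketch using the continue_on_fail flag on a step (project/task names are placeholders):
```python
from clearml import PipelineController

# sketch: per-step switch so the pipeline keeps going even if this step fails
pipe = PipelineController(name="example-pipeline", project="examples", version="1.0.0")
pipe.add_step(
    name="flaky_step",
    base_task_project="examples",   # placeholder
    base_task_name="some task",     # placeholder
    continue_on_fail=True,          # downstream steps still run if this step fails
)
pipe.start()
```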
Also, is it an AWS S3 or is it some similar storage solution like Minio?
I think you need to specify some pythonic object to work with torch.save() - as it appears in their documentation:
https://pytorch.org/docs/stable/generated/torch.save.html
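For example, saving and loading a model's state dict:
```python
import torch
import torch.nn as nn

# a pythonic object (the state dict) is what gets passed to torch.save()
model = nn.Linear(10, 2)
torch.save(model.state_dict(), "model.pt")   # save the weights
state = torch.load("model.pt")               # load them back
model.load_state_dict(state)
```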
AbruptWorm50 , that's strange. I'll take a look as well. What version of clearml are you using?
I think this is what you're looking for - the agent integration
None
Hi @<1539780272512307200:profile|GaudyPig83> , thanks for the report! Does it always happen? Can you please post the result in text for easier reading?
@<1523701083040387072:profile|UnevenDolphin73> , basically, it scales to as many pods as you like. Very similar to the autoscaler but on top of K8s
I think you can configure agent.reload_config in clearml.conf and then push the change to the file programmatically somehow
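Something along these lines in clearml.conf (sketch only - I'm assuming the key behaves the way its name suggests):
```
agent {
    # assumption: when enabled, the agent re-reads its configuration,
    # so a change pushed to the file programmatically gets picked up
    reload_config: true
}
```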
And did the experiments run on agents or locally (i.e. PyCharm/terminal/VSCode/Jupyter/...)?
Can you copy paste the error you got?
Regarding your questions:
- Disable the VCS cache: https://github.com/allegroai/clearml-agent/blob/master/docs/clearml.conf#L120 (see the snippet below)
- I think the lock is created when running an experiment; maybe it hung, so the lock never got lifted
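For the VCS cache part, roughly this in clearml.conf (agent section):
```
agent {
    vcs_cache {
        # disables caching of cloned git repositories
        enabled: false
    }
}
```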
wdyt?
Hi @<1554638160548335616:profile|AverageSealion33> , if you must have some separation between dev and prod, it would be a good idea. Can you elaborate on what you mean regarding ClearML features?
Hi @<1523703397830627328:profile|CrookedMonkey33> , not sure I follow. Can you please elaborate more on the specific use case?
Currently you can add plots to the preview section of a dataset
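A minimal sketch (names are illustrative) - plots reported through the dataset's logger show up in its preview:
```python
from clearml import Dataset

ds = Dataset.create(dataset_name="my-dataset", dataset_project="examples")
ds.add_files("data/")
# report a plot; it will appear in the dataset version's preview section
ds.get_logger().report_histogram(
    title="label distribution",
    series="train",
    values=[120, 80, 40],
    xlabels=["cat", "dog", "bird"],
)
ds.upload()
ds.finalize()
```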
I'm guessing this is a self-deployed server, correct?
Hi @<1734020162731905024:profile|RattyBluewhale45> , are they running anything? Can you see machine statistics on the experiments themselves?
Can you share a screenshot of the workers page?
What version is the server? Do you see any errors in the API server or webserver containers?
It's all configured by the Helm chart; it's the glue layer between K8s & ClearML
Yeah this is a lock which is always in our cache, can't figure out why it's there, but when I delete the lock and the other files, they always reappear when I run a new clearml task.
Is the lock something that occurs on your machine regardless of ClearML?
Disabling the VCS cache means the cloned git folder will no longer be cached. You can filter by 'Running' experiments in ClearML, look for ones that haven't reported for a while, and start investigating those
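If you prefer to do it programmatically, a rough equivalent of that UI filter:
```python
from clearml import Task

# sketch: list tasks still marked as running, then investigate the ones
# that stopped reporting (e.g. check their last update in the UI)
running = Task.get_tasks(task_filter={"status": ["in_progress"]})
for t in running:
    print(t.id, t.name)
```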
Hi @<1797438038670839808:profile|PanickyDolphin50> , can you please elaborate? What is this accelerate functionality?
Hi @<1531445337942659072:profile|OddCentipede48> , can you please create a video reproduction?