It looks like you're running on different machines and the file your code is looking for is not available on the other machine
Hi @<1613344994104446976:profile|FancyOtter74> , I think this happens because you're creating a dataset inside the same task. Therefore there is a connection between the task and the dataset, and they are moved to a special folder for datasets. Is there a specific reason why you're creating both a Task & a Dataset in the same code?
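If the two don't actually need to live together, a minimal sketch would be splitting them into two separate scripts, something like this (each function stands in for its own script; project/dataset names are placeholders):

```python
def run_experiment():
    # Script 1: the experiment itself, tracked as a regular Task
    from clearml import Task
    task = Task.init(project_name="examples", task_name="training")
    # ... training code ...
    task.close()


def create_dataset():
    # Script 2: dataset creation, run as its own job so the experiment
    # Task is not moved into the special datasets folder
    from clearml import Dataset
    ds = Dataset.create(dataset_project="examples", dataset_name="raw-data")
    ds.add_files("data/")
    ds.upload()
    ds.finalize()
```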
Also, in the original experiment, what pytorch version is detected?
Is the entire pipeline running on the autoscaler?
Hi @<1652120623545061376:profile|FrightenedSealion82> , do you see any errors in the apiserver or the webserver containers?
And are they the same tasks?
I mean if you were to run the 'failing' task first, it would run, correct?
I'm not sure I understand this config, is this an autoscaler for GCP or AWS?
Hi DangerousDragonfly8 , can you please elaborate on your use case? If you want only a single instance to exist at any time how do you expect to update it?
What exactly are you looking to set up?
Try to set agent.enable_git_ask_pass: true for the agent running inside the container, perhaps that will help
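In the `clearml.conf` of the agent inside the container, the setting would look something like this:

```
agent {
    # let the agent pass git credentials via GIT_ASKPASS
    enable_git_ask_pass: true
}
```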
Hi @<1523704674534821888:profile|SourLion48> , making sure I understand - You push a job into a queue that an autoscaler is listening to. A machine is spun up by the autoscaler and takes the job and it runs. Afterwards during the idle time, you push another job to the same queue, it is picked up by the machine that was spun up by the autoscaler and that one will fail?
Hi @<1590514584836378624:profile|AmiableSeaturtle81> , the hotfix should be right around the corner 🙂
Didn't have a chance to try and reproduce it, will try soon 🙂
What is this http://unicorn address? Did you deploy using docker compose?
Hi @<1577468638728818688:profile|DelightfulArcticwolf22> , what email did you use? Can you try again now?
Just so I understand the scenario - you're using Minio (I assume no special configs), and when saving debug samples to Minio all the files are there, but in the UI you can't view them, correct? How did you save the debug samples to Minio? By default they are always saved to the fileserver.
Hi DangerousDragonfly8 , I believe this is supported. You just need the object to already contain the embedded links
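Assuming the use case is a Dataset that references files staying in external storage (bucket/paths below are placeholders), a rough sketch:

```python
def register_external_links():
    # Register links to files that remain in external storage;
    # the data itself is not copied to the fileserver
    from clearml import Dataset
    ds = Dataset.create(dataset_project="examples", dataset_name="external-data")
    ds.add_external_files(source_url="s3://my-bucket/data/")
    ds.upload()
    ds.finalize()
```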
Hi ShallowGoldfish8 ,
I'm not sure I understand the scenario. Can you please elaborate? In the end the model object is there so you can easily fetch the raw data and track it.
Hi @<1540867420321746944:profile|DespicableSeaturtle77> , I think you need to define it per step
ClearML detects the repository & dependencies automatically. How are you currently setting the requirements file?
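For reference, a minimal sketch of overriding the auto-detected packages (package names and the requirements file path are placeholders):

```python
def pin_requirements():
    from clearml import Task
    # Either add individual packages on top of the auto-detected ones...
    Task.add_requirements("pandas", ">=1.3")
    # ...or point the agent at a full requirements file instead of
    # auto-detection. Both calls must come before Task.init()
    Task.force_requirements_env_freeze(requirements_file="requirements.txt")
    task = Task.init(project_name="examples", task_name="training")
    return task
```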
Hi @<1578555761724755968:profile|GrievingKoala83> , ClearML only tracks data (datasets/models) and keeps links to it. Or are they saved to the files server?
Hi TrickyFox41 , how did you save the debug samples? What is the URL of the image?
Hi TrickyFox41 , I think this issue is solved in 1.9.0, please update to the latest version of clearml
I might be wrong. Did you try 1.9.1?
@<1664079296102141952:profile|DangerousStarfish38> , can you provide logs please?
What error are you getting when you run docker compose?