Hi @<1623491856241266688:profile|TenseCrab59> , you can go into settings (profile icon on the top right of the UI) -> Configuration - Enable 'Show Hidden Projects'
Hi GrittyCormorant73, I think tags are what you're looking for. You can add tags to a model, and you can even tag the currently running model. What do you think?
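A minimal sketch of how that could look (assuming the ClearML SDK's `Task.add_tags`, which accepts a string or a list; the `task` object is injected here so the snippet stays self-contained):

```python
def tag_task(task, tags):
    """Add one or more tags to a (running) ClearML task.

    In a real script `task` would be the object returned by Task.init()
    or Task.current_task(); tags then show up in the UI and are filterable.
    """
    task.add_tags(tags if isinstance(tags, list) else [tags])
```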
Hi @<1709015393701466112:profile|ScatteredPeacock14> , you are correct, this feature is available only in the Scale/Enterprise plans.
I'm afraid there isn't anything besides unregistering/re-registering
For example, do a fresh installation of the previous version and restore from your backup data so you have a clean state. Then try upgrading again.
Hi @<1643060807786827776:profile|WorriedPeacock92> , you mean like this?
Then I think this is something you need to implement in your script: mark the task as failed if the spot instance goes down.
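Something along these lines, as a hedged sketch (the spot-interruption check itself is up to you, e.g. polling your cloud's metadata endpoint; `task` is assumed to be a ClearML `Task`, whose `mark_failed` method accepts a `status_reason`):

```python
def fail_if_interrupted(task, interrupted: bool) -> bool:
    """Mark the ClearML task as failed when a spot interruption is detected.

    `interrupted` should come from your own termination check; this helper
    only handles the status change so the task doesn't linger as 'running'.
    """
    if interrupted:
        task.mark_failed(status_reason="spot instance reclaimed")
        return True
    return False
```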
That's an option. This really depends on your usage: if you want those 'custom parameters' to be accessible by other tasks, save them as artifacts. If you only want visibility, save them as scalars. There's a nice usage example here: https://github.com/allegroai/clearml/blob/master/examples/reporting/scalar_reporting.py
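Roughly like this (a sketch assuming the ClearML SDK's `Task.upload_artifact` and `Logger.report_scalar`; `task` is injected so the snippet stands alone, and the artifact name `custom_parameters` is just an example):

```python
def save_custom_parameters(task, params: dict, share_across_tasks: bool) -> None:
    """Save params as an artifact (cross-task access) or as scalars (visibility only)."""
    if share_across_tasks:
        # Other tasks can later fetch this via
        # Task.get_task(...).artifacts["custom_parameters"].get()
        task.upload_artifact(name="custom_parameters", artifact_object=params)
    else:
        logger = task.get_logger()
        for name, value in params.items():
            logger.report_scalar(
                title="custom parameters", series=name,
                value=float(value), iteration=0,
            )
```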
Can you try with the latest ClearML?
Hi SuperiorPanda77, how are the tasks running - locally or via an agent? What does the log show?
I would suggest googling that error
Hi @<1523701240951738368:profile|RoundMosquito25> , when a different user runs something with their own credentials generated from the UI, it will show up in the UI as a different user. Does that clear things up?
Are you sure the file server is correctly configured on the pods?
Hi @<1566959349153140736:profile|ShinyChicken29> , when you try to view the image in your browser, the browser tries to access the S3 bucket directly - this is why you get the popup. The data never goes through the ClearML backend. Makes sense?
Hi @<1590514572492541952:profile|ColossalPelican54> , you can use the Logger module to manually report metrics.
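For example, a small sketch (assuming ClearML's `Logger.report_scalar`; in a live task you'd get the logger via `task.get_logger()` or `Logger.current_logger()`, here it's passed in so the snippet is self-contained):

```python
def log_epoch_metrics(logger, epoch: int, loss: float, accuracy: float) -> None:
    """Manually report per-epoch metrics; title groups the plot, series names the line."""
    logger.report_scalar(title="train", series="loss", value=loss, iteration=epoch)
    logger.report_scalar(title="train", series="accuracy", value=accuracy, iteration=epoch)
```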
Is it the services agent that comes with the docker compose, or did you run your own agent?
Hi @<1546665634195050496:profile|SolidGoose91> , I think this capability exists when running pipelines. The pipeline controller will detect spot instances that failed and will retry running them.
Are you using the PRO or the open source auto scaler?
I think it's possible there was an Elastic upgrade; I'd suggest going over the release notes to see if that happened with the server.
You need to separate the Task object itself from the code that is running. If you manually 'revive' a task but then nothing happens and no code is actually running, the task will eventually get aborted. I'm not sure I entirely understand what you're doing, but I have a feeling it's something 'hacky'.
Hi GrittyKangaroo27!
Can you please elaborate on your use case for deployment?
Besides that, I'm happy to say that ClearML supports all the cases above 🙂
Also for some further reading:
https://clear.ml/products/clearml-deploy/
https://allegro.ai/clearml/docs/rst/deploying_clearml/deploying_clearml_formats/index.html
https://github.com/allegroai/clearml-serving
I see. The difference is minute, but it's still there.
Hi @<1523701260895653888:profile|QuaintJellyfish58> , can you elaborate on what uv is?
UnevenDolphin73 , can you provide a small snippet of exactly what you were running? Are you certain you can see the task in the UI? Is it archived?
You can add it pythonically at the start of your script, but I think docker mode is what you need if you want to pre-install packages in the environment.
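In docker mode, one place to do that is the agent's docker init script in `clearml.conf` on the agent machine (a hedged sketch; the package is hypothetical, and it's worth checking your agent version's reference config for the exact key):

```
# clearml.conf (agent section) - commands run inside the docker
# container before the task's environment is set up
agent {
    docker_init_bash_script: [
        "apt-get update",
        "pip install my-extra-package==1.2.3",  # hypothetical example package
    ]
}
```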
@<1523701168822292480:profile|ExuberantBat52> , did you add debug samples in a similar fashion? What version of the ClearML SDK are you using? Also, what server?
Hi @<1769534182561681408:profile|ReassuredFrog10> , do you have a GPU available? Maybe try the other docker compose without Triton as that one is specifically built for GPU inference.
Hi, SuperiorPanda77 , this looks neat!
I could take a look on a windows machine if it helps 🙂
CluelessElephant89, did you run the vm.max_map_count command for Elastic? Also, how much RAM does the machine you're running on have?
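For reference, this is the setting Elasticsearch needs on the docker host (262144 is the value Elasticsearch's own docs recommend; run it as root/sudo):

```
# apply immediately
sudo sysctl -w vm.max_map_count=262144

# persist across reboots
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
```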