Sounds about right. You can also create reports from the best experiments and then reference the models via the report as well
Hi @<1569133676640342016:profile|MammothPigeon75> , I believe such SLURM integration of what you described is supported on ClearML Scale/Enterprise versions
What errors are you getting?
Can you also please share logs of the autoscaler?
Hi @<1566596960691949568:profile|UpsetWalrus59> , what happens if you set output_uri=False ?
Hi @<1533257407419912192:profile|DilapidatedDucks58> , how long did it persist? Can you try upgrading again and see the apiserver logs?
Looks decent, give it a try and let us know if it's working 🙂
Hi EnormousCormorant39 , how did it fail?
What about if you specify the repo user/pass in clearml.conf?
I think it removes the user/pass so it wouldn't be shown in the logs
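For reference, this is roughly what the relevant agent section of clearml.conf looks like (the values below are placeholders, not real credentials):

```
agent {
    # Git credentials the agent uses when cloning the task's repository.
    # Placeholder values - replace with your own user/token.
    git_user: "my-git-user"
    git_pass: "my-git-token"
}
```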
UnevenDolphin73 , can you verify that the process is not running on the machine? for example with htop or top
@<1545216070686609408:profile|EnthusiasticCow4> , I think add_files always generates a new version. I mean, you add files to your dataset, so the version has changed. Does that make sense?
For that you have the autoscaler - None
You can set up multiple instances of the autoscaler each spinning machines on different accounts
Please try like Kirill mentioned. Also please note that there is no file target in the snippet you provided 🙂
Hi @<1632913959445073920:profile|IratePigeon23> , please look at the following thread - None
That is a nice example for using the API. After you handle the login issues, you can use the web UI as a reference for the API (use dev tools - F12 to see what the UI sends to the backend).
Let me know if this helps 🙂
Just making sure - This path, is it inside the docker or outside the container?
DilapidatedDucks58 , I think this is what you want. You just configure it and don't need to change anything in code.
https://github.com/allegroai/clearml/blob/92210e2c827ff20e06700807d25783691629808a/docs/clearml.conf#L106
You can pass any boto3 parameter here!
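As a rough sketch, this is the kind of thing the sdk.aws section of clearml.conf accepts (the values shown are illustrative, not recommendations):

```
sdk {
    aws {
        s3 {
            # Credentials section - values are placeholders
            key: ""
            secret: ""
            region: ""
        }
        boto3 {
            # Tuning parameters forwarded to boto3
            pool_connections: 512
            max_multipart_concurrency: 16
        }
    }
}
```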
Do you mean to copy-paste the uncommitted changes section and apply it to some local environment, independent of ClearML?
Please open developer tools (F12) and see if you're getting any console errors when loading a 'stuck' experiment
That's weird. Did you do docker-compose down and up properly?
We all do eventually 🙂
Hi @<1523701260895653888:profile|QuaintJellyfish58> , I think where your code runs depends on where the clearml cache is. Search clearml.conf for cache under agent configuration 🙂
I think all of them
In that case then yes, install the agent on top of the machine with the A100 with 8 GPUs
Hi @<1558986867771183104:profile|ShakyKangaroo32> , you can extract that using the API with events.debug_images - None
I suggest navigating to the UI and seeing what the UI sends when looking at debug samples with dev tools (F12)
I'm accessing both using SSH tunneling & the same domain
I guess we found the culprit 🙂
Hi AdventurousButterfly15 , are you able to clone locally? What version of the agent are you using?
I think you can simply reset and enqueue the task again for it to run. Question is, why did it fail?
Are you running the HPO example? What do you mean by adding more parameter combinations? If the optimizer task finished, you either need a new one, or to reset the previous one and re-run it.
You can do various edits while in draft mode
CrookedWalrus33 , Hi 🙂
Can you please provide which packages are missing?