Hi, regarding your questions:
If you create and finalize the dataset, it should upload the file contents to the fileserver (or any other storage you configure). The dataset is an object similar to a task - it has a unique ID.
You can add metric columns to the experiments table by clicking the little cog wheel at the top right of the table. You can also select multiple experiments and compare them (bottom left on the bar that appears after selecting more than one expe...
Hi @<1631102016807768064:profile|ZanySealion18> , I think this is what you're looking for:
None
and just making sure - by pipeline we're talking about the ClearML pipelines, correct?
https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller
Hi @<1558624430622511104:profile|PanickyBee11> , how are you doing the multi node training?
UnsightlyHorse88 & ShallowGoldfish8 , can you please provide a code snippet to play with?
Usually tasks are timed out by default after 2 hours without any activity. I guess you could keep the task alive as a process on your machine by printing something once every 30 or 60 minutes.
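A minimal keep-alive sketch of what I mean (the heartbeat message, interval, and the `max_beats` stop condition are all my own illustration, not anything ClearML-specific - any periodic console output should do):

```python
import time
from typing import Optional

def keep_alive(interval_seconds: int = 1800, max_beats: Optional[int] = None) -> int:
    """Print a heartbeat every `interval_seconds` so the task keeps producing
    console output. Stops after `max_beats` prints (None means run forever).
    Returns the number of heartbeats printed."""
    beats = 0
    while max_beats is None or beats < max_beats:
        print("keep-alive heartbeat")
        beats += 1
        time.sleep(interval_seconds)
    return beats
```

You would run this in the same process (or alongside it) while the long-running work is in progress.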
I mean that Task.init will automatically detect the repo without specifying it
Hi @<1693795218895147008:profile|FlatMole2> , is it possible that the apiserver.conf file isn't persistent and somehow changes?
You can export it in the same shell you run the agent in, for example:
export FOO=bar
clearml-agent daemon ...
How are you trying it programmatically? Are you providing API keys for authentication?
Hi @<1523701083040387072:profile|UnevenDolphin73> , you're using a PRO account, right?
Hi StraightParrot3 , page_size is indeed limited to 500 from my understanding, so you need to scroll through the tasks. The first tasks.get_all response will return a scroll_id ; pass that scroll_id in your following call. Every call afterwards returns a new scroll_id , which you must always pass in the next call to continue scrolling through the tasks. Makes sense?
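A sketch of the scrolling pattern as a generic helper: it feeds each response's scroll_id into the next request. The `fetch` callable is my own abstraction - with ClearML you would wrap your tasks.get_all call (e.g. via the APIClient) so it returns the page of tasks plus the new scroll_id; check the exact response fields for your server version.

```python
from typing import Any, Callable, Iterator, List, Optional, Tuple

# fetch(scroll_id) -> (items, next_scroll_id); the first call gets scroll_id=None
Fetcher = Callable[[Optional[str]], Tuple[List[Any], Optional[str]]]

def scroll_all(fetch: Fetcher) -> Iterator[Any]:
    """Yield every item across all pages, passing each returned scroll_id
    into the next call, until an empty page comes back."""
    scroll_id: Optional[str] = None
    while True:
        items, scroll_id = fetch(scroll_id)
        if not items:
            break
        yield from items
```

For ClearML, `fetch` would be something like a lambda calling tasks.get_all with page_size=500 and the given scroll_id (assumed shape, not a verified signature).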
Hi @<1726772411946242048:profile|CynicalBlackbird36> , what you're looking at is the metrics storage, this is referring to all the console outputs, scalars, plots and debug samples.
This is saved in the backend of ClearML. There is no direct way to pull this but you can technically fetch all of this using the API.
wdyt?
Hi @<1774245260931633152:profile|GloriousGoldfish63> , you can configure it in the volumes section of the fileserver in the docker compose.
Hi @<1523704157695905792:profile|VivaciousBadger56> , you can configure Task.init(..., output_uri=True) and this will save the models to the clearml file server
Sure, if you can post it here or send in private if you prefer it would be great
Hi SubstantialElk6 , I think you need to have Task.init() inside these subprocesses as well.
Hi @<1528546301493383168:profile|ThoughtfulElephant4> , where did you upload the dataset? Can you add the full log? If your colleague clones and enqueues - the code assumes that the files are local, no?
And when you run it again under exactly the same circumstances it works fine?
Hi @<1546303293918023680:profile|MiniatureRobin9> , can you please add logs of the tasks + controller?
Hi @<1590514584836378624:profile|AmiableSeaturtle81> , not sure I understand this line
Is the order of --ids the same as returned rows?
Also, regarding the hash, I'd suggest opening a github feature request for this.
I think the issue is that the message isn't informative enough. I would suggest opening a GitHub issue on this requesting a better message. Regarding confirming - I'm not sure but this is the default behavior of Optuna. You can run a random or a grid search optimization and then you won't see those messages for sure.
What do you think?
Hi RoundMole15 , what version of clearml are you using? Also how is the model being saved without ClearML?
Hi @<1546303293918023680:profile|MiniatureRobin9> , can you please add the full log of the run? Also, do you have some code that reproduces this?
Hi @<1858681577442119680:profile|NonchalantCoral99> , please see my reply to Vojta 🙂