What do you mean by "pull and report multiple trials"? Spawn multiple processes with different parameters?
Let's say you are doing Bayesian sampling of some parameter with your optimizer. That means the next sample will be a function of the previous samples, and all of this is contained in the optimizer state (in the Optuna optimizer case, in the study object). So to run an optimization the way it is described in the example, whatever communicates with the optimizer task needs a synced state of the optimizer.
Pull: accessing a sample from the optimizer (a point in the hyperparameter space) in an exclusive way (other machines won't run it again)
Report: pushing the result back in such a way that it is registered with the optimizer, for the Bayesian sampling for example
Multiple Trials: the same Python script runs more than one trial without restarting
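To make the three terms concrete, here is a rough sketch of what I have in mind. It is not ClearML or Optuna code: `SharedOptimizer`, `expensive_init` and `run_experiment` are hypothetical names made up for the illustration, the "sampler" is plain uniform sampling rather than a Bayesian one, and threads stand in for separate machines. For a single process, Optuna's ask-and-tell interface (`study.ask()` / `study.tell()`) covers the same two steps.

```python
import random
import threading


class SharedOptimizer:
    """Hypothetical stand-in for the optimizer task that holds the synced state."""

    def __init__(self, seed=0):
        self._lock = threading.Lock()
        self._rng = random.Random(seed)
        self._next_id = 0
        self.pending = {}    # trial_id -> params handed out but not yet reported
        self.completed = []  # (params, value) pairs a sampler could condition on

    def pull(self):
        """Pull: hand out a new sample exclusively (each trial id is issued once)."""
        with self._lock:
            # A Bayesian sampler would look at self.completed here;
            # uniform sampling keeps the sketch short.
            params = {"lr": self._rng.uniform(1e-4, 1e-1)}
            trial_id = self._next_id
            self._next_id += 1
            self.pending[trial_id] = params
            return trial_id, params

    def report(self, trial_id, value):
        """Report: register the result so later samples can depend on it."""
        with self._lock:
            params = self.pending.pop(trial_id)
            self.completed.append((params, value))


def expensive_init():
    """Placeholder for the slow setup (environment, data loading, ...)."""
    return {"ready": True}


def run_experiment(context, params):
    """Placeholder objective; a real script would train and evaluate here."""
    return (params["lr"] - 0.01) ** 2


def worker(optimizer, n_trials):
    context = expensive_init()  # paid once per process, not once per trial
    for _ in range(n_trials):   # Multiple Trials: same script, no restart
        trial_id, params = optimizer.pull()
        value = run_experiment(context, params)
        optimizer.report(trial_id, value)


optimizer = SharedOptimizer()
threads = [threading.Thread(target=worker, args=(optimizer, 5)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(optimizer.completed), "trials reported,", len(optimizer.pending), "pending")
```

The lock is what makes the pull exclusive here; across real machines the same guarantee would have to come from the optimizer task itself.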
In terms of the bottleneck considerations, the ClearML agent setup is a relatively small portion of the run initialization. We have some other parts, and in some cases the initialization time can be about 10 times the experiment time,
so scaling this overhead cost, we are effectively losing (10 x #machines)X in performance for some HPO studies we are running
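A back-of-the-envelope version of that overhead for a single worker, as a sketch only: the 10x ratio is the one from our runs, while the 20 trials and the time unit are made-up numbers for illustration, and `total_time` is just a helper defined here.

```python
def total_time(n_trials, init_time, experiment_time, restart_each_trial):
    """Wall time for one worker running n_trials experiments back to back."""
    n_inits = n_trials if restart_each_trial else 1
    return n_inits * init_time + n_trials * experiment_time


experiment_time = 1.0                # arbitrary time unit (assumption)
init_time = 10.0 * experiment_time   # initialization ~10x the experiment time
n_trials = 20                        # hypothetical number of trials per worker

restart = total_time(n_trials, init_time, experiment_time, restart_each_trial=True)
reuse = total_time(n_trials, init_time, experiment_time, restart_each_trial=False)

print("restart per trial:", restart)
print("init once, many trials:", reuse)
print("ratio:", restart / reuse)
```

With these numbers, restarting for every trial costs 220 time units against 30 when the initialization is paid once, and the gap keeps growing with the number of trials per worker.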