
(do you welcome PRs?)
Dang, so unlike screenshots, reports do not survive task deletion :/
OK, so there is no way to have automatic dispatch to different, correctly sized instances; it’s only achievable by submitting to different queues?
Can the “multiple agents on a single queue” scenario, combined with the autoscaler, spawn multiple agents on a single EC2 instance, by any chance? (I'm thinking of e.g. 8 agents on an 8xGPU machine.)
Thanks @<1523701070390366208:profile|CostlyOstrich36> ! I'll do that, and might even peek under the hood to see if I can make a PR. What's the best repo for that? Is it that of the ClearML Python package?
@<1523701205467926528:profile|AgitatedDove14> great! (I'm on the Pro version :) ).
@<1523701087100473344:profile|SuccessfulKoala55> I think you’ve been tagged in the PR 🙂
Yes, exactly. Here is the reasoning: I have plots where iterations represent different units. For some of these plots (call them A) iterations are optimization steps, while for others (call them B) they are evaluation iterations, occurring every N optimization steps. I would like to either:
- Change the X label so these different plots do not have the same label when they represent different things.
- Or, even better, keep the unique "iterations" label but be able to change how I lo...
What is the best way to achieve that please?
(actually, that might even be feasible without touching the UI, depending how the plot is rendered, but I'll check)
Happy to jump on a call if easier to make sense of it :)
From the doc I seemed to find ways to log 2D scatter plots, but not line plots :/ (found)
It also seems simpler to keep the scalar logging structure, but be able to pass a multiplier (reflecting the eval_n_steps in, for example, Torch Lightning)
The problem with logging as a 2D plot is that we lose the streaming: if I understand the documentation correctly, Logger.current_logger().report_scatter2d logs a single, frozen 2D plot when you know the full X and Y data. And you would do that at each evaluation step.
Logging scalars lets you log a growing time series, i.e. add to the existing series/plot at every "iteration", and so monitor progress over time in one single plot. It's a much more logical setting.
Logging scalars also leverages ClearML automatic logging. One problem is that this automatic logging seems to keep its own internal "iteration" counter for each scalar, as opposed to keeping track of, say, the optimizer's number of steps.
That could be fixed simply in the ClearML Python library by allowing a per-scalar iteration multiplier to be set.
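Until something like that exists, a minimal sketch of a workaround on the logging side: compute the optimizer step yourself and pass it as `iteration` when reporting the evaluation scalar. This assumes evaluation runs every `eval_n_steps` optimizer steps and that the first evaluation happens after the first `eval_n_steps` steps. `to_optimizer_step` and `report_eval_scalar` are hypothetical helper names, not ClearML functions; `report` is any callable accepting the keywords `title`, `series`, `value` and `iteration` (as `Logger.report_scalar` does), injected here so the snippet runs without a ClearML server.

```python
from typing import Callable, List, Tuple


def to_optimizer_step(eval_index: int, eval_n_steps: int) -> int:
    """Map the k-th evaluation (0-based) to the optimizer step it ran at.

    Assumption: the first evaluation happens after eval_n_steps steps.
    """
    if eval_n_steps <= 0:
        raise ValueError("eval_n_steps must be positive")
    return (eval_index + 1) * eval_n_steps


def report_eval_scalar(
    report: Callable[..., None],
    title: str,
    series: str,
    value: float,
    eval_index: int,
    eval_n_steps: int,
) -> int:
    """Report an evaluation scalar on the optimizer-step axis (hypothetical helper)."""
    iteration = to_optimizer_step(eval_index, eval_n_steps)
    report(title=title, series=series, value=value, iteration=iteration)
    return iteration


# Stand-in reporter that only records what it is given.
recorded: List[Tuple[str, str, float, int]] = []


def fake_report(title: str, series: str, value: float, iteration: int) -> None:
    recorded.append((title, series, value, iteration))


# Three evaluations, one every 100 optimizer steps.
for k, loss in enumerate([0.9, 0.7, 0.5]):
    report_eval_scalar(fake_report, "loss", "eval", loss, k, eval_n_steps=100)
```

With ClearML installed, `Logger.current_logger().report_scalar` could be passed as `report` instead of `fake_report`, so plots A and B would share the same optimizer-step X axis. This only covers manually reported scalars; it does not change the counter used by automatic logging.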
Tagging my colleague @<1529271085315395584:profile|AmusedCat74> who made that report.
@<1523701070390366208:profile|CostlyOstrich36> Any idea please? We could use our 8xA100 as 8 workers, for 8 single-gpu jobs running faster than on a single 1xV100 each.
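For reference, on a machine we manage by hand (not through the autoscaler, which is the open question here), the one-agent-per-GPU setup could be sketched as below. This only builds the `clearml-agent daemon` command lines, one per GPU index, and prints them rather than launching anything. The queue name `single_gpu` and the helper `agent_commands` are hypothetical.

```python
from typing import List


def agent_commands(num_gpus: int, queue: str) -> List[List[str]]:
    """Build one `clearml-agent daemon` command per GPU index (hypothetical helper)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        # Each agent listens on the same queue but is pinned to a single GPU.
        ["clearml-agent", "daemon", "--queue", queue, "--gpus", str(gpu), "--detached"]
        for gpu in range(num_gpus)
    ]


# Assumed setup: an 8xGPU machine serving a queue named "single_gpu".
commands = agent_commands(8, "single_gpu")
for command in commands:
    print(" ".join(command))
```

Each printed line would start one agent bound to one GPU, so eight single-GPU jobs pulled from the same queue could run side by side.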
Thanks @<1523701070390366208:profile|CostlyOstrich36> !
- I hadn’t found the multiple resources within the same autoscaler. Could you point me to the right place please? Are they all used interchangeably based upon availability, rather than based on job needs?
- We thought of using separate queues (we do that for CPU vs GPU queues), but having ClearML automatically dispatch to the right one based on a job specification would be more flexible. (for example, we could then think to dispatch dynami...
Great, thanks both! I suspect this might need an extra option to be passed via the SDK, to save the iteration scaling at logging time, which the UI can then use at rendering time.
Hi 🙂 Does anyone have any idea about this one, please? Or could someone point me to the right place or the right person to find out? Thanks for any help!
Brilliant, thanks a lot for the answer Jake, much appreciated and clearer!
@<1529271085315395584:profile|AmusedCat74> @<1548115177340145664:profile|HungryHorse70> here we have the answer :)
Is the doc on GitHub so we can copy that into a PR?
@<1523701087100473344:profile|SuccessfulKoala55> yes I am 🙂 And thanks, looking forward to it!
Tagging my colleague @<1529271085315395584:profile|AmusedCat74> who needs this with me 🙂
Do Pipelines work with Hyperparameter search, and with single training jobs?
And yes, I was also referring to tasks run by the Autoscaler (potentially via the HPO) app, too.
It was a debugging session. We haven’t yet tried a “Standard” non-debugging clearml session.
Tagging @<1529271085315395584:profile|AmusedCat74> my colleague with whom we ran into this issue.