@<1523701070390366208:profile|CostlyOstrich36> Will it work? Assume I have 3 workers.
1 takes a task
2 takes a task
1 checks last_worker
Hi @<1533619716533260288:profile|SmallPigeon24> , to your questions :
- Yes. It's under 'worker name'
- You can set the worker name in clearml.conf with agent.worker_name
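As a rough illustration of the second point, the setting sits inside the agent section of clearml.conf. The value below is a made-up placeholder, not a name from this thread:

```
agent {
    # hypothetical name, pick whatever identifies this agent for you
    worker_name: "board-farm-server"
}
```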
@<1533619716533260288:profile|SmallPigeon24> I assume by "board" you mean "queue"?
@<1523701087100473344:profile|SuccessfulKoala55> No, I mean a chip: a piece of hardware that cannot, on its own, run an agent. As such, an attached computer (in this case, the server) will run an agent that accesses it via ssh. In my case, I want to have a "board farm": multiple boards for running inferences, all connected to the same server.
@<1523701070390366208:profile|CostlyOstrich36> Hi!
Thank you very much for the informative answer.
I have a follow-up question on q.1: Is there a pythonic way to retrieve that info mid-run?
@<1533619716533260288:profile|SmallPigeon24> you can use the task's last_worker property
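A minimal sketch of reading that value mid-run. Whether last_worker is exposed directly on the Task object or only on the backend record under task.data is an assumption here, so the hypothetical helper get_last_worker checks both. A stand-in object replaces the real task so the snippet runs without a ClearML server:

```python
from types import SimpleNamespace


def get_last_worker(task):
    """Return the worker recorded on the task, or None if it is not available."""
    # Assumption: some SDK versions may expose the field directly on the task.
    direct = getattr(task, "last_worker", None)
    if direct is not None:
        return direct
    # Assumption: otherwise the backend record under `task.data` carries it.
    return getattr(getattr(task, "data", None), "last_worker", None)


# Stand-in for a real task; the worker id is a made-up example value.
fake_task = SimpleNamespace(data=SimpleNamespace(last_worker="board-farm-server:0"))
print(get_last_worker(fake_task))

# Inside a running experiment, with the clearml package installed:
#   from clearml import Task
#   print(get_last_worker(Task.current_task()))
```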
@<1523701087100473344:profile|SuccessfulKoala55> I disagree. Queues can have multiple workers, and that implies multiple instances of a task can run concurrently.
This is necessary for board farms, or any non-tiny scale of work.
@<1533619716533260288:profile|SmallPigeon24> the intent behind queues supporting multiple workers is to have many consumers for that queue ("producer") - it does not mean multiple workers can pull the same task from the queue
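To make the "many consumers, one pull per task" point concrete, here is a plain-Python analogy (not ClearML code): three worker threads drain one queue, and every entry ends up with exactly one worker. All names are made up for illustration.

```python
import queue
import threading


def run_workers(task_ids, worker_names):
    """Let several workers drain one queue and record who pulled each task."""
    q = queue.Queue()
    for task_id in task_ids:
        q.put(task_id)

    pulled_by = {}  # task id -> list of workers that pulled it
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                task_id = q.get_nowait()  # removes the entry from the queue
            except queue.Empty:
                return
            with lock:
                pulled_by.setdefault(task_id, []).append(name)

    threads = [threading.Thread(target=worker, args=(n,)) for n in worker_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pulled_by


result = run_workers(["task-a", "task-b", "task-c"],
                     ["worker-1", "worker-2", "worker-3"])
for task_id, workers in sorted(result.items()):
    # Each task was pulled once, so its "last worker" is unambiguous.
    print(task_id, "last_worker =", workers[-1])
```

Which worker gets which task varies between runs, but no task is ever pulled twice.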
Queues can have multiple workers, and that implies multiple instances of a task can run concurrently.
@<1533619716533260288:profile|SmallPigeon24> as long as these are the exact same instances you can have them running simultaneously (think multi-node training). That said, each one should "know" not to report over the others, because otherwise it will overwrite the reports.
Back to your point on multiple agents:
You cannot have the same Task twice in the same queue. That means a single agent pulls the task, and that agent needs to "spin it" multiple times on the "remote boards"
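A sketch of what "spinning it" on several boards from one process could look like, assuming the boards are reachable over ssh as described earlier in the thread. The host names and the remote command are hypothetical placeholders, and only the command-building part runs here:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_ssh_commands(boards, remote_command):
    """Return one ssh argument list per board."""
    return [["ssh", board, remote_command] for board in boards]


def run_on_boards(boards, remote_command):
    """Run the same command on every board in parallel; return exit codes."""
    commands = build_ssh_commands(boards, remote_command)

    def run(argv):
        return subprocess.run(argv, capture_output=True, text=True)

    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        results = list(pool.map(run, commands))
    return {board: r.returncode for board, r in zip(boards, results)}


# Hypothetical hosts and command, shown without actually connecting.
for argv in build_ssh_commands(["root@board-0", "root@board-1"],
                               "run_inference --model model.bin"):
    print(argv)
```

Each run would still need its own series name when reporting, so that the boards do not overwrite one another's metrics.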
Another option is creating multiple Tasks (i.e. cloning), then using the user properties, or hyperparameters, to tell the code where to report to (of course this implies manually calling the Logger class, because the auto logging goes directly to the main task)
To me this seems the most logical, as it allows you to debug the individual executions as well as have a combined view of the metrics.
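A sketch of the cloning option, under a few assumptions: the parameter names General/board and General/report_to, the helper names, and the board list are all made up for illustration. The planning helper runs on its own, while enqueue_clones needs the clearml package and a reachable server:

```python
def plan_clones(base_name, main_task_id, boards):
    """Build one (clone name, parameters) entry per board."""
    return [
        (
            f"{base_name} [{board}]",
            # Hypothetical parameter names: which board to use and which
            # task the code should manually report its metrics to.
            {"General/board": board, "General/report_to": main_task_id},
        )
        for board in boards
    ]


def enqueue_clones(template_task_id, boards, queue_name):
    """Clone the template task once per board and enqueue every clone."""
    from clearml import Task  # lazy import so plan_clones works without clearml

    template = Task.get_task(task_id=template_task_id)
    clones = []
    for name, params in plan_clones(template.name, template_task_id, boards):
        clone = Task.clone(source_task=template, name=name)
        clone.set_parameters(params)
        Task.enqueue(clone, queue_name=queue_name)
        clones.append(clone)
    return clones


for name, params in plan_clones("inference", "main-task-id", ["board-0", "board-1"]):
    print(name, params)
```

Inside each clone, the code could read General/report_to, fetch that task with Task.get_task, and call its logger's report_scalar with a per-board series name.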
Lastly you can always use the comparison feature to have all the individual metrics on the same graph
wdyt ?
@<1533619716533260288:profile|SmallPigeon24> why would two workers take the same task? You can only have one agent running a task, so you will always get the last_worker representing the agent running the task