Hi TenseOstrich47,
ClearML Agent is meant to be installed on the remote machine.
The experiment is usually initialized and defined by the experiment code, running locally on your machine. Once you run this experiment (call it your "template"; you don't have to run it all the way through, just let it start and do an iteration or two 🙂), it appears in the WebApp, where you can clone it (which creates a new draft experiment, identical to the template) and enqueue the clone in a queue. ClearML Agent should run on a remote machine and listen on this queue (and possibly on other queues as well). Once the experiment is enqueued, ClearML Agent will pull it from the queue, set up a suitable execution environment for it on the remote machine, and run it.
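To make that concrete, here's a minimal sketch of the two halves in code (the project, task, and queue names are just placeholders). The "template" half is your normal script, run locally:
```python
from clearml import Task

# The "template" experiment: run this locally. An iteration or two is
# enough for ClearML to capture the environment, code, and configuration.
task = Task.init(project_name="examples", task_name="template experiment")
# ... your usual training code goes here ...
```
Cloning and enqueueing are usually done from the WebApp, but the SDK can do the same thing programmatically:
```python
from clearml import Task

# Clone the template and enqueue the clone; this can run from any machine
# that can reach the server.
template = Task.get_task(project_name="examples", task_name="template experiment")
cloned = Task.clone(source_task=template, name="cloned experiment")
Task.enqueue(cloned, queue_name="default")
```
On the remote machine, the agent listens on that queue with something like `clearml-agent daemon --queue default`.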
This is the process in general (without going into all the small details and various options 🙂)
Does that answer your question?
Hi TenseOstrich47
You can also check out this video on our YouTube channel:
https://youtu.be/gPBuqYx_c6k
It's still branded as Trains (our old brand), but it applies to ClearML just the same!
Yes it does 🙂 I suspected this was the process. Thanks Jake. One last question, more about the architecture design: is it advised to have the ClearML server instance and a 'worker' instance listening on the queue as separate remote machines, or can I use the same instance for the web UI and as a worker? I understand that processing pipelines may be compute-intensive enough to consume all resources and break the web UI, but I was wondering whether using a single large instance is a possibility at all?
Thanks AnxiousSeal95, will check it out! 🙂
Well, it's certainly possible, although I haven't tried it myself. Usually your experiments require a GPU, and that's not something the machine running the Server needs.
To sum up, it's possible, but in reality, if you have a semi-decent non-GPU machine with 16GB of memory, you can just run the server there and forget about it 🙂
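If you do want to try the single-machine setup, something like this sketch should work (just an illustration; the queue name and flags are my assumptions about a typical setup, so adjust to yours):
```python
# Sketch: start a CPU-only ClearML agent on the same machine that hosts
# the server. The queue name "default" is a placeholder.
import subprocess

subprocess.run(
    [
        "clearml-agent", "daemon",
        "--queue", "default",  # the queue this worker will listen on
        "--cpu-only",          # don't allocate or monitor GPUs on this machine
        "--detached",          # run the daemon in the background
    ],
    check=True,  # raise if the agent fails to start
)
```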
Awesome, thank you Jake! Very helpful. For a lot of the models we run, we do not require GPU resources, so it's good to know that a beefy instance should be able to run the experiments.