SourSwallow36 okay, let's assume we have the base experiment (the original one before the HP process).
What we do is clone that experiment (in the UI, with code, or with code automation, a.k.a. the HP optimizer). Each clone of the original gets a new set of HPs, and then we enqueue the experiments (say, 10 of them) into the execution queue. In parallel, we run trains-agent on a machine and connect it to the queue. It will pull the experiments one after the other, run them, and log their results. We will end up with 10 "completed" experiments.
Make sense?
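A minimal sketch of that clone-and-enqueue flow, assuming the `Task.get_task`, `Task.clone`, `get_parameters` / `set_parameters` and `Task.enqueue` calls from the `trains` package. The function name `clone_and_enqueue`, the `task_cls` argument (you would pass `trains.Task`), the `hp-N` naming, and the queue name `default` are illustrative assumptions, not something fixed by trains.

```python
def clone_and_enqueue(task_cls, base_task_id, hparam_sets, queue_name="default"):
    """Clone the base experiment once per HP set and enqueue every clone.

    task_cls is expected to be trains.Task; it is passed in as an argument
    (an assumption of this sketch) so the function itself has no hard
    dependency on a running trains-server.
    """
    base = task_cls.get_task(task_id=base_task_id)
    clones = []
    for index, hparams in enumerate(hparam_sets):
        # Each clone keeps a pointer to the base experiment via `parent`.
        clone = task_cls.clone(
            source_task=base,
            name="{} hp-{}".format(base.name, index),
            parent=base.id,
        )
        # Start from the base parameters and override only the new HPs.
        params = clone.get_parameters()
        params.update(hparams)
        clone.set_parameters(params)
        # A trains-agent listening on this queue will pick the clone up.
        task_cls.enqueue(clone, queue_name=queue_name)
        clones.append(clone)
    return clones
```

Usage would look roughly like `from trains import Task` followed by `clone_and_enqueue(Task, "<base task id>", [{"lr": 0.01}, {"lr": 0.001}])`, where the parameter keys have to match whatever the base experiment actually logged.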
For HPO (hyper-param opt), are all experiments which are part of the optimization process logged? I understand the HPO process takes a base experiment and runs subsequent experiments with the new HPs. Are these experiments logged too (with the train-valid curves, etc)?
Are these experiments logged too (with the train-valid curves, etc)?
Yes, every run is logged as a new experiment (with its own set of HPs). Do notice that the execution itself is done by the "trains-agent". Meaning the HP process creates experiments with a new set of HPs and puts them into the execution queue, then trains-agent pulls them from the queue and starts executing them. You can have multiple trains-agent instances on as many machines as you like, with specific GPUs etc. Each one will pull a single experiment and execute it; once it is done, it will pull the next one, and so on.
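To make the pull model concrete, here is a toy simulation in plain Python. It is not how trains-agent is implemented; `run_workers`, the thread-based workers, and the `execute` callback are all assumptions made for illustration. It only shows several workers draining one shared queue, each taking a single experiment at a time.

```python
import queue
import threading


def run_workers(experiments, num_workers, execute):
    """Toy model: workers pull experiments from one queue, one at a time."""
    pending = queue.Queue()
    for exp in experiments:
        pending.put(exp)

    results = {}
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                exp = pending.get_nowait()
            except queue.Empty:
                # Queue drained. This toy worker simply stops here;
                # a long-running agent would keep waiting for new work.
                return
            outcome = execute(exp)
            with lock:
                results[exp] = (worker_id, outcome)

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

For example, `run_workers(["exp-%d" % i for i in range(10)], 3, train)` would finish with 10 entries in `results`, each tagged with the id of the worker that ran it (`train` being whatever function runs one experiment).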
SourSwallow36 how are you thinking of running those HP tests?
Now will these 10 experiments be of different names? How will I know these are part of the 'mnist1' HPO case?
Yes (they will have the specific HP name/value combination).
FYI names are not unique so in theory you could have multiple experiments with the same name.
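A small sketch of one way to build such names when you clone from code. The helper `hpo_task_name` and the `base [key=value, ...]` format are assumptions for illustration; the exact naming used by the trains HP optimizer is not shown in this thread. Since names are not unique, the name is only a convenience for reading the experiment table.

```python
def hpo_task_name(base_name, hparams):
    """Build a readable experiment name from the base name and its HPs.

    Keys are sorted so the same HP set always produces the same name.
    """
    parts = ["{}={}".format(key, hparams[key]) for key in sorted(hparams)]
    return "{} [{}]".format(base_name, ", ".join(parts))
```

With this, `hpo_task_name("mnist1", {"lr": 0.01, "batch_size": 64})` gives `mnist1 [batch_size=64, lr=0.01]`, so every run in the 'mnist1' case carries the base name as a prefix.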
If you look under the Configuration tab, you will find all the configuration arguments for the experiment. You can also add specific arguments to the experiment table (click the cogwheel at the top-right corner, and select +hyper-parameters).
Ok, cool. Thanks. This clears up things. I need to read more about the trains agent then. I have another question, I'll post it as a separate thread.
Mostly they are a set of user defined hyper-parameters. I've been reading about hyper-param optimization since posting this. It seems like I would have to use hyper-param opt to achieve that.
how are you thinking of running those HP tests?
I'm not sure if I completely understand the question. Here is what I do presently. This may be achieved more efficiently in trains (that's why I'm trying to move to trains).
Example:
I have a set of 10 user defined HPs. I have a scheduler that runs them independently in parallel. Once the training is complete, I run inference on the test set for these experiments. The data for both training and inference is logged under the respective experiment (which are 10 in this case).
So I'm trying to emulate this process in trains.
Yes, every run is logged as a new experiment (with its own set of HPs). Do notice that the execution itself is done by the "trains-agent". Meaning the HP process creates experiments with a new set of HPs and puts them into the execution queue, then trains-agent pulls them from the queue and starts executing them. You can have multiple trains-agent instances on as many machines as you like, with specific GPUs etc. Each one will pull a single experiment and execute it; once it is done, it will pull the next one, and so on.
Oh ok! So if I have the base experiment say 'mnist1' and I run HPO which executes 10 experiments. Now will these 10 experiments be of different names? How will I know these are part of the 'mnist1' HPO case?
Obviously if you click on them you will be able to compare based on specific metric / parameters (either as table or in parallel coordinates)
Hi SourSwallow36
What do you mean by "log each experiment separately"? How would you differentiate between them?
Well, that depends on how you think about the automation. If you are running your experiments manually (i.e. you specifically call/execute them), then at the beginning of each experiment (or function) call Task.init, and when you are done call Task.close. This can be done in parallel if you are running them from separate processes.
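A rough sketch of that manual pattern, assuming `Task.init(project_name=..., task_name=...)`, `task.connect(dict)` and `task.close()` from the `trains` package. The wrapper `run_logged_experiment`, the `task_cls` argument (you would pass `trains.Task`), and `train_fn` are hypothetical names for illustration.

```python
def run_logged_experiment(task_cls, project_name, task_name, hparams, train_fn):
    """Run one experiment wrapped in its own Task.init / Task.close pair.

    task_cls is expected to be trains.Task; train_fn is your own training
    function and receives the HP dict.
    """
    task = task_cls.init(project_name=project_name, task_name=task_name)
    try:
        # Record the HP dict on the experiment so it shows up with the run.
        task.connect(hparams)
        return train_fn(hparams)
    finally:
        # Close the experiment even if training raised an exception.
        task.close()
```

To run several HP sets in parallel, each call would go in its own process, as noted above.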
If you want to automate the process, you can start using the trains-agent, which could help you spin up those experiments on as many machines as you like 🙂
This is an example of how one can clone an experiment and change it from code:
https://github.com/allegroai/trains/blob/master/examples/automation/task_piping_example.py
A full HPO process (basically the same idea, only with optimization algorithms deciding on the next set of parameters) is also available:
https://github.com/allegroai/trains/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
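To illustrate the "algorithm decides on the next set of parameters" part without depending on the linked example, here is a plain random-search stand-in. It is not the trains optimizer; `sample_hparam_sets`, the discrete `search_space` format, and the seed are assumptions for this sketch. Each returned dict could then be applied to one clone of the base experiment.

```python
import random


def sample_hparam_sets(search_space, num_sets, seed=0):
    """Random search: draw num_sets HP combinations from discrete choices.

    search_space maps each HP name to a list of candidate values.
    A fixed seed keeps the draw reproducible.
    """
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in search_space.items()}
        for _ in range(num_sets)
    ]
```

For example, `sample_hparam_sets({"lr": [0.1, 0.01, 0.001], "batch_size": [32, 64]}, 10)` returns 10 dicts, one per experiment to enqueue. A real optimizer would additionally look at the metrics of finished runs before proposing the next set.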