BTW
/home/local/user/.clearml/venvs-builds/3.7/bin/python: can't open file 'train.py': [Errno 2] No such file or directory
This error is from the agent, correct? It seems it did not clone the correct code. Is train.py
committed to the repository?
Hmm ConvincingSwan15
WARNING - Could not find requested hyper-parameters ['Args/patch_size', 'Args/nb_conv', 'Args/nb_fmaps', 'Args/epochs'] on base task
Is this correct? Can you see these arguments on the original Task in the UI (i.e. the Args section, parameter epochs)?
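If the UI is not handy, the same check can be done from Python by comparing the requested names with what the base Task stores. This is only a minimal sketch: `missing_hyper_parameters` and `check_base_task` are hypothetical helper names (not part of ClearML), and it assumes the `clearml` package is installed and configured (the log in this thread comes from the older `trains` package). As far as I know, `Task.get_task` and `Task.get_parameters` are the relevant calls.

```python
from typing import Iterable, List, Mapping


def missing_hyper_parameters(
    requested: Iterable[str], task_params: Mapping[str, object]
) -> List[str]:
    """Return the requested names that are absent from the task's parameters."""
    return [name for name in requested if name not in task_params]


def check_base_task(task_id: str, requested: Iterable[str]) -> List[str]:
    """Fetch the base Task and report which requested names it lacks.

    Hypothetical helper: it needs a configured ClearML server,
    so it is defined here but not called.
    """
    # Imported lazily so the pure helper above works without clearml installed.
    from clearml import Task

    task = Task.get_task(task_id=task_id)
    # get_parameters() returns a flat dict keyed by "Section/name".
    return missing_hyper_parameters(requested, task.get_parameters())
```

An empty list from `check_base_task` would mean every requested name exists on the base Task exactly as written, section prefix included.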
Hi ConvincingSwan15
For train.py, do I need a setup.py file in my repo for it to work correctly with the agent? For now it is just the path to train.py
I'm assuming the train.py is part of the repository, no?
If it is, how come the agent cannot find it after cloning the repository?
Could it be that it was accidentally not added to the git repo?
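A quick way to rule this out is to list what git actually tracks and look for the script there. The sketch below is an assumption-laden illustration, not something the agent runs itself: `is_tracked` and `list_tracked_files` are hypothetical helper names, and it assumes `git` is on the PATH and that the path is given relative to the repository root.

```python
import subprocess
from typing import Iterable, List


def is_tracked(path: str, tracked_files: Iterable[str]) -> bool:
    """True if `path` (relative to the repo root) appears in the tracked files."""
    normalized = path.replace("\\", "/")
    if normalized.startswith("./"):
        normalized = normalized[2:]
    return normalized in set(tracked_files)


def list_tracked_files(repo_dir: str) -> List[str]:
    """Run `git ls-files` in repo_dir and return the tracked paths (needs git installed)."""
    result = subprocess.run(
        ["git", "ls-files"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

Usage would be something like `is_tracked("train.py", list_tracked_files("/path/to/repo"))`. Note that a script tracked as `src/train.py` does not match a bare `train.py`, so the path has to be the one relative to the repository root.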
Okay, so I think it doesn't find the correct Task, otherwise it wouldn't print the warning.
How do you set up the HPO class? Could you copy-paste the code?
Ok, so I installed the latest version of clearml and the hyperparameters are found now
It was run with the exact same version. And I got the same message with "epochs" only.
Hi AgitatedDove14
The code is in a private repo (clearml-agent is configured with an SSH key and gets the code correctly). Otherwise I run the code directly on my computer. The code was previously run in a Task, and the Task seems to be loaded correctly: I get the right ID from the get_task
function.
When the optimizer tries to run the first batch of hyperparameters, I get this error message in the log: /home/local/user/.clearml/venvs-builds/3.7/bin/python: can't open file 'train.py': [Errno 2] No such file or directory
Hi ConvincingSwan15
A few background questions:
Where is the code that we want to optimize? Do you already have a Task of that code executed?
"find my learning script"
Could you elaborate? Is this connected to the first question?
an_optimizer = HyperParameterOptimizer(
    base_task_id="6f3bf2ecbb964ff3b2a6111c34cb0fa3",
    hyper_parameters=[
        DiscreteParameterRange('Args/patch_size', values=[32, 64, 128]),
        DiscreteParameterRange('Args/nb_conv', values=[2, 3, 4]),
        DiscreteParameterRange('Args/nb_fmaps', values=[30, 35, 40]),
        DiscreteParameterRange('Args/epochs', values=[30]),
    ],
    objective_metric_title='valid_average_dice_epoch',
    objective_metric_series='valid_average_dice_epoch',
    objective_metric_sign='max',
    max_number_of_concurrent_tasks=1,
    optimizer_class=GridSearch,
    execution_queue="default",
    spawn_project=None,
    save_top_k_tasks_only=None,
    pool_period_min=0.2,
    total_max_jobs=1,
    min_iteration_per_job=10,
    max_iteration_per_job=30,
)
Hmm, maybe the original Task was executed with older versions? (before the section names were introduced)
Let's try:
DiscreteParameterRange('epochs', values=[30]),
Does that give a warning?
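To see which spelling the base Task actually uses without trial and error, the name can be resolved against the Task's parameter dictionary, trying it first as written and then without the section prefix. This is just a sketch under that assumption: `resolve_parameter_name` is a hypothetical helper, and `task_params` stands for whatever flat dictionary the base Task returns for its parameters.

```python
from typing import Mapping, Optional


def resolve_parameter_name(
    requested: str, task_params: Mapping[str, object]
) -> Optional[str]:
    """Return the key under which `requested` is stored, or None if it is absent.

    Tries the name as given first, then the name with its section prefix removed.
    """
    if requested in task_params:
        return requested
    # "Args/epochs" -> "epochs": the name as stored without a section.
    bare = requested.split("/", 1)[-1]
    if bare in task_params:
        return bare
    return None
```

If this returns the bare name, the Task was most likely recorded without section names; if it returns `None`, the parameter is missing from that Task under either spelling.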
The only thing to patch is the train.py issue
Yes, and I double-checked in Python and I get the dictionary with: Args/...
I double-checked the ID and it is the right one
The log of my optimizer looks like this:
Task: {'template_task_id': '6f3bf2ecbb964ff3b2a6111c34cb0fa3', 'run_as_service': False}
2021-03-30 10:45:25,413 - trains.automation.optimization - WARNING - Could not find requested hyper-parameters ['Args/patch_size', 'Args/nb_conv', 'Args/nb_fmaps', 'Args/epochs'] on base task 6f3bf2ecbb964ff3b2a6111c34cb0fa3
2021-03-30 10:45:25,433 - trains.automation.optimization - WARNING - Could not find requested metric ('dice', 'dice') report on base task 6f3bf2ecbb964ff3b2a6111c34cb0fa3
Progress report #0 completed, sleeping for 0.25 minutes
2021-03-30 10:45:25,639 - trains.automation.optimization - INFO - Creating new Task: {'Args/patch_size': 32, 'Args/nb_conv': 2, 'Args/nb_fmaps': 30, 'Args/epochs': 30}