Exactly. It's saved in a lightning_logs folder where I started the script instead.
Great! Thank you
If I understand correctly, that means many of the arguments in HyperParameterOptimizer become meaningless, right? execution_queue is not relevant anymore, total_max_jobs is determined by how many machines I launch the script on, and the same goes for max_number_of_concurrent_tasks?
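For reference, this is the kind of setup I mean; just a sketch, where the base task ID, queue name, and metric names are placeholders:

```python
from clearml.automation import HyperParameterOptimizer, UniformParameterRange
from clearml.automation.optuna import OptimizerOptuna

# Sketch of the arguments in question; base_task_id, queue, and metric
# names are placeholders for a real template task.
optimizer = HyperParameterOptimizer(
    base_task_id="<template-task-id>",
    hyper_parameters=[UniformParameterRange("General/lr", min_value=1e-5, max_value=1e-1)],
    objective_metric_title="val",
    objective_metric_series="loss",
    objective_metric_sign="min",
    optimizer_class=OptimizerOptuna,
    execution_queue="default",         # which queue the agents pull trials from
    max_number_of_concurrent_tasks=4,  # parallel trials at any one time
    total_max_jobs=50,                 # overall budget of trials
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```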
Maybe to clarify, I was looking for something with the more classic Ask-and-Tell interface
https://optuna.readthedocs.io/en/stable/tutorial/20_recipes/009_ask_and_tell.html
https://scikit-optim...
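For context, the loop I have in mind looks like this (toy objective, straight ask-and-tell):

```python
import optuna

study = optuna.create_study(direction="minimize")

for _ in range(20):
    trial = study.ask()                   # ask for a parameter suggestion
    x = trial.suggest_float("x", -10, 10)
    value = (x - 2) ** 2                  # toy objective to minimize
    study.tell(trial, value)              # tell the study the result

print(study.best_params)
```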
Apologies, the error was on my side. It's working.
Thanks a lot!
I tested ~/clearml.conf and CLEARML_DEFAULT_OUTPUT_URI; they are both ignored.
Maybe for more context, I'm using https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch-lightning/pytorch_lightning_example.py to test, and the only way I can get the model checkpoints uploaded to the server is if I set output_uri in Task.init.
I base the assumption that I should not have to do that on the following comment from ~/clearml.conf:
` # Defau...
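i.e. the only variant that works for me looks like this (project/task names and the bucket URI are placeholders):

```python
from clearml import Task

# Checkpoints are only uploaded when output_uri is set explicitly here;
# project/task names and the URI are placeholders.
task = Task.init(
    project_name="examples",
    task_name="pytorch-lightning-test",
    output_uri="s3://my-bucket/clearml",  # or output_uri=True for the default files server
)
```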
Sorry, they both return the same; it was a typo in my test for task.models vs task.get_models(). In the UI it does have the proper types.
Ah ok, thanks. I was hoping to be able to set the default server-wide and not have to tell all users to do it themselves in the code.
Actually, when you say client side, that means it should work in ~/clearml.conf, no?
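For the record, what I expected to work is setting the default in ~/clearml.conf, along these lines (the key name is my reading of the template comment quoted above; the URI is a placeholder):

```
sdk {
    development {
        # used as the Task output_uri when Task.init is not given one explicitly
        default_output_uri: "s3://my-bucket/clearml"
    }
}
```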
I'm assuming it's in a later version than 1.7.1 (the one I'm running).
But I guess for my question #1 I was doing it fine. Do you have any idea for #2?
Is it an old bug and should I update, maybe?
Quick follow-up on this topic.
Has the "compare more than 10 run" feature been added in the past 2 months? I can't find any info about this.
Alright, thanks again for the answers 🙂
I'll take a deeper look at everything you mentioned but, sadly, I doubt this would work for me.
Yeah, I think I understand. The thing I was missing is that I wanted to not use the agent and just call my code directly.
That's not possible, right?
I really need to have a dummy experiment pre-made and have the agent clone the code, set up the env and run everything?
Amazing! Thank you
That's a very good question.
They are visible in the UI.
And accessible in the way I mentioned above.
Any idea?
For now, I retrieve and load the model as follows (PyTorch Lightning):

```python
clearml_model = task.models['output'][-1]
model_path = clearml_model.get_local_copy()
main_loop = LightningModel.load_from_checkpoint(checkpoint_path=model_path)
```

Not sure how InputModel would help, and task.get_models() is always empty.
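For completeness, here's how I read the InputModel suggestion; a sketch only, where the model ID and project/task names are placeholders:

```python
from clearml import InputModel, Task

task = Task.init(project_name="examples", task_name="load-model-test")

# Register an existing model as this task's input and fetch its weights;
# the model ID here is a placeholder copied from the UI.
input_model = InputModel(model_id="<model-id-from-the-ui>")
task.connect(input_model)
checkpoint_path = input_model.get_local_copy()

# LightningModel is the pl.LightningModule from my snippet above.
main_loop = LightningModel.load_from_checkpoint(checkpoint_path=checkpoint_path)
```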
I did a simple test outside of the pl.LightningModule
and it seems like it's not returning anything even there. I'm probably missing something obvious.
That being said, it returns None for me when I reload a task, but it's probably something on my side.
Great, thank you. I was wondering if it was the recommended way but seems like it is.
I did not; I assumed that Task.init was mostly to initialize a new task and Task.get_task was to load an existing one, but it seems I was wrong.
I ended up using task = Task.init(continue_last_task=task_id) to reload a specific task and it seems to work well so far.
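In case it helps anyone else, the full call looks roughly like this (project/task names are placeholders, and task_id is the ID of the run to resume):

```python
from clearml import Task

task_id = "<task-id-to-resume>"  # copied from the UI

# Passing a task ID (str) to continue_last_task resumes that specific run;
# project/task names below are placeholders.
task = Task.init(
    project_name="examples",
    task_name="resume-test",
    continue_last_task=task_id,
)
```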
Thanks
Thanks a lot for the quick and clear answer!
I'm in a weird constrained setup with an NFS mount for /opt/clearml where I can't change the permissions easily, but thanks for your answer. I'll contact people on my side to change the permissions instead of recompiling.
This one is still quite confusing to me. I did what you suggested. I also created the credential in localhost:8080/settings/workspace-configuration and set it properly in /opt/clearml.conf. I tested with permission root...