Hey @<1523701205467926528:profile|AgitatedDove14> in the WebUI the Hydra configuration object is under CONFIGURATION OBJECTS > OmegaConf
So should this be OmegaConf/trainer.batch_size?
Hi @<1523701205467926528:profile|AgitatedDove14>, I see _allow_omegaconf_edit_ under HYPERPARAMETERS > Hydra
@<1523701205467926528:profile|AgitatedDove14> Got the overrides working with Hydra/params.batch_size
thank you 🙏
so it’s not intuitive to me to try Hydra/params.batch_size
I will try it nonetheless as you suggested.
@<1523701087100473344:profile|SuccessfulKoala55> thanks so much for your reply. I can see now the source of my confusion:
After I finished deploying the server in AWS, the next step in that page is “configuring ClearML for [ ClearM...
@<1523701205467926528:profile|AgitatedDove14> None
Hi @<1523701205467926528:profile|AgitatedDove14> thanks for your reply. It looks like this is an issue with torch 2.0.1, which does not install the needed CUDA dependencies:
Adding this info here in case anyone else has this issue. It looks like switching to torch 2.0.0 fixes it. I will update here after I test that. Thanks again 🙏
there under fileserver it should read /opt/clearml/data/fileserver
so if I want to refer to batch_size in my_hydra_config.yaml:
    # dummy config file
    trainer:
      params:
        batch_size: 32
do I pass this to the HyperParameterOptimizer as:
    Hydra/trainer/params/batch_size
??
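For the question above, a small sketch of how the nested YAML keys could be turned into optimizer parameter names. It assumes the `Hydra/` section prefix followed by dot-separated keys, which is the form reported as working elsewhere in this thread (`Hydra/params.batch_size`). The helper name `hydra_param_names` is hypothetical and not part of ClearML.

```python
def hydra_param_names(cfg, prefix="Hydra/"):
    """Flatten a nested config dict into dotted names with a section prefix.

    Assumption: Hydra overrides are addressed as "Hydra/<dotted.key>".
    """
    names = []

    def walk(node, path):
        for key, value in node.items():
            dotted = f"{path}.{key}" if path else str(key)
            if isinstance(value, dict):
                walk(value, dotted)  # descend into nested sections
            else:
                names.append(prefix + dotted)  # leaf value: emit full name

    walk(cfg, "")
    return names


# Same structure as the dummy config file above.
config = {"trainer": {"params": {"batch_size": 32}}}
print(hydra_param_names(config))  # ['Hydra/trainer.params.batch_size']
```

Under that assumption the name would use dots after the prefix (`Hydra/trainer.params.batch_size`) rather than slashes all the way down.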
@<1523701205467926528:profile|AgitatedDove14> 👆 ? Thanks
sorry, I am a noob and not sure how I can do that, but happy to help if I can
Hi @<1523701087100473344:profile|SuccessfulKoala55> thanks for your reply. Not sure where I can find more about the extra docker bash script that you mention… I would appreciate if you can point me in the right direction. Thanks.
Hey @<1523701087100473344:profile|SuccessfulKoala55> I am not sure this is the case as the instance can checkout code in poetry/pip mode. This issue only happens if I try to run the agent in docker mode. I read in the docs that when you run the agent in docker mode the .ssh directory of the host is copied to the container under /root/.ssh so I have the theory that when I am building the custom docker image I don’t end up with a /root folder (?) I haven’t had the time to debug th...
Will this work?
    task.connect(OmegaConf.to_object(cfg))
assuming cfg is my Hydra dict
hmmm… probably not if I don’t have a reference that clearml can update right?….
What about:
    hpo_params = OmegaConf.to_object(cfg)
    ...
    task.connect(hpo_params)
And then I use hpo_params in the code. This way I give clearml a chance to update the object.
Would this work? Thanks
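A minimal sketch of the second pattern, under the assumption that `task.connect()` on a dict hands back the object that ClearML updates when the task is executed remotely, so the code should keep using the returned value. `connect_params` is a hypothetical helper, and `_FakeTask` is only a stand-in so the snippet runs without a ClearML server. In real code `task` would come from `Task.init(...)` and the dict from something like `OmegaConf.to_container(cfg, resolve=True)`.

```python
def connect_params(task, params):
    """Connect a plain dict to a task and return the object to use afterwards."""
    connected = task.connect(params)
    # Fall back to the original dict if connect() returns nothing.
    return params if connected is None else connected


class _FakeTask:
    """Stand-in for a ClearML task: pretends the server overrides batch_size."""

    def connect(self, params):
        params["trainer"]["params"]["batch_size"] = 64  # simulated override
        return params


hpo_params = {"trainer": {"params": {"batch_size": 32}}}
hpo_params = connect_params(_FakeTask(), hpo_params)
print(hpo_params["trainer"]["params"]["batch_size"])  # 64
```

The point of the sketch is only that the training code reads from `hpo_params` after the connect call, never from the original `cfg`.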
Update
I ran:
clearml-agent build --id <task-id> --docker <custom-docker> --log-level DEBUG --entry-point reuse_task
and got a similar problem:
Host key verification failed.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
error: Could not fetch origin
Also @<1619867994005966848:profile|HungryTurtle13> 👆
@<1523701435869433856:profile|SmugDolphin23> I had the same issue uploading a torch model. Thank you for being a life 🛟
Hey @<1593051292383580160:profile|SoreSparrow36> I am trying to test whether deleting a project also deletes its S3 storage. But I am not sure this is even a good assumption as I haven’t found anywhere what the expected/default behaviour is. Do you happen to know anything about this? Thanks.
A related question… how does the server know how to delete artifacts when the project is deleted if it doesn’t have a clearml.conf with the S3 credentials to do so?
@<1547028031053238272:profile|MassiveGoldfish6> check this:
- Does your local clearml.conf set use_credentials_chain: true?
- Do you have the needed AWS credentials in your local environment?
- Do you have an S3 bucket as the storage for your project (did you set this up when you created the project)?
- Do your local AWS credentials give you write access to that S3 bucket?
I am not a docker expert but am I correct to say that here the ‘/mnt/fileserver’ is the container path rather than the source path?
if that were the case it explains why I see /opt/clearml/data/fileserver but no /mnt/fileserver….
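To make the host-versus-container distinction concrete, here is a small sketch that splits a Docker short-form volume mapping (`SOURCE:TARGET[:MODE]`). The mapping string is an assumption about what the server's docker-compose file contains, inferred from the two paths mentioned above; `split_volume` is a hypothetical helper.

```python
def split_volume(spec):
    """Split a Docker short-form volume spec into (host_path, container_path, mode)."""
    parts = spec.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected volume spec: {spec!r}")
    host_path, container_path = parts[0], parts[1]
    # Docker mounts read-write unless a mode such as "ro" is given.
    mode = parts[2] if len(parts) == 3 else "rw"
    return host_path, container_path, mode


# Assumed mapping, built from the two paths discussed in this thread.
host, container, mode = split_volume("/opt/clearml/data/fileserver:/mnt/fileserver")
print(host)       # /opt/clearml/data/fileserver
print(container)  # /mnt/fileserver
```

If the mapping really looks like this, /mnt/fileserver exists only inside the container, which would explain why nothing shows up under /mnt on the host.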
I just ran a dummy experiment logging images, plots, etc and I can see them in my server’s Web UI.
I can’t see anything under /mnt so no fileserver there (?)
ok so the documentation is confusing here:
3fdcf5db64d allegroai/clearml:1.12.1-397 “/opt/clearml/wrappe…” 10 days ago Up 9 minutes 8008/tcp, 8080/tcp, 0.0.0.0:8081->8081/tcp, :::8081->8081/tcp clearml-fileserver