
so it’s not intuitive to me to try Hydra/params.batch_size
I will try it nonetheless as you suggested.
@<1523701205467926528:profile|AgitatedDove14> Got the overrides working with Hydra/params.batch_size
thank you 🙏
so if I want to refer to batch_size in my_hydra_config.yaml:
# dummy config file
trainer:
  params:
    batch_size: 32
do I pass this to the HyperParameterOptimizer as: Hydra/trainer/params/batch_size ??
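A small sketch of how such a name could be built, assuming Hydra values are addressed as the Hydra section, a slash, and then the dotted path into the config. That is the pattern behind Hydra/params.batch_size, the override reported as working in this thread. The helper hydra_param_name is hypothetical and not part of ClearML; under this assumption the nested example would be written with dots after the section rather than with slashes all the way down.

```python
def hydra_param_name(*keys: str, section: str = "Hydra") -> str:
    """Join a section name and a dotted config path (hypothetical helper)."""
    if not keys:
        raise ValueError("at least one config key is required")
    # Assumption: "<section>/<dotted.path>", as in "Hydra/params.batch_size".
    return f"{section}/{'.'.join(keys)}"


# Path into the dummy config: trainer -> params -> batch_size
name = hydra_param_name("trainer", "params", "batch_size")
print(name)
```

The resulting string is the kind of name that would be handed to the HyperParameterOptimizer for that parameter; it is worth checking it against what the Web UI shows for the task before relying on it.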
@<1523701205467926528:profile|AgitatedDove14> 👆 ? Thanks
Will this work?
task.connect(OmegaConf.to_object(cfg))
assuming cfg is my Hydra dict
Thanks @<1523701205467926528:profile|AgitatedDove14> happy to PR on the docs 😉
Hey @<1523701205467926528:profile|AgitatedDove14> in the WebUI the hydra configuration object is under CONFIGURATION OBJECTS > OmegaConf
So should this be OmegaConf/trainer.batch_size?
Hi @<1523701205467926528:profile|AgitatedDove14>, I see _allow_omegaconf_edit_ under HYPERPARAMETERS > Hydra
Hi @<1523701087100473344:profile|SuccessfulKoala55> thanks for your response. What I mean is that in the Web UI, when you are creating a project, there is a storage (S3) field at the bottom of the create-project pop-up where you enter the S3 bucket that you want to associate with the project. The thing is, you can’t see that information anywhere in the UI after the project is created, as far as I can tell. So it would be great to be able to see the configured bucket somewhere in...
hmmm… probably not if I don’t have a reference that clearml can update right?….
What about:
hpo_params = OmegaConf.to_object(cfg)
...
task.connect(hpo_params)
And then I use hpo_params in the code. This way I give clearml a chance to update the object.
Would this work? Thanks
Do you have any insights on the missing fileserver @<1523701205467926528:profile|AgitatedDove14> ?
hmm….. probably simpler/cleaner if I do
hpo_params = {
    'param1': cfg.param_1, ...
}
task.connect(hpo_params)
Thoughts?
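A sketch of the explicit-dict idea, using a plain nested dict in place of the Hydra config so it runs without Hydra or ClearML installed. The helper flatten and the sample values are hypothetical, and the task.connect call appears only as a comment because it needs a live ClearML task. The point it illustrates is the one raised above: keep a reference to the connected dict and read from it afterwards.

```python
def flatten(cfg: dict, prefix: str = "") -> dict:
    """Flatten a nested dict into {"a.b.c": value} pairs (hypothetical helper)."""
    flat = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Stand-in for OmegaConf.to_object(cfg); the values are made up.
cfg = {"trainer": {"params": {"batch_size": 32, "lr": 0.001}}}

hpo_params = flatten(cfg)
# task.connect(hpo_params)  # would go here, giving ClearML a dict it can update

# Read from hpo_params (not cfg) from here on, so any override is picked up.
batch_size = hpo_params["trainer.params.batch_size"]
print(batch_size)
```

Whether a flat dict or a hand-written one is cleaner depends on how many parameters are being tuned; the flat form just avoids listing every key by hand.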
from this video tutorial:
“…the name of the hyperparameter consists of the section it is reported to, followed by a slash, then its name…”
So following that confuses me, because I can’t see my Hydra parameters under Hyperparameters > Hydra, and this is why I thought, ok well, perhaps use OmegaConf/params.batch_size
Is this another opportunity to improve the documentation? Happy to help if so.
I see this in the docker-compose.yml file:
  fileserver:
    networks:
      - backend
      - frontend
    command:
      - fileserver
    container_name: clearml-fileserver
    image: allegroai/clearml:1.12.1-397
    environment:
      CLEARML__fileserver__delete__allow_batch: "true"
    restart: unless-stopped
    volumes:
      - /opt/clearml/logs:/var/log/clearml
      - /opt/clearml/data/fileserver:/mnt/fileserver
      - /opt/clearml/config:/opt/clearml/config
    ports:
      - "8081:...
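A note on reading the volumes entries: in Compose short syntax each entry is host_path:container_path, so /mnt/fileserver is a path inside the clearml-fileserver container, and the files would be expected on the host under /opt/clearml/data/fileserver. A minimal sketch that splits the three entries quoted in this message (the helper split_volume is hypothetical and ignores the optional access mode and Windows-style paths):

```python
def split_volume(spec: str) -> tuple[str, str]:
    """Split a short-syntax volume entry into (host_path, container_path)."""
    parts = spec.split(":")
    if len(parts) < 2:
        raise ValueError(f"not a host:container mapping: {spec!r}")
    # A third part, if present, would be the access mode (e.g. "ro").
    return parts[0], parts[1]


# The three volume entries from the fileserver service above.
volumes = [
    "/opt/clearml/logs:/var/log/clearml",
    "/opt/clearml/data/fileserver:/mnt/fileserver",
    "/opt/clearml/config:/opt/clearml/config",
]

for spec in volumes:
    host, container = split_volume(spec)
    print(f"host {host} -> container {container}")
```

So looking for /mnt/fileserver on the host itself would come up empty even when the fileserver is running normally.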
Hey @<1523701087100473344:profile|SuccessfulKoala55> I am not sure this is the case, as the instance can check out code in poetry/pip mode. This issue only happens if I try to run the agent in docker mode. I read in the docs that when you run the agent in docker mode, the .ssh directory of the host is copied to the container under /root/.ssh, so my theory is that when I build the custom docker image I don’t end up with a /root folder (?) I haven’t had the time to debug th...
Hi @<1523701087100473344:profile|SuccessfulKoala55> thanks for your reply. Not sure where I can find more about the extra docker bash script that you mention… I would appreciate if you can point me in the right direction. Thanks.
Also @<1619867994005966848:profile|HungryTurtle13> 👆
This is what I see:
Hi @<1523701087100473344:profile|SuccessfulKoala55> it’s failing again. I haven’t rebooted the agent or changed anything, and I am able to connect over ssh with ssh -vT git@github.com in a different tmux session.
This is the error I am seeing when running the agent with the -debug flag:
Using cached repository in "/home/ubuntu/.clearml/vcs-cache/clearml-tutorial.git.e1c2351b09f3d661b6f0dbf85e92be2e/clearml-tutorial.git"
git@github.com: Permission denied (pub...
Thanks Martin. This is the first step out of many…
so it looks like the server is there (docker ps) and I can see the artifacts (web UI), but I’m not sure where things are stored: going by the documentation, there is no /mnt/fileserver (?)
but from a terminal I can do:
ubuntu@***:~/sw/clearml-tutorial$ git fetch --all --recurse-submodules
Fetching origin
and it works
Hey @<1523701087100473344:profile|SuccessfulKoala55> it just worked. Maybe there was some GitHub refresh delay… not sure, but thanks anyway for the debug suggestion. 👍
I just ran a dummy experiment logging images, plots, etc and I can see them in my server’s Web UI.
sorry, I am a noob and not sure how I can do that, but happy to help if I can
I can’t see anything under /mnt so no fileserver there (?)
Hi @<1523701205467926528:profile|AgitatedDove14> thanks for your reply. I am seeing this is an issue with torch 2.0.1 because it does not install the needed cuda dependencies:
Adding this info here, in case anyone here has this issue. It looks like switching to torch 2.0.0 fixes the issue. I will update here after I test that. Thanks again 🙏