Sorry, I am a noob and not sure how I can do that, but happy to help if I can.
I can’t see anything under /mnt
so no fileserver there (?)
Hi @<1523701205467926528:profile|AgitatedDove14> thanks for your reply. I am seeing this is an issue with torch 2.0.1 because it does not install the needed cuda dependencies:
Adding this info here, in case anyone here has this issue. It looks like switching to torch 2.0.0 fixes the issue. I will update here after I test that. Thanks again 🙏
ok so the documentation is confusing here:
so if I want to refer to batch_size in my_hydra_config.yaml:
# dummy config file
trainer:
  params:
    batch_size: 32
do I pass this to the HyperParameterOptimizer as: Hydra/trainer/params/batch_size ??
@<1523701205467926528:profile|AgitatedDove14> 👆 ? Thanks
Hey @<1523701087100473344:profile|SuccessfulKoala55> I am not sure this is the case, as the instance can check out code in poetry/pip mode. This issue only happens if I try to run the agent in docker mode. I read in the docs that when you run the agent in docker mode, the .ssh directory of the host is copied to the container under /root/.ssh, so my theory is that when I build the custom docker image I don’t end up with a /root folder (?) I haven’t had the time to debug th...
@<1523701087100473344:profile|SuccessfulKoala55> thanks so much for your reply. I can see now the source of my confusion:
After I finished deploying the server in AWS, the next step in that page is “configuring ClearML for [ ClearM...
Thanks Martin. This is the first step out of many…
3fdcf5db64d allegroai/clearml:1.12.1-397 “/opt/clearml/wrappe…” 10 days ago Up 9 minutes 8008/tcp, 8080/tcp, 0.0.0.0:8081->8081/tcp, :::8081->8081/tcp clearml-fileserver
so it looks like the server is there (docker ps) and I can see the artifacts (web ui), but I am not sure where things are stored: as per the documentation they should be under /mnt/fileserver, and there is no /mnt/fileserver (?)
Hi @<1523701087100473344:profile|SuccessfulKoala55> thanks for your reply. Not sure where I can find more about the extra docker bash script that you mention… I would appreciate if you can point me in the right direction. Thanks.
Hi @<1523701205467926528:profile|AgitatedDove14> , I see _allow_omegaconf_edit_ under HYPERPARAMETERS > Hydra. From this video tutorial:
“…the name of the hyperparameter consists of the section it is reported to, followed by a slash, then its name…”
So following that confuses me because I can’t see my Hydra parameters under Hyperparameters > Hydra
and this is why I thought, ok well, perhaps use OmegaConf/params.batch_size
Is this another opportunity to improve the documentation? Happy to help if so.
so it’s not intuitive to me to try Hydra/params.batch_size
I will try it nonetheless as you suggested.
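To make the naming question above concrete, here is a minimal sketch of how the nested YAML keys could be turned into candidate hyperparameter names. It assumes the `Hydra/<dotted.path>` form suggested in this thread (section name, a slash, then the dotted key path); `flatten_hydra_params` is a hypothetical helper, not part of ClearML, and the resulting name should be checked against what the task actually shows in the web UI.

```python
from typing import Any, Dict


def flatten_hydra_params(config: Dict[str, Any], section: str = "Hydra") -> Dict[str, Any]:
    """Map a nested config to '<section>/<dotted.path>' names.

    Hypothetical helper: the 'Hydra' section and the dotted separator are
    assumptions taken from the suggestion in this thread.
    """
    flat: Dict[str, Any] = {}

    def walk(node: Dict[str, Any], path: str) -> None:
        for key, value in node.items():
            dotted = f"{path}.{key}" if path else key
            if isinstance(value, dict):
                walk(value, dotted)  # descend into nested sections
            else:
                flat[f"{section}/{dotted}"] = value  # leaf value: record its full name

    walk(config, "")
    return flat


# Same structure as the dummy my_hydra_config.yaml above
config = {"trainer": {"params": {"batch_size": 32}}}
print(flatten_hydra_params(config))
# → {'Hydra/trainer.params.batch_size': 32}
```

So for the dummy config, the name to try under this assumption would be Hydra/trainer.params.batch_size rather than Hydra/trainer/params/batch_size.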
I see this in the docker-compose.yml file:
fileserver:
  networks:
    - backend
    - frontend
  command:
    - fileserver
  container_name: clearml-fileserver
  image: allegroai/clearml:1.12.1-397
  environment:
    CLEARML__fileserver__delete__allow_batch: "true"
  restart: unless-stopped
  volumes:
    - /opt/clearml/logs:/var/log/clearml
    - /opt/clearml/data/fileserver:/mnt/fileserver
    - /opt/clearml/config:/opt/clearml/config
  ports:
    - "8081:...
I am not a docker expert but am I correct to say that here the ‘/mnt/fileserver’ is the container path rather than the source path?
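For reference, in the short volume syntax used in that compose file each entry reads as host path, then container path, then an optional mode. The sketch below only splits such strings to show which side is which; `parse_volume` and `VolumeMapping` are hypothetical names for illustration, and the sketch ignores named volumes and Windows-style paths.

```python
from typing import NamedTuple


class VolumeMapping(NamedTuple):
    host_path: str        # path on the machine running docker
    container_path: str   # path as seen inside the container
    mode: str             # "rw" (read-write) unless stated otherwise


def parse_volume(spec: str) -> VolumeMapping:
    """Split a short-syntax volume entry 'HOST:CONTAINER[:MODE]'."""
    parts = spec.split(":")
    if len(parts) == 2:
        return VolumeMapping(parts[0], parts[1], "rw")
    if len(parts) == 3:
        return VolumeMapping(parts[0], parts[1], parts[2])
    raise ValueError(f"unsupported volume spec: {spec!r}")


mapping = parse_volume("/opt/clearml/data/fileserver:/mnt/fileserver")
print(mapping.host_path)       # → /opt/clearml/data/fileserver
print(mapping.container_path)  # → /mnt/fileserver
```

Read that way, /mnt/fileserver only exists inside the clearml-fileserver container, and the files would be expected on the host under /opt/clearml/data/fileserver.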
@<1523701435869433856:profile|SmugDolphin23> I had the same issue uploading a torch model. Thank you for being a life 🛟