Hey @<1593051292383580160:profile|SoreSparrow36> I am trying to test whether deleting a project also deletes its S3 storage. But I am not sure this is even a valid assumption, as I haven’t found the expected/default behaviour documented anywhere. Do you happen to know anything about this? Thanks.
@<1547028031053238272:profile|MassiveGoldfish6> check this:
- Should your local clearml.conf set use_credentials_chain: true?
- Do you have the needed AWS credentials in your local environment?
- Do you have an S3 bucket as the storage for your project (did you set this up when you created the project)?
- Do your local AWS credentials give you write access to that S3 bucket?
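For the second item, here is a minimal standard-library sketch that checks the two most common places the default AWS credential chain looks: the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables and the shared credentials file (~/.aws/credentials, overridable with AWS_SHARED_CREDENTIALS_FILE). The function name find_local_aws_credentials is hypothetical. It only detects that credentials appear to exist; it ignores other chain sources (e.g. instance roles) and does not verify the keys or check write access to the bucket.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_local_aws_credentials(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> Optional[str]:
    """Return where local AWS credentials appear to come from, or None.

    Hypothetical helper: checks only env vars and the shared credentials
    file. It does not validate keys or bucket permissions.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    # 1) Environment variables: both the key id and the secret must be set.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"

    # 2) Shared credentials file; AWS_SHARED_CREDENTIALS_FILE overrides the default path.
    default_file = home / ".aws" / "credentials"
    cred_file = Path(env.get("AWS_SHARED_CREDENTIALS_FILE", default_file)).expanduser()
    if cred_file.is_file():
        return str(cred_file)

    return None
```

Calling find_local_aws_credentials() with no arguments inspects the real environment and home directory.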
Hi @<1523701205467926528:profile|AgitatedDove14> , I see _allow_omegaconf_edit_ under HYPERPARAMETERS > Hydra
So, if I want to refer to batch_size in my_hydra_config.yaml:
# dummy config file
trainer:
  params:
    batch_size: 32
do I pass this to the HyperParameterOptimizer as:
Hydra/trainer/params/batch_size ?
@<1523701205467926528:profile|AgitatedDove14> 👆 ? Thanks
This is what I see:
Will this work?
task.connect(OmegaConf.to_object(cfg))
assuming cfg is my Hydra dict
@<1523701205467926528:profile|AgitatedDove14> Got the overrides working with Hydra/params.batch_size thank you 🙏
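Based on the name that ended up working (Hydra/params.batch_size: the section name, a slash, then the config path joined with dots rather than slashes), here is a small sketch that lists candidate override names for a nested config. The helper hydra_override_names is hypothetical; for the dummy config above it would presumably give Hydra/trainer.params.batch_size, but check the generated names against what the web UI shows under HYPERPARAMETERS > Hydra.

```python
from typing import Any, Dict


def hydra_override_names(cfg: Dict[str, Any], section: str = "Hydra") -> Dict[str, Any]:
    """Flatten a nested config dict into '<section>/<dotted.path>' -> value.

    Hypothetical helper assuming the 'Hydra/params.batch_size' naming
    that worked in this thread.
    """
    flat: Dict[str, Any] = {}

    def walk(node: Any, path: str) -> None:
        # Recurse into non-empty dicts; anything else is a leaf value.
        if isinstance(node, dict) and node:
            for key, value in node.items():
                walk(value, f"{path}.{key}" if path else str(key))
        else:
            flat[f"{section}/{path}"] = node

    walk(cfg, "")
    return flat
```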
Answering my own question, in case someone else has the same issue: you have to edit the instance’s security group and allow inbound TCP traffic on port 8080.
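To confirm the security group change from outside the instance, a minimal sketch that simply tries to open a TCP connection to the web UI port (8080, per the message above). The host name is a placeholder, and the function name tcp_port_open is hypothetical. A False result can also mean the service is not running or another firewall is in the way, not only a missing security group rule.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout subclasses
        # OSError) and DNS resolution failures.
        return False


# Example (placeholder host): tcp_port_open("<ec2-public-dns>", 8080)
```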
I haven’t figured out the missing fileserver yet. :man-shrugging:
So it looks like the server is there (docker ps) and I can see the artifacts (web UI), but I’m not sure where the files are, since there is no /mnt/fileserver as described in the documentation (?)
OK, so the documentation is confusing here:
Hi @<1523701087100473344:profile|SuccessfulKoala55> thanks for your response. What I mean is that in the Web UI, when you create a project, there is a storage (S3) field at the bottom of the create-project pop-up where you enter the S3 bucket you want to associate with the project. The thing is, as far as I can tell, you can’t see that information anywhere in the UI after the project is created. So, it would be great to be able to see the configured bucket somewhere in...
Thanks Martin. This is the first step out of many…
Hey @<1523701087100473344:profile|SuccessfulKoala55> it just worked. Maybe there was some GitHub refresh delay… not sure, but thanks anyway for the debugging suggestion. 👍
I can’t see anything under /mnt, so there is no fileserver there (?)
Hey @<1523701087100473344:profile|SuccessfulKoala55> just updating you here. I started from scratch with a new EC2 instance and followed the installation step by step; the only change I made was selecting RSA instead of Ed25519 when generating the SSH key (as per the GitHub docs), and now my agent can connect consistently to GitHub. Just thought of p...
Hey @<1523701087100473344:profile|SuccessfulKoala55> I am not sure this is the case, as the instance can check out code in poetry/pip mode. This issue only happens if I try to run the agent in docker mode. I read in the docs that when you run the agent in docker mode, the .ssh directory of the host is copied to the container under /root/.ssh, so my theory is that when I build the custom docker image I don’t end up with a /root folder (?) I haven’t had the time to debug th...
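To test the /root/.ssh theory, one option is to run a small check inside the custom image. Below is a sketch of such a check; the function name ssh_dir_problems is hypothetical and it assumes default key file naming (id_*). It relies on the fact that OpenSSH refuses private keys accessible by group or others, and it treats a missing known_hosts as a likely cause of host key verification failures.

```python
import stat
from pathlib import Path
from typing import List


def ssh_dir_problems(ssh_dir: Path) -> List[str]:
    """List obvious problems with an SSH directory used for git over SSH."""
    if not ssh_dir.is_dir():
        return [f"{ssh_dir} does not exist"]

    problems: List[str] = []
    # Private keys are assumed to be named id_* without a .pub suffix.
    keys = [p for p in ssh_dir.iterdir()
            if p.is_file() and p.name.startswith("id_") and p.suffix != ".pub"]
    if not keys:
        problems.append("no private key (id_*) found")
    for key in keys:
        # OpenSSH rejects private keys with any group/other permission bits set.
        if stat.S_IMODE(key.stat().st_mode) & 0o077:
            problems.append(f"{key.name} permissions are too open")
    if not (ssh_dir / "known_hosts").is_file():
        problems.append("known_hosts missing (host key verification may fail)")
    return problems


# Example inside the container: print(ssh_dir_problems(Path("/root/.ssh")))
```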
3fdcf5db64d allegroai/clearml:1.12.1-397 “/opt/clearml/wrappe…” 10 days ago Up 9 minutes 8008/tcp, 8080/tcp, 0.0.0.0:8081->8081/tcp, :::8081->8081/tcp clearml-fileserver
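Reading the PORTS column of this docker ps line: only 8081 is published to the host, while 8008 and 8080 are exposed inside the container only. The fileserver’s files live in the container’s filesystem, so /mnt/fileserver would be a path inside the container rather than on the host. As far as I know, the standard docker-compose file bind-mounts it from /opt/clearml/data/fileserver on the host, but `docker inspect clearml-fileserver` shows the actual mounts. A small sketch that splits such a PORTS string (the function name split_docker_ports is hypothetical):

```python
from typing import Dict, List


def split_docker_ports(ports: str) -> Dict[str, List[str]]:
    """Split the PORTS column of `docker ps` into published and container-only ports."""
    result: Dict[str, List[str]] = {"published": [], "container_only": []}
    for entry in ports.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if "->" in entry:
            # "host_ip:host_port->container_port/proto" is published on the host.
            host_part, container_port = entry.split("->", 1)
            result["published"].append(f"{host_part} -> {container_port}")
        else:
            # A bare "port/proto" is only exposed on the container network.
            result["container_only"].append(entry)
    return result
```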
Also @<1619867994005966848:profile|HungryTurtle13> 👆
Thanks @<1523701205467926528:profile|AgitatedDove14> happy to PR on the docs 😉
Hi @<1523701087100473344:profile|SuccessfulKoala55> thanks for your reply. I’m not sure where I can find more about the extra docker bash script you mentioned… I would appreciate it if you could point me in the right direction. Thanks.
Hi @<1523701205467926528:profile|AgitatedDove14> thanks for your reply. I am seeing that this is an issue with torch 2.0.1, because it does not install the needed CUDA dependencies:
Adding this info here in case anyone else has this issue. It looks like switching to torch 2.0.0 fixes the issue. I will update here after I test that. Thanks again 🙏
Update
I ran:
clearml-agent build --id <task-id> --docker <custom-docker> --log-level DEBUG --entry-point reuse_task
and got a similar problem:
Host key verification failed.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
error: Could not fetch origin
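"Host key verification failed" means ssh could not match github.com’s host key against a known_hosts file in the environment where git ran, which fits the theory that the container does not see the expected /root/.ssh. Below is a sketch that checks whether known_hosts lines contain a github.com entry, including OpenSSH’s hashed form (|1|salt|hash, where the hash is HMAC-SHA1 of the host name keyed with the salt). The function name known_hosts_has is hypothetical, and it ignores [host]:port entries, wildcards and @markers. From the command line, `ssh-keygen -F github.com` performs the same lookup, and `ssh-keyscan github.com >> ~/.ssh/known_hosts` is the usual way to add an entry (ideally after checking the fingerprint against GitHub’s published ones).

```python
import base64
import hashlib
import hmac
from typing import Iterable


def known_hosts_has(host: str, lines: Iterable[str]) -> bool:
    """Return True if the known_hosts lines contain an entry for host."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        hosts_field = line.split()[0]
        for pattern in hosts_field.split(","):
            if pattern.startswith("|1|"):
                # Hashed entry: "|1|<base64 salt>|<base64 HMAC-SHA1(salt, host)>".
                try:
                    _, _, salt_b64, hash_b64 = pattern.split("|")
                    salt = base64.b64decode(salt_b64)
                    expected = base64.b64decode(hash_b64)
                except ValueError:
                    continue
                digest = hmac.new(salt, host.encode(), hashlib.sha1).digest()
                if hmac.compare_digest(digest, expected):
                    return True
            elif pattern == host:
                return True
    return False


# Example: with open("/root/.ssh/known_hosts") as f: print(known_hosts_has("github.com", f))
```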

