Hi @<1523701087100473344:profile|SuccessfulKoala55> it just worked. Maybe there was some GitHub refresh delay … not sure, but thanks anyway for the debug suggestion. 👍
Hey @<1523701087100473344:profile|SuccessfulKoala55> just updating you here. I started from scratch with a new EC2 instance, followed the installation step by step, and the only change I made was selecting rsa instead of ed25519 for the generation of the SSH key (as per the GitHub docs), and now my agent can connect consistently to GitHub. Just thought of p...
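For anyone repeating the key change described above, here is a minimal sketch of the RSA key generation, wrapped in Python. The key path and comment are placeholders, and the 4096-bit size is an assumption (the message only says rsa was selected); the flags themselves are standard ssh-keygen options.

```python
import subprocess
from pathlib import Path
from typing import List


def rsa_keygen_command(key_path: Path, comment: str, bits: int = 4096) -> List[str]:
    """Return the ssh-keygen argv for a new RSA key pair."""
    return [
        "ssh-keygen",
        "-t", "rsa",          # key type: RSA rather than ed25519
        "-b", str(bits),      # key size in bits (assumed value, not from the thread)
        "-C", comment,        # label stored alongside the public key
        "-f", str(key_path),  # private key file; the public key goes to <key_path>.pub
    ]


def generate_key(key_path: Path, comment: str) -> None:
    """Run ssh-keygen; it prompts for a passphrase interactively."""
    subprocess.run(rsa_keygen_command(key_path, comment), check=True)


# Hypothetical usage (placeholder path and comment):
#   generate_key(Path.home() / ".ssh" / "id_rsa_agent", "agent@example.com")
```

The public key still has to be added to the GitHub account, and the agent has to run as the user that owns the key.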
What I am referring to is this information about the Storage Configuration:
Hi @<1523701087100473344:profile|SuccessfulKoala55> it’s failing again. I haven’t rebooted the agent or changed anything, and I am able to connect with ssh -vT git@github.com in a different tmux session.
This is the error I am seeing when running the agent with the -debug flag:
Using cached repository in "/home/ubuntu/.clearml/vcs-cache/clearml-tutorial.git.e1c2351b09f3d661b6f0dbf85e92be2e/clearml-tutorial.git"
git@github.com: Permission denied (pub...
but from a terminal I can do:
ubuntu@***:~/sw/clearml-tutorial$ git fetch --all --recurse-submodules
Fetching origin
and it works
there under fileserver it should read /opt/clearml/data/fileserver
Hi @<1523701087100473344:profile|SuccessfulKoala55> thanks for your reply. I am not sure where I can find more about the extra docker bash script that you mention… I would appreciate it if you could point me in the right direction. Thanks.
Responding to my own question, in case someone else has the same issue. You have to edit the security group and enable TCP 8080.
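A minimal sketch of that security group change using boto3, for anyone who prefers scripting it over the console. The group ID and CIDR below are placeholders, and the helper names are made up for illustration; only authorize_security_group_ingress is the actual boto3 EC2 client call.

```python
from typing import Any, Dict, List


def build_ingress_rule(port: int, cidr: str) -> List[Dict[str, Any]]:
    """Build the IpPermissions payload that allows inbound TCP on one port."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr, "Description": f"TCP {port}"}],
        }
    ]


def open_port(ec2_client: Any, group_id: str, cidr: str, port: int = 8080) -> None:
    """Add an inbound rule for the given port to an existing security group."""
    ec2_client.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=build_ingress_rule(port, cidr),
    )


# Hypothetical usage (placeholder group ID and address range; requires boto3):
#   import boto3
#   open_port(boto3.client("ec2"), "sg-0123456789abcdef0", "203.0.113.0/24")
```

Restricting the CIDR to your own address range rather than 0.0.0.0/0 keeps the port from being open to everyone.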
I haven’t figured out the missing fileserver. :man-shrugging:
Thanks @<1523701205467926528:profile|AgitatedDove14> reading …
Also @<1619867994005966848:profile|HungryTurtle13> 👆
This is what I see:
Hi @<1523701205467926528:profile|AgitatedDove14> thanks for your reply. I am seeing this is an issue with torch 2.0.1 because it does not install the needed cuda dependencies:
Adding this info here, in case anyone here has this issue. It looks like switching to torch 2.0.0 fixes the issue. I will update here after I test that. Thanks again 🙏
hmmm… probably not if I don’t have a reference that clearml can update right?….
What about:
hpo_params = OmegaConf.to_object(cfg)
...
task.connect(hpo_params)
And then I use hpo_params in the code. This way I give clearml a chance to update the object.
Would this work? Thanks
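A minimal sketch of the idea in the question above, written so it can be tried without a server. The helper name connect_hyperparameters is made up for illustration, and the sketch assumes the safest pattern is to keep using whatever task.connect returns rather than relying on the original dict being updated in place; that assumption is not confirmed in this thread.

```python
from typing import Any, Dict


def connect_hyperparameters(task: Any, params: Dict[str, Any]) -> Dict[str, Any]:
    """Register params with the task and return the object to read from afterwards."""
    connected = task.connect(params)
    # Fall back to the original dict if connect() hands nothing back.
    return params if connected is None else connected


# Hypothetical usage inside a Hydra entry point (requires clearml and omegaconf;
# project and task names are placeholders):
#   from clearml import Task
#   from omegaconf import OmegaConf
#   task = Task.init(project_name="examples", task_name="hydra-run")
#   hpo_params = connect_hyperparameters(task, OmegaConf.to_object(cfg))
#   batch_size = hpo_params["trainer"]["batch_size"]
```

Reading every value from hpo_params after the call, and never from cfg directly, is what gives an edited value a chance to reach the training code.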
Hi @<1523701205467926528:profile|AgitatedDove14> , I see _allow_omegaconf_edit_ under HYPERPARAMETERS > Hydra
Hi @<1523701435869433856:profile|SmugDolphin23> thanks for your answer. I am not sure that I understand. I ran a test by cloning an experiment and editing the OmegaConf object under Configuration > Hyperparameters > OmegaConf.
Unless I also change the allow_omegaconf_edit flag to True, I won’t see my changes reflected. That is my question. As a new user, it seems counterintuitive that I also have to change the flag. Does this make sense to you? Thanks.
Sorry, I am a noob and not sure how I can do that, but happy to help if I can.
Update
I ran:
clearml-agent build --id <task-id> --docker <custom-docker> --log-level DEBUG --entry-point reuse_task
and got a similar problem:
Host key verification failed.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
error: Could not fetch origin
Will this work?
task.connect(OmegaConf.to_object(cfg))
assuming cfg is my Hydra dict
@<1547028031053238272:profile|MassiveGoldfish6> check this:
- Does your local clearml.conf set use_credentials_chain: true?
- Do you have the needed AWS credentials in your local environment?
- Do you have an S3 bucket as the storage for your project (did you set this up when you created the project)?
- Do your local AWS credentials give you write access to that S3 bucket?
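The last item in the checklist above can be tested directly. Below is a minimal sketch, assuming boto3 is installed and picks up the local credentials; the helper name, the bucket name, and the probe key are placeholders made up for illustration.

```python
from typing import Any


def can_write_to_bucket(s3_client: Any, bucket: str, key: str = "clearml-write-check.tmp") -> bool:
    """Try to create and then delete a small probe object in the bucket."""
    try:
        s3_client.put_object(Bucket=bucket, Key=key, Body=b"")
        s3_client.delete_object(Bucket=bucket, Key=key)
    except Exception as exc:  # broad on purpose: any failure means "cannot write"
        print(f"write check failed for s3://{bucket}/{key}: {exc}")
        return False
    return True


# Hypothetical usage (placeholder bucket name; requires boto3):
#   import boto3
#   print(can_write_to_bucket(boto3.client("s3"), "my-project-bucket"))
```

A False result here points at credentials or bucket permissions rather than at the ClearML configuration.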
Do you have any insights on the missing fileserver @<1523701205467926528:profile|AgitatedDove14> ?
I can’t see anything under /mnt so no fileserver there (?)
Hey @<1523701205467926528:profile|AgitatedDove14> in the WebUI the hydra configuration object is under CONFIGURATION OBJECTS > OmegaConf
So should this be OmegaConf/trainer.batch_size?
Hi @<1523701087100473344:profile|SuccessfulKoala55> thanks for your response. What I mean is that in the Web UI, when you are creating a project, there is a storage (S3) field at the bottom of the create project pop-up, where you enter the S3 bucket that you want to associate with the project. Now, the thing is, you can’t see that information anywhere in the UI after the project is created, as far as I can tell. So, it would be great to be able to see the configured bucket somewhere in...
@<1523701087100473344:profile|SuccessfulKoala55> I changed my agent to poetry mode and it worked like magic. Thanks Jake!