@<1523701087100473344:profile|SuccessfulKoala55> I changed my agent to poetry mode and it worked like magic. Thanks Jake!
Do you have any insights on the missing fileserver @<1523701205467926528:profile|AgitatedDove14> ?
A related question… how does the server know how to delete artifacts when the project is deleted if it doesn’t have a clearml.conf with the S3 credentials to do so?
@<1523701205467926528:profile|AgitatedDove14> None
I just ran a dummy experiment logging images, plots, etc and I can see them in my server’s Web UI.
@<1523701087100473344:profile|SuccessfulKoala55> thanks so much for your reply. I can see now the source of my confusion:
After I finished deploying the server in AWS, the next step on that page is “configuring ClearML for [ ClearM...
What I am referring to is this information about the Storage Configuration:
None
If that is the case, it explains why I see /opt/clearml/data/fileserver but no /mnt/fileserver…
but from a terminal I can do:
ubuntu@***:~/sw/clearml-tutorial$ git fetch --all --recurse-submodules
Fetching origin
and it works
Hey @<1523701205467926528:profile|AgitatedDove14> in the WebUI the hydra configuration object is under CONFIGURATION OBJECTS > OmegaConf
So should this be OmegaConf/trainer.batch_size ?
from this video tutorial None :
“…the name of the hyperparameter consists of the section it is reported to, followed by a slash, then its name…”
So following that confuses me because I can’t see my Hydra parameters under Hyperparameters > Hydra
and this is why I thought, ok well, perhaps use OmegaConf/params.batch_size
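To make the "section, slash, name" convention quoted above concrete, here is a minimal sketch that flattens a nested config into candidate parameter names. The helper `flatten_params` and the sample config are hypothetical and not part of ClearML; which section name actually applies (Hydra or OmegaConf) is exactly the open question in this thread, so check the names shown in the Web UI before relying on either.

```python
from typing import Any, Dict


def flatten_params(section: str, params: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested dict into {'<section>/<dotted.key>': value} entries.

    Hypothetical helper for illustration only; it does not call ClearML.
    """
    flat: Dict[str, Any] = {}
    for key, value in params.items():
        dotted = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested groups, extending the dotted path.
            flat.update(flatten_params(section, value, dotted))
        else:
            flat[f"{section}/{dotted}"] = value
    return flat


# Sample config (assumed shape, mirroring trainer.batch_size from the thread).
cfg = {"trainer": {"batch_size": 32, "lr": 0.001}, "seed": 1}

hydra_names = flatten_params("Hydra", cfg)
omegaconf_names = flatten_params("OmegaConf", cfg)

for name in sorted(hydra_names):
    print(name)
```

Printing both variants side by side makes it easy to compare them against what the task's configuration tab actually shows.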
Is this another opportunity to improve the documentation? Happy to help if so.
@<1523701435869433856:profile|SmugDolphin23> I had the same issue uploading a torch model. Thank you for being a life 🛟
Update
I ran:
clearml-agent build --id <task-id> --docker <custom-docker> --log-level DEBUG --entry-point reuse_task
and got a similar problem:
Host key verification failed.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
error: Could not fetch origin
Also @<1619867994005966848:profile|HungryTurtle13> 👆
I am not a Docker expert, but am I correct to say that here ‘/mnt/fileserver’ is the container path rather than the source (host) path?
So it looks like the server is there (docker ps) and I can see the artifacts (Web UI), but I am not sure where things are stored, since the /mnt/fileserver mentioned in the documentation does not exist (?)
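For reference, a Docker bind mount is written as host path, colon, container path, so the same files can live under one path on the host and another inside the container. A minimal sketch of reading such a spec follows; the mapping string is an assumption built from the two paths mentioned in this thread, and `parse_volume` is a hypothetical helper, so compare against the volumes section of your own docker-compose.yml.

```python
from typing import NamedTuple


class VolumeMapping(NamedTuple):
    host_path: str       # where the files live on the machine running Docker
    container_path: str  # where the same files appear inside the container


def parse_volume(spec: str) -> VolumeMapping:
    """Split a POSIX 'host:container[:mode]' bind-mount spec (hypothetical helper)."""
    parts = spec.split(":")
    if len(parts) < 2:
        raise ValueError(f"not a host:container mapping: {spec!r}")
    return VolumeMapping(host_path=parts[0], container_path=parts[1])


# Assumed mapping, based on the two paths discussed in the thread.
mapping = parse_volume("/opt/clearml/data/fileserver:/mnt/fileserver")
print("on the host:         ", mapping.host_path)
print("inside the container:", mapping.container_path)
```

Under that assumption, /mnt/fileserver would only be visible from inside the fileserver container, while the host would show the files under /opt/clearml/data/fileserver.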
So it’s not intuitive to me to try Hydra/params.batch_size, but I will try it nonetheless as you suggested.
Sorry, I am a noob and not sure how I can do that, but I am happy to help if I can.
Hi @<1523701435869433856:profile|SmugDolphin23> thanks for your answer. I am not sure that I understand. I ran a test by cloning an experiment and editing the OmegaConf object under Configuration > Hyperparameters > OmegaConf.
Unless I also change the allow_omegaconf_edit flag to True, I won’t see my changes reflected. That is my question. As a new user, it seems counterintuitive that I also have to change the flag. Does this make sense to you? Thanks.
Hmmm… probably not if I don’t have a reference that clearml can update, right?
What about:
hpo_params = OmegaConf.to_object(cfg)
...
task.connect(hpo_params)
And then I use hpo_params in the code. This way I give clearml a chance to update the object.
Would this work? Thanks
Hi @<1523701087100473344:profile|SuccessfulKoala55> it’s failing again. I haven’t rebooted the agent or changed anything, and I am able to connect over SSH with ssh -vT git@github.com in a different tmux session.
This is the error I am seeing running the agent with the -debug flag:
Using cached repository in "/home/ubuntu/.clearml/vcs-cache/clearml-tutorial.git.e1c2351b09f3d661b6f0dbf85e92be2e/clearml-tutorial.git"
git@github.com: Permission denied (pub...
Thanks @<1523701205467926528:profile|AgitatedDove14> reading …
