Hi @<1523701087100473344:profile|SuccessfulKoala55> thanks for your reply. I'm not sure where I can find more about the extra docker bash script that you mention. I would appreciate it if you could point me in the right direction. Thanks.
Update
I ran:
clearml-agent build --id <task-id> --docker <custom-docker> --log-level DEBUG --entry-point reuse_task
and got a similar problem:
Host key verification failed.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
error: Could not fetch origin
Also @<1619867994005966848:profile|HungryTurtle13> 👆
there under fileserver it should read /opt/clearml/data/fileserver
I can’t see anything under /mnt
so no fileserver there (?)
if that were the case it explains why I see /opt/clearml/data/fileserver
but no /mnt/fileserver
….
I see this in the docker-compose.yml file:
fileserver:
  networks:
    - backend
    - frontend
  command:
    - fileserver
  container_name: clearml-fileserver
  image: allegroai/clearml:1.12.1-397
  environment:
    CLEARML__fileserver__delete__allow_batch: "true"
  restart: unless-stopped
  volumes:
    - /opt/clearml/logs:/var/log/clearml
    - /opt/clearml/data/fileserver:/mnt/fileserver
    - /opt/clearml/config:/opt/clearml/config
  ports:
    - "8081:...
This is what I see:
Responding to my own question, in case someone else has the same issue. You have to edit the security group and enable TCP 8080.
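For anyone who wants to confirm the security group change took effect, here is a minimal sketch that checks whether a TCP port is reachable from a client machine. It uses only the Python standard library. The host name in the usage comment is a placeholder, and the port list is simply the ports visible in the `docker ps` output in this thread, so adjust both to your setup.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example usage (placeholder host name, replace with your server address):
#   for port in (8080, 8008, 8081):
#       print(port, port_is_open("my-clearml-server.example.com", port))
```

A `False` result only says the port was not reachable from where the script ran. It cannot tell a closed security group apart from a service that is down.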
I haven’t figured out the missing fileserver yet. :man-shrugging:
I am not a docker expert but am I correct to say that here the ‘/mnt/fileserver’ is the container path rather than the source path?
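To make the host/container distinction concrete, here is a small sketch that splits a short-syntax volume entry of the form `SOURCE:TARGET[:MODE]` into its parts. The helper name `parse_volume` and the `VolumeMapping` type are made up for illustration. The sketch only handles Linux-style `host:container` entries like the ones in the compose file quoted in this thread, not named or anonymous volumes.

```python
from typing import NamedTuple, Optional


class VolumeMapping(NamedTuple):
    host_path: str        # path on the machine running Docker (the source)
    container_path: str   # path seen from inside the container (the target)
    mode: Optional[str]   # optional access mode such as "ro", or None


def parse_volume(spec: str) -> VolumeMapping:
    """Split a 'SOURCE:TARGET[:MODE]' volume entry (hypothetical helper)."""
    parts = spec.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected SOURCE:TARGET[:MODE], got {spec!r}")
    mode = parts[2] if len(parts) == 3 else None
    return VolumeMapping(parts[0], parts[1], mode)


mapping = parse_volume("/opt/clearml/data/fileserver:/mnt/fileserver")
print("host:", mapping.host_path)
print("container:", mapping.container_path)
```

Read this way, `/mnt/fileserver` exists only inside the container, which would be consistent with seeing `/opt/clearml/data/fileserver` on the host and nothing under `/mnt`.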
Sorry, I am a noob and not sure how I can do that, but I am happy to help if I can.
but from a terminal I can do:
ubuntu@***:~/sw/clearml-tutorial$ git fetch --all --recurse-submodules
Fetching origin
and it works
I just ran a dummy experiment logging images, plots, etc and I can see them in my server’s Web UI.
@<1523701205467926528:profile|AgitatedDove14> None
Do you have any insights on the missing fileserver @<1523701205467926528:profile|AgitatedDove14> ?
3fdcf5db64d allegroai/clearml:1.12.1-397 “/opt/clearml/wrappe…” 10 days ago Up 9 minutes 8008/tcp, 8080/tcp, 0.0.0.0:8081->8081/tcp, :::8081->8081/tcp clearml-fileserver
ok so the documentation is confusing here:
@<1523701087100473344:profile|SuccessfulKoala55> I changed my agent to poetry mode and it worked like magic. Thanks Jake!
Hi @<1523701205467926528:profile|AgitatedDove14> thanks for your reply. I am seeing this is an issue with torch 2.0.1 because it does not install the needed cuda dependencies:
Adding this info here, in case anyone here has this issue. It looks like switching to torch 2.0.0 fixes the issue. I will update here after I test that. Thanks again 🙏
Hey @<1593051292383580160:profile|SoreSparrow36> I am trying to test whether deleting a project also deletes its S3 storage. But I am not sure this is even a good assumption, as I haven’t found anywhere what the expected/default behaviour is. Do you happen to know anything about this? Thanks.
Hi @<1523701087100473344:profile|SuccessfulKoala55> it just worked. Maybe there was some GitHub refresh delay, not sure, but thanks anyway for the debug suggestion. 👍
hmmm… probably not if I don’t have a reference that clearml can update right?….
What about:
hpo_params = OmegaConf.to_object(cfg)
...
task.connect(hpo_params)
And then I use hpo_params in the code. This way I give clearml a chance to update the object.
Would this work? Thanks
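To illustrate the reasoning behind the question, here is a toy sketch of why holding on to the same object matters. It does not use ClearML or OmegaConf at all: `fake_connect` is a hypothetical stand-in that overwrites values in place, which is an assumption about how an experiment tracker might inject overrides. It says nothing about what `task.connect` actually does with the result of `OmegaConf.to_object`.

```python
from typing import Any, Dict


def fake_connect(params: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for a tracker's connect call.

    It mutates `params` in place, so every holder of the same dict
    sees the overridden values.
    """
    params.update(overrides)
    return params


hpo_params = {"batch_size": 32, "lr": 0.001}

# A copy taken before connecting is a different object and is not updated.
snapshot = dict(hpo_params)

# Pretend the tracker overrides batch_size for this run.
fake_connect(hpo_params, {"batch_size": 64})

print("connected object:", hpo_params["batch_size"])
print("earlier copy:", snapshot["batch_size"])
```

The point is only that code reading from the connected object picks up the override, while code reading from an earlier copy (or from the original `cfg`) does not.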
Hey @<1523701205467926528:profile|AgitatedDove14> in the WebUI the hydra configuration object is under CONFIGURATION OBJECTS > OmegaConf
So should this be OmegaConf/trainer.batch_size?
Hi @<1523701205467926528:profile|AgitatedDove14>, I see _allow_omegaconf_edit_ under HYPERPARAMETERS > Hydra