Hey, I tested it and it looks like it works, but it still takes a long time (mainly in the second run of the code, which is part of my eval process).
Ohh, I understand. Can you give me a short explanation of how to change the metadata?
The index creation:
```
[ec2-user@ip-172-31-26-41 ~]$ sudo docker exec -it clearml-mongo /bin/bash
root@3fc365193ed0:/# mongo
MongoDB shell version v3.6.5
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.6.5
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
Questions? Try the support group
Server has startup warnings:
2021-01-25T05:58:37.309+0000 I CONTROL [initandlisten]
2021-01-25T05:58:37.309+0000 I C...
```
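For reference (this was not in the original thread), creating an index from the host via `docker exec` would look roughly like the sketch below; the database name `backend`, the collection `task`, and the field `created` are hypothetical placeholders, not the actual names used here.

```shell
# Hypothetical sketch: create an index in the MongoDB container used above.
# "backend", "task", and "created" are made-up names for illustration only.
sudo docker exec -it clearml-mongo mongo backend \
  --eval 'db.task.createIndex({ "created": 1 })'
```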
Thanks, I just want to avoid giving the credentials to every user.
If it isn't possible, I will do it..
I reproduced the hang with this code..
But for now only in my env; when I tried to create a new env with only the packages this code needs, it doesn't get stuck.
So maybe the problem is a conflict between packages?
Thanks!! You are the best..
I will give it a try when the runs finish
Hi CumbersomeCormorant74 ,
This is a server we installed.
The server version is: 0.17
We checked with Chrome and Firefox
Thanks, ophir
If I mount the S3 bucket on the trains-server and link the mount to /opt/trains/data/fileserver, will it work?
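The idea above could be sketched like this (not from the original thread); it assumes `s3fs` is installed and uses a made-up bucket name `my-trains-bucket` and mount point, so treat it as illustration only, not a verified setup.

```shell
# Hypothetical sketch: mount an S3 bucket with s3fs and point the
# trains-server fileserver data directory at it. Bucket name and
# mount point are placeholders.
mkdir -p /mnt/trains-data
s3fs my-trains-bucket /mnt/trains-data -o iam_role=auto
sudo ln -s /mnt/trains-data /opt/trains/data/fileserver
```

Whether the fileserver tolerates S3's latency and consistency behavior through a FUSE mount is a separate question; this only shows the mount-and-link mechanics.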
how long? 😅
I am now stuck in `Copying index events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b`
for more than 40 min 😥
Is it possible to know in advance where the Agent will clone the code?
Or to run a link command just before the code executes?
It is now getting stuck after:
```
2021-03-09 14:54:07
task 609a976a889748d6a6e4baf360ef93b4 pulled from 8e47f5b0694e426e814f0855186f560e by worker ov-01:gpu1
2021-03-09 14:54:08
running Task 609a976a889748d6a6e4baf360ef93b4 inside default docker image: MyDockerImage:v0
2021-03-09 14:54:08
Executing: ['docker', 'run', '-t', '--gpus', '"device=1"', '-e', 'CLEARML_WORKER_ID=ov-01:gpu1', '-e', 'CLEARML_DOCKER_IMAGE=MyDockerImage:v0', '-v', '/tmp/.clearml_agent.jvxowhq4.cfg:/root/clearml.conf', '-v', '/...
```
So for now I am leaving this issue...
Thanks a lot 🙏 🙌
I have one computer with 4 GPUs and would like to create a queue over the GPUs..
For now the project runs without a queue.
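One common way to do this (a hedged sketch, not from the thread) is to run one `clearml-agent` worker per GPU, each pulling from its own queue; the queue names below are made up.

```shell
# Hypothetical sketch: one detached clearml-agent worker per GPU on a
# single 4-GPU machine. Queue names ("gpu0_queue" etc.) are placeholders;
# create them in the ClearML UI or via the API first.
clearml-agent daemon --detached --queue gpu0_queue --gpus 0
clearml-agent daemon --detached --queue gpu1_queue --gpus 1
clearml-agent daemon --detached --queue gpu2_queue --gpus 2
clearml-agent daemon --detached --queue gpu3_queue --gpus 3
```

A single worker can also listen on one queue with all GPUs visible; the per-GPU split above just makes each GPU independently schedulable.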
My configs hold relative paths to the data (and it can take time to change all of them), so I prefer to work with relative paths if possible..
OK, thanks for the answer.. I will use `task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)` as you suggested for now..
If you add something like I suggested, can you notify me?
Yes this is what we are doing 👍
I did it just because FAIR did it in detectron2 Dockerfile
SuccessfulKoala55 it is still stuck on the same line.. should it be like this?
Hi SuccessfulKoala55 , yes, for now I would like to start by moving what is inside /opt/trains/data/fileserver..
because as I understand it, the logs and graphs are saved in Elastic, so it will not be easy to move them as well, right?
OHH nice, I thought it was just some kind of job queue on up-and-running machines
SuccessfulKoala55 and AppetizingMouse58 Thank you very much!!
I have a future question:
Could this fix cause problems in a future clearml-server upgrade?
Or what is the best practice for upgrading after doing it?
Someone in my company started a training run 😥 , I will do it after it finishes.. and will update
Thanks you are the best 🙏
Thanks I am basing my docker on https://github.com/facebookresearch/detectron2/blob/master/docker/Dockerfile