the index creation:

[ec2-user@ip-172-31-26-41 ~]$ sudo docker exec -it clearml-mongo /bin/bash
root@3fc365193ed0:/# mongo
MongoDB shell version v3.6.5
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.6.5
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
Server has startup warnings:
2021-01-25T05:58:37.309+0000 I CONTROL [initandlisten]
2021-01-25T05:58:37.309+0000 I C...
Hi SuccessfulKoala55 , yes, for now I'd like to start by moving what's inside /opt/trains/data/fileserver..
Because, as I understand it, the logs and graphs are saved in Elasticsearch, so I think it won't be easy to move them as well, right?
Thanks for the reply,
I saw that it's preferable to change the fileserver in trains.conf to s3://XXX
So, I changed this as I wrote before.
If I mount the S3 bucket to the trains-server and link the mount to /opt/trains/data/fileserver, will it work?
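For reference, the change described above would look roughly like this in trains.conf (a minimal sketch; the section names follow the standard trains.conf layout, and s3://XXX is a placeholder bucket, not a real value):

```
api {
    # point clients at the bucket instead of the built-in fileserver
    files_server: "s3://XXX"
}
sdk {
    development {
        # new artifacts/models are uploaded here by default
        default_output_uri: "s3://XXX"
    }
}
```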
SuccessfulKoala55 and AppetizingMouse58 Thank you very much!!
I have a follow-up question:
Could this fix cause problems in a future clearml-server upgrade?
Or what is the best practice for upgrading after doing it?
Thanks AgitatedDove14 ,
I need to check with my boss that it is OK to share more code, will let you know..
But I will give 0.16 a try when it's released.
🙏
AgitatedDove14 Thanks, I am trying it..
I reproduced the hang with this code..
But so far only with my env; when I tried to create a new env with only the packages this code needs, it didn't hang.
So maybe the problem is a conflict between packages?
I am trying to reproduce it with a small example
Hey... Thanks for checking with me.
I didn't have time yet but will check it and let you know..
Sure, I'd love to do it when I have more time 🙂
Hey, I tested it and it looks like it works, but it still takes a lot of time (mainly in the second run of the code, which is part of my eval process)
Is it OK that it looks for files in /opt/trains
? Since we moved everything to /opt/clearml,
no? File "/opt/trains/apiserver/mongo/initialize/migration.py"
Thanks for the quick reply.
This will allow more time before the timeout, right?
Maybe there is a way to do something like: task.freeze_monitor(); download(); task.defrost_monitor()
I am sure you added this timeout for a reason, probably since increasing it can affect other functionality.
Am I wrong?
OK, thanks for the answer.. I will use task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
as you suggested for now..
If you add something like what I suggested, could you notify me?
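The freeze/defrost idea above is not an existing ClearML API; as a purely hypothetical sketch of the suggested pattern, it could be wrapped in a context manager so the monitor always resumes, even if the download fails. `Task`, `freeze_monitor`, and `defrost_monitor` here are stubs made up for illustration:

```python
from contextlib import contextmanager


class Task:
    """Stub standing in for a task object; freeze_monitor/defrost_monitor
    are hypothetical names from the suggestion above, not real ClearML calls."""

    def __init__(self):
        self.monitor_running = True

    def freeze_monitor(self):
        self.monitor_running = False

    def defrost_monitor(self):
        self.monitor_running = True


@contextmanager
def monitor_paused(task):
    # Pause the resource monitor for the duration of a long blocking call,
    # then resume it no matter how the block exits.
    task.freeze_monitor()
    try:
        yield task
    finally:
        task.defrost_monitor()


task = Task()
with monitor_paused(task):
    pass  # the long download() would run here, without monitor timeouts
```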
Does it still work if I keep trains.conf like this and also mount the S3 bucket?
Hi SuccessfulKoala55 and AgitatedDove14 ,
Thanks for the quick reply.
I'm not sure I understand your use-case - do you always want to change the contents of the file in your code? Why not change it before connecting?
Changing the file before the connect makes sense only when I am running locally and the file exists. Remotely, I must get the file with connect_configuration(path, name=name)
before reading it.
"local_path" is ignored, path is a temp file, and the c...
I tried your solution, but since my path points to a YAML file,
and task.set_configuration_object(name=name, config_text=my_params)
uploads it in a different format than task.connect_configuration(path, name=name),
it's not working for me 😞
(even when I use config_type='yaml'
)
I tried without yaml.dump(my_params_dict);
will try with it..
so the file was not the same as what connect_configuration uploaded
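To make the mismatch above concrete, here is a minimal sketch (no ClearML calls; the parameter values are made up): config_text expects the raw YAML text, i.e. the same text connect_configuration would read from the file, so passing the dict itself stores a different document:

```python
# Hypothetical parameter dict, for illustration only.
params = {"lr": 0.001, "epochs": 10}

# What gets stored if the dict is passed to config_text without serializing:
# its Python repr, which is not the YAML the original file contained.
wrong_text = str(params)

# What connect_configuration would have stored: the file's own YAML text.
# yaml.dump(params) produces this form, which is why dumping first fixes it.
right_text = "epochs: 10\nlr: 0.001\n"

# The two stored "configs" are different documents, hence the format mismatch.
assert wrong_text != right_text
```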
Thanks
SuccessfulKoala55 Thanks 🙏 I will give it a try tomorrow 🙂
Thanks, I am basing my Docker image on https://github.com/facebookresearch/detectron2/blob/master/docker/Dockerfile
Hi AppetizingMouse58 , I had around 200GB when I started the migration; now I have 169GB.
And yes, it looks like it is growing: it was 9.4GB and now it's 9.5GB.
How long should it take? 😅
I am now stuck in Copying index events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b
for more than 40 min 😥
Yes, it looks like this.. I just wanted to understand whether it should be this slow, or whether I did something wrong