Thanks, I am basing my Docker image on https://github.com/facebookresearch/detectron2/blob/master/docker/Dockerfile
I tried without yaml.dump(my_params_dict); I will try with it.
So the file was not the same as the one connect_configuration uploaded.
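Roughly what I am trying now (a minimal sketch; the project/task names and the params dict are placeholders, and older installs import Task from trains instead of clearml):
```python
import yaml

from clearml import Task

task = Task.init(project_name="examples", task_name="config-upload")

# placeholder standing in for my_params_dict
my_params_dict = {"lr": 0.001, "batch_size": 32}

# serialize the dict to YAML first, so the file on disk matches
# what connect_configuration uploads
with open("config.yaml", "w") as f:
    yaml.dump(my_params_dict, f)

# connect_configuration uploads the file; when running under an agent it
# returns the path of the configuration fetched from the server instead
config_path = task.connect_configuration("config.yaml")
with open(config_path) as f:
    my_params_dict = yaml.safe_load(f)
```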
Thanks
Thanks, I will make sure that all the Python packages are installed as root.
And will let you know if it works
Hi SuccessfulKoala55,
Does running_remotely() return True even if the task was enqueued from the UI and not by execute_remotely()?
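What I mean, as a minimal sketch (I am assuming the helper is importable from clearml.config; in older versions it lived in trains.config):
```python
from clearml import Task
from clearml.config import running_remotely  # trains.config in older versions

task = Task.init(project_name="examples", task_name="remote-check")

# the question: is this True for a task enqueued from the UI as well,
# not only for one sent with task.execute_remotely()?
if running_remotely():
    print("executed by an agent")
else:
    print("running locally")
```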
Does it still work if I keep trains.conf like this and also mount the S3 bucket?
OK, thanks for the answer. I will use `task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)` as you suggested for now.
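For reference, how I plan to use it (a minimal sketch; the project/task names are placeholders, and older installs import Task from trains):
```python
from clearml import Task  # from trains import Task on older installs

task = Task.init(project_name="examples", task_name="resource-monitor")

# fall back to time-based resource monitoring if no iterations were
# reported within the first 30 minutes, as suggested above
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
```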
If you add something like I suggested, could you notify me?
So for now I am leaving this issue...
Thanks a lot 🙏 🙌
Thanks AgitatedDove14,
I need to check with my boss that it is OK to share more code; I will let you know.
But I will give 0.16 a try when it is released.
🙏
It is now getting stuck after:
```
2021-03-09 14:54:07
task 609a976a889748d6a6e4baf360ef93b4 pulled from 8e47f5b0694e426e814f0855186f560e by worker ov-01:gpu1
2021-03-09 14:54:08
running Task 609a976a889748d6a6e4baf360ef93b4 inside default docker image: MyDockerImage:v0
2021-03-09 14:54:08
Executing: ['docker', 'run', '-t', '--gpus', '"device=1"', '-e', 'CLEARML_WORKER_ID=ov-01:gpu1', '-e', 'CLEARML_DOCKER_IMAGE=MyDockerImage:v0', '-v', '/tmp/.clearml_agent.jvxowhq4.cfg:/root/clearml.conf', '-v', '/...
```
Hey... Thanks for checking with me.
I didn't have time yet, but I will check it and let you know.
The hang is still happening in trains==0.15.2rc0
I am trying to reproduce it with a small example.
AgitatedDove14 Thanks, I am trying it..
SuccessfulKoala55 Thanks 🙏 I will give it a try tomorrow 🙂
Is it OK that it looks for files in /opt/trains? Since we moved everything to /opt/clearml, no?
`File "/opt/trains/apiserver/mongo/initialize/migration.py"`
Hi SuccessfulKoala55,
I brought the server down:
```
[ec2-user@ip-172-31-26-41 ~]$ sudo docker-compose -f /opt/clearml/docker-compose.yml down
WARNING: The CLEARML_HOST_IP variable is not set. Defaulting to a blank string.
WARNING: The CLEARML_AGENT_GIT_USER variable is not set. Defaulting to a blank string.
WARNING: The CLEARML_AGENT_GIT_PASS variable is not set. Defaulting to a blank string.
Stopping clearml-webserver ... done
Stopping clearml-agent-services ... done
Stopping clearml-apiserver...
```
```
[2021-01-24 17:02:25,660] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_all in 2ms
[2021-01-24 17:02:25,674] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_next_task in 8ms
[2021-01-24 17:02:26,696] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 36ms
[2021-01-24 17:02:26,742] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 78ms
[2021-01-24 17:02:27,169] [8] [INFO] [trains.service_repo] Returned 200 for projects.get_al...
```
SuccessfulKoala55 and AppetizingMouse58, thank you very much!!
I have a follow-up question:
Could this fix cause problems in a future clearml-server upgrade?
Or what is the best practice for upgrading after applying it?
I updated to the new version 0.16.1 a few weeks ago, and it worked using elastic_upgrade.py.
The index creation:
```
[ec2-user@ip-172-31-26-41 ~]$ sudo docker exec -it clearml-mongo /bin/bash
root@3fc365193ed0:/# mongo
MongoDB shell version v3.6.5
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.6.5
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
Questions? Try the support group
Server has startup warnings:
2021-01-25T05:58:37.309+0000 I CONTROL [initandlisten]
2021-01-25T05:58:37.309+0000 I C...
```
Hi SuccessfulKoala55, yes, for now I would like to start by moving what is inside /opt/trains/data/fileserver.
Because, as I understand it, the logs and graphs are saved in Elasticsearch, so I think it will not be easy to move them as well, right?
If I mount the S3 bucket on the trains-server and link the mount to /opt/trains/data/fileserver, will it work?
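As an alternative to the mount (not the same thing, just a sketch under my assumptions), I understand the SDK can also send new outputs straight to S3 via output_uri; the bucket name here is hypothetical:
```python
from clearml import Task  # from trains import Task on older installs

# "my-trains-bucket" is a hypothetical bucket; credentials are taken from
# the sdk.aws.s3 section of trains.conf / clearml.conf
task = Task.init(
    project_name="examples",
    task_name="s3-output",
    output_uri="s3://my-trains-bucket/outputs",
)
```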
