OK, thanks for the answer. I will use task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
as you suggested for now.
If you add something like what I suggested, can you notify me?
I am sure you added this timeout for a reason,
probably because increasing the timeout can affect other functionality.
Am I wrong?
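For reference, here is a minimal sketch of the call mentioned above. It assumes `task` is a Task object from clearml (or trains), for example the one returned by Task.init. The helper name `extend_monitor_timeout` and its `minutes` parameter are made up for illustration and are not part of the library. As far as I understand it, the timeout is how long the resource monitor waits for a first reported iteration before it falls back to reporting by seconds from start.

```python
def extend_monitor_timeout(task, minutes=30):
    """Set how long the resource monitor waits for a first reported iteration.

    `task` is assumed to be a clearml/trains Task (e.g. from Task.init).
    `extend_monitor_timeout` and `minutes` are hypothetical names used
    only in this sketch.
    """
    seconds = int(minutes * 60)
    # The call suggested in the thread; 30 minutes gives seconds_from_start=1800.
    task.set_resource_monitor_iteration_timeout(seconds_from_start=seconds)
    return seconds


# Typical use (needs the clearml package and a configured server):
#   from clearml import Task
#   task = Task.init(project_name="examples", task_name="long warm-up")
#   extend_monitor_timeout(task, minutes=30)
```

Keeping the value in one helper makes it easy to change later if a different default is added on the server side.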
For now we are using AWS Batch to run those experiments,
because this way we don't have to keep machines up just waiting for jobs.
Ohh nice, I thought it was just some kind of job queue on up-and-running machines.
Thanks!! you are the best..
I will give it a try when the runs finish.
Thanks, I will upgrade my instance type and then add more workers. Where do I need to configure it?
AgitatedDove14 Maybe I need to change something here: apiserver.conf
to increase the number of workers?
I haven't tried trains-agent yet. Does it support using AWS Batch?
I did it just because FAIR did it in the detectron2 Dockerfile.
Not a very good one, they just installed everything under the user and used --user for pip.
It really does not matter inside a docker, the only reason one might want to do that is if you are mounting other drives and you want to make sure they are not accessed with "root" user, but with 1000 user id.
This sounds like a good reason haha 😄
Let me check if we can hack something...
Thanks 🙏
the index creation:
` [ec2-user@ip-172-31-26-41 ~]$ sudo docker exec -it clearml-mongo /bin/bash
root@3fc365193ed0:/# mongo
MongoDB shell version v3.6.5
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.6.5
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
Questions? Try the support group
Server has startup warnings:
2021-01-25T05:58:37.309+0000 I CONTROL [initandlisten]
2021-01-25T05:58:37.309+0000 I C...
SuccessfulKoala55 and AppetizingMouse58 Thank you very much!!
I have a question for the future:
Could this fix cause problems in a future clearml-server upgrade?
Or what is the best practice for upgrading after doing it?
I updated to the new version 0.16.1 a few weeks ago and it worked using elastic_upgrade.py
` [2021-01-24 17:02:25,660] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_all in 2ms
[2021-01-24 17:02:25,674] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_next_task in 8ms
[2021-01-24 17:02:26,696] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 36ms
[2021-01-24 17:02:26,742] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 78ms
[2021-01-24 17:02:27,169] [8] [INFO] [trains.service_repo] Returned 200 for projects.get_al...
I just need to run the docker and run the command inside it, no?
Is it OK that it looks for files in /opt/trains?
We moved everything to /opt/clearml, no?
File "/opt/trains/apiserver/mongo/initialize/migration.py"
Hi SuccessfulKoala55 ,
I took the server down:
` [ec2-user@ip-172-31-26-41 ~]$ sudo docker-compose -f /opt/clearml/docker-compose.yml down
WARNING: The CLEARML_HOST_IP variable is not set. Defaulting to a blank string.
WARNING: The CLEARML_AGENT_GIT_USER variable is not set. Defaulting to a blank string.
WARNING: The CLEARML_AGENT_GIT_PASS variable is not set. Defaulting to a blank string.
Stopping clearml-webserver ... done
Stopping clearml-agent-services ... done
Stopping clearml-apiserver...
OK, looks like it is starting the training...
Thanks 💯
But I think they did it for a reason, no?
Hi DeterminedCrab71 .
Thanks 🙂
Yes this is what we are doing 👍
Hi CumbersomeCormorant74 ,
This is a server we installed.
The server version is: 0.17
We checked with Chrome and Firefox.
Thanks, ophir