I have another question.
Now that I am using the root user, it looks better.
But my Docker image already has all my code and all the packages it needs, so why does the agent need to install all of them again?
So for now I am leaving this issue...
Thanks a lot 🙏 🙌
OK, it looks like it is starting the training...
Thanks 💯
Hi CumbersomeCormorant74,
This is a server we installed ourselves.
The server version is 0.17.
We checked with Chrome and Firefox.
Thanks, ophir
I tried without yaml.dump(my_params_dict); I will try with it.
So the file was not the same as the one connect_configuration uploaded.
Thanks
Hi SuccessfulKoala55,
Does running_remotely() return True even if the task was enqueued from the UI and not by execute_remotely?
From the UI it will, since it is getting the temp file from there.
I mean from the code (let's say remotely).
Someone in my company started a training run 😥, so I will do it after it finishes and will update.
Thanks, you are the best 🙏
ARG USER_ID=1000
RUN useradd -m --no-log-init --system --uid ${USER_ID} appuser -g sudo
RUN echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
USER appuser
WORKDIR /home/appuser
Thanks for the quick reply.
This will allow more time before the timeout, right?
Maybe there is a way to do something like:
task.freeze_monitor()
download()
task.defrost_monitor()
I am sure you added this timeout for a reason,
probably because increasing the timeout can affect other functionality.
Am I wrong?
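To make the freeze/defrost suggestion concrete, here is a minimal sketch of the idea in plain Python. It is not ClearML's API: `IterationMonitor`, `report_progress`, `timed_out`, and `paused` are hypothetical names used only for illustration, and the class merely stands in for any watchdog that expects regular progress reports.

```python
import threading
import time
from contextlib import contextmanager


class IterationMonitor:
    """Hypothetical watchdog (not a ClearML class) that expects regular progress reports."""

    def __init__(self, timeout_sec: float) -> None:
        self.timeout_sec = timeout_sec
        self._last_report = time.monotonic()
        self._paused = False
        self._lock = threading.Lock()

    def report_progress(self) -> None:
        # Called whenever an iteration is reported.
        with self._lock:
            self._last_report = time.monotonic()

    def timed_out(self) -> bool:
        # While paused, the watchdog never reports a timeout.
        with self._lock:
            if self._paused:
                return False
            return time.monotonic() - self._last_report > self.timeout_sec

    @contextmanager
    def paused(self):
        # Equivalent of the suggested freeze_monitor() / defrost_monitor() pair.
        with self._lock:
            self._paused = True
        try:
            yield
        finally:
            with self._lock:
                self._paused = False
                # Restart the clock so the paused time does not count.
                self._last_report = time.monotonic()
```

With something like this, a long download would be wrapped as `with monitor.paused(): download()`, and the timeout itself would not need to be increased for the rest of the run.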
[2021-01-24 17:02:25,660] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_all in 2ms
[2021-01-24 17:02:25,674] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_next_task in 8ms
[2021-01-24 17:02:26,696] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 36ms
[2021-01-24 17:02:26,742] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 78ms
[2021-01-24 17:02:27,169] [8] [INFO] [trains.service_repo] Returned 200 for projects.get_al...
Is it OK that it looks for files in /opt/trains? We moved everything to /opt/clearml, no?
File "/opt/trains/apiserver/mongo/initialize/migration.py"
how long? 😅
I am now stuck in
Copying index events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b
for more than 40 min 😥
But I think they did it for a reason, no?
Thanks for the reply.
I saw that the preferred approach is to change the fileserver in trains.conf to s3://XXX,
so I changed it as I wrote before.
SuccessfulKoala55 Thanks 🙏 ..
Another related question:
My remote job fails because it cannot find the data:
FileNotFoundError: [Errno 2] No such file or directory: './data/XXXXXXXX
I mounted the data at the same place relative to my project inside the Docker container, using extra_docker_arguments.
I am using execute_remotely to enqueue the job.
I know it works locally, since the job reads from ./data/XXXX before execute_remotely() and that works.
But when the agent creates ...
I have one computer with 4 GPUs and would like to create a queue over the GPUs.
For now the project runs without a queue.
My configs hold relative paths to the data (and it would take time to change all of them), so I prefer to keep working with relative paths if possible.
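One way to keep relative paths in the configs while making them independent of the working directory the job starts in is to resolve them against an explicit base directory. The sketch below is plain Python and only an illustration: `resolve_data_path` and the `DATA_ROOT` environment variable are hypothetical names, not anything the agent defines.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_data_path(relative_path: str, base_dir: Optional[str] = None) -> Path:
    """Resolve a config path against a fixed base instead of the current directory.

    The base is taken from base_dir, then from the hypothetical DATA_ROOT
    environment variable, and only then from the current working directory.
    """
    base = Path(base_dir or os.environ.get("DATA_ROOT") or Path.cwd())
    path = Path(relative_path)
    # Absolute paths are returned untouched; relative ones are anchored to base.
    return path if path.is_absolute() else (base / path).resolve()
```

The configs can then keep entries such as ./data/XXXX, and only the base directory has to differ between the local run and the container, for example by setting the variable wherever the data is mounted.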
I reproduced the hang with this code.
But for now it only happens in my env; when I created a new env with only the packages this code needs, it did not get stuck.
So maybe the problem is a conflict between packages?
OK, thanks for the answer. I will use
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
as you suggested for now.
If you add something like what I suggested, can you notify me?
WOW.. Thanks 💯
The index creation:
[ec2-user@ip-172-31-26-41 ~]$ sudo docker exec -it clearml-mongo /bin/bash
root@3fc365193ed0:/# mongo
MongoDB shell version v3.6.5
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.6.5
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
Questions? Try the support group
Server has startup warnings:
2021-01-25T05:58:37.309+0000 I CONTROL [initandlisten]
2021-01-25T05:58:37.309+0000 I C...
AgitatedDove14 Thanks, I am trying it..