@<1523701087100473344:profile|SuccessfulKoala55> I found out I had some issues with the clearml_agent docker setup, lacking some environment variables. What should be the value for CLEARML_HOST_IP?
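A minimal sketch for spotting unset variables before starting the containers, assuming the setup reads them from the process environment. Only CLEARML_HOST_IP is named in this thread; the helper name missing_env_vars is hypothetical, and the sketch does not say what value the variable should hold.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(names: Iterable[str],
                     env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names that are unset or blank in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    # CLEARML_HOST_IP is the only variable mentioned in this thread; extend
    # the list with whatever your own docker setup references (assumption).
    for name in missing_env_vars(["CLEARML_HOST_IP"]):
        print(f"{name} is not set")
```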
@<1559711593736966144:profile|SoggyCow20> I see this in the log: "Process terminated by user"
- how exactly did you run this session task?
However, I did not need to launch it when I was using app.clear.ml. Why is it needed when running things locally?
clearml-session will automatically enqueue a task to the queue you specify
@<1523701087100473344:profile|SuccessfulKoala55> is clearml-session meant to initiate a queue?
One more update: I tried running execute_remotely from app.clear.ml with a runner on my local machine and with docker, and it worked fine.
@<1523701087100473344:profile|SuccessfulKoala55> Update: I also tried to run ClearML on my other personal laptop, and I still face the same issue. The docker agent gets stuck after some pip installations, even after being left to run for 10 minutes in that state. Kindly find the attached log.
this is the command
clearml-agent daemon --cpu-only --docker pytorch/pytorch --queue default
the command to set up the agent is
clearml-agent daemon --docker pytorch_test --queue default --foreground --cpu-only
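The two agent invocations in this thread differ in the image name and the --foreground flag. A small sketch that assembles the command from its parts can make such differences easier to spot. The helper agent_command is hypothetical and uses only the flags already shown in this thread.

```python
import shlex
from typing import List


def agent_command(image: str, queue: str = "default",
                  cpu_only: bool = True, foreground: bool = False) -> List[str]:
    """Build the argv list for a docker-mode clearml-agent daemon."""
    argv = ["clearml-agent", "daemon", "--docker", image, "--queue", queue]
    if foreground:
        argv.append("--foreground")
    if cpu_only:
        argv.append("--cpu-only")
    return argv


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be compared.
    print(shlex.join(agent_command("pytorch/pytorch")))
    print(shlex.join(agent_command("pytorch_test", foreground=True)))
```

The returned list can be passed to subprocess.run as-is once the command looks right.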
Hi @<1559711593736966144:profile|SoggyCow20> , can you attach the full log?
Hi @<1523701087100473344:profile|SuccessfulKoala55> !
many thanks for your reply!
kindly find the attached log
@<1523701087100473344:profile|SuccessfulKoala55> I stopped the runner execution with Ctrl+C in the terminal
@<1559711593736966144:profile|SoggyCow20> a clearml session must be run using the clearml-session command-line tool; you cannot run it manually or enqueue it yourself
The docker image getting stuck has nothing to do with the pip warning; I managed to suppress that through an environment variable.
Another message it sometimes gets stuck at is:
cp: -r not specified; omitting directory '/tmp/clearml.conf'
Using built-in ClearML default key/secret
and this is the log for it
To provide you with more info, I am setting up ClearML on a Linux server, and the agent is also set up on the same server.
I even tried this on my local machine (macOS) with an agent on the same machine, and had the same issue.
I guess there must have been some confusion: I am not trying to run a clearml-session. I am trying to execute a task remotely, such as "Clone of Scalar reporting example". When I clone it and enqueue it, the docker image gets stuck after the pip installations. This works if I connect to app.clear.ml, but not when I connect to the localhost docker deployment.
@<1523701087100473344:profile|SuccessfulKoala55> apologies for my late replies. I tried to run it through the frontend web app, and I also tried the clearml/examples/advanced/execute_remotely.py example
@<1559711593736966144:profile|SoggyCow20> how did you run the session task?