OK - the issue was the firewall rules that we had.
Nice!
But now there is an issue with the Setting up connection to remote session
OutrageousSheep60 this is just a warning, basically saying we are using the default signed SSH server key (has nothing to do with the random password, just the identifying key being used for the remote ssh session)
Bottom line, I think you have everything working 🙂
Hi AgitatedDove14
OK - the issue was the firewall rules that we had.
Now both the jupyter lab and vscode servers are up.
But now there is an issue with the Setting up connection to remote session step.
After the lines
Environment setup completed successfully
Starting Task Execution:
ClearML results page:
there is a WARNING:
clearml - WARNING - Could not retrieve remote configuration named 'SSH'
Where do we define the remote configuration?
When the Interactive session config is displayed, I can see the User and Password, and I can separately ssh into the agent machine with them (so I know the User and Password are correct).
e.g. the session config uses USER1:
Interactive session config:
{
  "base_task_id": null,
  "docker": "nvcr.io/nvidia/pytorch:20.11-py3",
  "git_credentials": false,
  "jupyter_lab": true,
  "keepalive": false,
  "packages": [ "clearml", "tensorflow>=2.2", "keras" ],
  "password": "PASS",
  "queue": "QUEUE",
  "username": "USER1",
  "verbose": true,
  "vscode_server": true
}
I'm checking the possibility of our firewall between the clearml-agent machine and the local computer running the session
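A quick way to test the firewall hypothesis from the local computer is a plain TCP reachability check against the agent machine. This is only a minimal sketch: the function name port_is_reachable is made up for illustration, and the host and port you pass in are whatever address and port your own session reports, nothing ClearML-specific is assumed here.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False
```

For example, port_is_reachable("AGENT_MACHINE_IP", 22) returning False while a direct ssh from another network works would point at a firewall rule between the two machines (AGENT_MACHINE_IP is a placeholder).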
Maybe... the thing is, how come the session creates a Task and pushes it into the queue, but the Task itself is empty?
Hence my request for the clearml-session console log, i.e. an actual copy-paste of what you have in the terminal, not the Task log from the UI
I'm checking the possibility of our firewall between the clearml-agent machine and the local computer running the session
This really makes little sense to me...
Can you send the full clearml-session --verbose console output ?
Something is not working as it should obviously, console output will be a good starting point
AgitatedDove14 -
I also tried running the session within docker (following https://github.com/allegroai/clearml-session) but got the same error
clearml-session --docker --git-credentials
(there is a typo in the docs: --git-credentilas -> --git-credentials)
and still got the same error
clearml_agent: ERROR: Can not run task without repository or literal script in script.diff
Are you running "clearml-session" from your machine (i.e. not from inside a docker)?
correct - running it locally - not inside docker . Should I try to run within a docker?
Can you send the full clearml-session console output ?
see above
OutrageousSheep60
I found the task in the UI - and in the UNCOMMITTED CHANGES execution section there is No changes logged
This is the issue.
and then run the session via docker
clearml-session --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 \
  --packages "clearml" "tensorflow>=2.2" "keras" \
  --queue MY_QUEUE \
  --verbose
Are you running "clearml-session" from your machine (i.e. not from inside a docker)?
Can you send the full clearml-session console output ?
Hi SuccessfulKoala55
Any suggestions on how to progress?
Are there any settings that we need to take into account when working with session?
The docs at https://clear.ml/docs/latest/docs/apps/clearml_session#accessing-a-git-repository mention accessing a Git repository.
Can you run clearml sessions without accessing Git? Assuming we are using ssh, what is the correct configuration?
I found the task in the UI - and in the UNCOMMITTED CHANGES execution section there is No changes logged
Any other suggestions?
clearml_agent: ERROR: Can not run task without repository or literal script in script.diff
This is odd ...
OutrageousSheep60 when you launch clearml-session it tells you the session ID (which is also a Task ID). Can you look for it in the UI and check that there is something in the repo/uncommitted-changes section?
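The same check can be done without the UI. The sketch below assumes the clearml Python package is installed and configured, and that the dictionary returned by Task.export_task() carries a "script" section with "repository" and "diff" keys, which matches the fields named in the agent error but should be verified against your ClearML version. The helper names are made up for illustration.

```python
def script_is_empty(script: dict) -> bool:
    """True when the task has neither a repository nor a literal script/diff."""
    repo = (script.get("repository") or "").strip()
    diff = (script.get("diff") or "").strip()
    return not repo and not diff


def fetch_script_section(task_id: str) -> dict:
    """Fetch the script section of a Task (needs clearml and valid credentials)."""
    # Imported lazily so script_is_empty can be used without clearml installed
    from clearml import Task

    task = Task.get_task(task_id=task_id)
    # Assumption: the exported dict has a "script" key holding repository/diff
    return task.export_task().get("script") or {}
```

Calling script_is_empty(fetch_script_section("SESSION_TASK_ID")) with the ID printed by clearml-session (SESSION_TASK_ID is a placeholder) returning True would correspond to the "No changes logged" state seen in the UI.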
Hi SuccessfulKoala55
I've run the daemon via docker
CLEARML_WORKER_ID=XXXX clearml-agent daemon --queue MY_QUEUE --docker --detached
and then run the session via docker
clearml-session --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 \
  --packages "clearml" "tensorflow>=2.2" "keras" \
  --queue MY_QUEUE \
  --verbose
However I'm still getting the same error
clearml_agent: ERROR: Can not run task without repository or literal script in script.diff
BTW - is the CLEARML_HOST_IP relevant for the clearml-agent?
I can see that we can create a worker with this environment variable, e.g.
CLEARML_WORKER_NAME=MY-WORKER CLEARML_WORKER_ID=MY-WORKER:0 CLEARML_HOST_IP=X.X.X.X clearml-agent daemon --detached
My mistake, it doesn't use it to create a dedicated IP
Hi OutrageousSheep60 , I think you'll need to run the agent in docker mode