I guess this pip package installation happens as part of docker build
Was I right to put the credentials in clearml.conf on the machine I am starting the agent on?
Clearml conf is like this...stuff.. agent { git_user: "btseytlin" git_pass: "gitlab accesstoken" }
Sure, will do tomorrow
This issue was resolved by setting the correct clearml.conf (replacing localhost with a public hostname for the server) 🙂
AgitatedDove14 This example does not specify how to start a clearml-agent with docker such that it actually executes the task
Freezing means that after the pip package installation, pictured in the screenshot, nothing happens. The screen hangs forever, with no other output anywhere, including the web UI
For a hacky way you can do docker ps and see the docker run command. I believe it contains the task id, so you can grep by task id
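The docker ps trick above can be sketched in Python. This is a minimal sketch that assumes, as guessed above, that the task id shows up somewhere in the container's command line; `filter_lines_by_task_id` and `find_containers_for_task` are hypothetical helper names, not part of clearml-agent.

```python
import subprocess
from typing import List


def filter_lines_by_task_id(ps_output: str, task_id: str) -> List[str]:
    """Return the lines of `docker ps` output that mention task_id."""
    return [line for line in ps_output.splitlines() if task_id in line]


def find_containers_for_task(task_id: str) -> List[str]:
    """Run `docker ps --no-trunc` and keep only the rows mentioning task_id.

    --no-trunc stops docker from shortening the COMMAND column, so the
    full command (and any task id inside it) stays searchable.
    Assumption: the task id appears in that command at all.
    """
    result = subprocess.run(
        ["docker", "ps", "--no-trunc"],
        capture_output=True,
        text=True,
        check=True,
    )
    return filter_lines_by_task_id(result.stdout, task_id)
```

Calling `find_containers_for_task("<task id>")` on the agent machine returns the matching rows, or an empty list if no running container mentions that id.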
Definitely not, the machine has 5 TB and is a recent clean install
Looking through history I found this link: None
Tldr: ClearML doesn’t support lightning, but supports pytorch_lightning. Downgrading from the new interface to the old one fixed my issue
(But in venv mode it also hangs the same way)
So the only process is something called /usr/local/bin/python3.10 -u -m clearml_agent execute.
So I guess pip install finished working
But the task is evidently not being executed.
I tried it.
This time agent was run with docker image python ( https://hub.docker.com/_/python )
Gets stuck on Installing collected packages: six, python-dateutil, pathlib2, psutil, attrs, pyrsistent, jsonschema, idna, chardet, certifi, urllib3, requests, PyYAML, pyparsing, pyjwt, pyhocon, orderedmultidict, furl, future, platformdirs, filelock, distlib, virtualenv, clearml-agent
ps aux inside the container reads
` (base) boris@adamastor:~$ docker exec -it angry_edison bash
root@041c0736c...
The agent is started with this command: clearml-agent --debug daemon --queue gpu --gpus 0 --foreground --docker <gitlab org registry>/project-precog/clearml_config
Ok, it makes sense. But it’s running in docker mode and it is trying to ssh into the host machine and failing
The failure is that it does not even run
Is there some minimal example of a docker env agent I can run, just to see that it works?
(base) boris@adamastor:~/clearml_config$ clearml-agent --version CLEARML-AGENT version 1.4.0
On the agent side it’s trying to install different pytorch versions (even though the env already has it all configured), then fails with torch_<something>.whl is not a valid wheel for this system
AgitatedDove14 With --debug I see that after installing packages there is an endless stream of this:
` Retrying (Retry(total=239, connect=239, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fac842e8be0>: Failed to establish a new connection: [Errno 111] Connection refused',)': /auth.login
Retrying (Retry(total=238, connect=238, read=240, redirect=240, status=240)) after connection broken by 'NewConnec...
So I guess the container can't access the ClearML API because of localhost?
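One way to test that guess is to check whether the server URL in the config points at a loopback address, since inside a container localhost refers to the container itself and not the host. A minimal sketch; `is_loopback_url` is a hypothetical helper, and the URL in the usage note is a placeholder rather than the actual server address.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url: str) -> bool:
    """Return True if the URL's host is localhost or a loopback IP.

    Expects a full URL with a scheme, e.g. "http://host:port".
    """
    host = urlparse(url).hostname
    if host is None:
        return False
    if host.lower() == "localhost":
        return True
    try:
        # Covers 127.0.0.0/8 and the IPv6 loopback ::1.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so it is an ordinary hostname.
        return False
```

If `is_loopback_url("http://localhost:8008")` is true for the API server URL the agent uses, the Connection refused retries above are the expected symptom in docker mode.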
Sure, will send in a few min when it executes
Agent is running in docker mode. The host OS is ubuntu
Well I don’t want that! My local machine is a Mac with no GPU. But I want to execute my code on a server with GPUs. I don’t want my local environment, I want the one configured for the agent!
CostlyOstrich36 CLEARML-AGENT version 1.3.0
But what should I do? It does not work, it says incorrect password as you can see
When trying it I realized that my local clearml.conf still had the old hostnames (adamastor.gaiavf.local). Now your script returns the proper value of http://adamastor-office.periploinnovation.com:8081 . I will see if it works now!
I start clearml-session on my mac this way: clearml-session --queue gpu --docker registry.gitlab.com/periplo-innovation/project-precog/clearml_config