
Btw, it seems the docker runs in network=host
I start clearml-session on my Mac this way:
clearml-session --queue gpu --docker registry.gitlab.com/periplo-innovation/project-precog/clearml_config
Sure, will send in a few min when it executes
I don't have a short version.
I am using community clearml. How do I find out my version?
I resolved the issues by making my own docker image and making all three envs the same:
The env that runs clearml-agent
The docker env that tasks run in
The env that requests task execution (my client)
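To check whether the three environments really match, one option is to print a small "fingerprint" in each of them and compare the results. This is only a sketch: the function name `env_fingerprint` and the default package list are assumptions for illustration, not something ClearML provides.

```python
import platform
from importlib import metadata


def env_fingerprint(packages=("clearml", "clearml-agent", "torch")):
    """Return the Python version and the installed versions of selected packages.

    The default package names are illustrative; pass whichever ones matter to you.
    """
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # package is not installed in this environment
    return info


# Usage: run this in the agent env, inside the docker image, and on the client,
# then compare the three dictionaries.
# print(env_fingerprint())
```

Any key whose value differs between the three outputs is a candidate for the mismatch.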
I don’t understand. The current cuda version is 11.7. Installed pytorch version is 1.12.1. Torch can access GPUs, all is fine.
Why does it try to install a different torch version?
` (base) boris@adamastor:~$ nvidia-smi
Fri Oct 7 14:16:24 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name ...
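The "CUDA Version" shown by nvidia-smi is the highest version the driver supports, which is not necessarily the CUDA version the installed torch build was compiled against. A minimal sketch for seeing what torch itself reports, assuming torch may or may not be importable in the environment being checked (the helper name `torch_cuda_report` is made up for this example):

```python
import importlib


def torch_cuda_report():
    """Report the torch version, the CUDA version it was built for, and GPU visibility."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return {"torch": None}  # torch is not installed in this environment
    return {
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


# Usage: run inside the environment (or container) where the task executes.
# print(torch_cuda_report())
```

Comparing this output between the host and the task container may show whether the two see different torch builds.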
It's too much of a hack :)
I can telnet the port from my Mac:
(base) *[main][~/Documents/plant_age]$ telnet 192.168.1.55 10022
Trying 192.168.1.55...
Connected to 192.168.1.55.
Escape character is '^]'.
SSH-2.0-OpenSSH_8.4p1 Debian-5+deb11u1
^C
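The same reachability check can be scripted, which is handy when telnet is not installed. This is a rough sketch using only the standard library; `read_banner` is a hypothetical helper name, and the host and port in the usage comment are the ones from the telnet session.

```python
import socket


def read_banner(host, port, timeout=5.0):
    """Open a TCP connection and return the first line the server sends, if any."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = sock.recv(256)  # an SSH server sends its version string right away
    text = data.decode("ascii", errors="replace")
    return text.splitlines()[0] if text else ""


# Usage (values from the telnet session):
# print(read_banner("192.168.1.55", 10022))
```

A returned line starting with `SSH-2.0-` means the port is reachable and an SSH server is answering, so any remaining failure is more likely in authentication than in networking.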
The agent is started from a non-root user if that matters
I also use TB.
I solved the issue by implementing my own ClearML logger
CostlyOstrich36 CLEARML-AGENT version 1.3.0
AgitatedDove14 made a new one: https://pastebin.com/LxLFk7py
Ok, it makes sense. But it’s running in docker mode and it is trying to ssh into the host machine and failing
(base) boris@adamastor:~/clearml_config$ clearml-agent --version
CLEARML-AGENT version 1.4.0
Upgraded, the issue persists
Was I right to put the credentials in clearml.conf on the machine I am starting the agent on?
clearml.conf is like this:
...stuff..
agent {
    git_user: "btseytlin"
    git_pass: "gitlab accesstoken"
}
Yes, I am able to clone locally on the same server the agent is running on. However I do it using ssh auth
Yes, the git user is correct. It does not display the password of course. I tested and the config is definitely coming from clearml.conf
Still, the error persists
I am doing clearml-agent --docker … --foreground --gpus 1
Should "realmodelonly.pkl" be the full path, or just the file name?
When trying it I realized that my local clearml.conf still had the old hostnames (adamastor.gaiavf.local). Now your script returns the proper value of http://adamastor-office.periploinnovation.com:8081. I will see if it works now!
All ports are open (both agent machine and client machine are working within same VPN)
I mean, if I enter my host machine's ssh password it works. But we will disable password auth in the future, so it's not an option