What is the command you use to run clearml-agent?
@<1523701205467926528:profile|AgitatedDove14> I was able to resolve that, but now I am having issues with fiftyone. It's showing me the following error:
import fiftyone as fo
File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/fiftyone/__init__.py", line 25, in <module>
from fiftyone.__public__ import *
File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/fiftyone/__public__.py", line 15, in <module>
_foo.establish_db_conn(config)
File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/fiftyone/core/odm/database.py", line 200, in establish_db_conn
port = _db_service.port
File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/fiftyone/core/service.py", line 276, in port
return self._wait_for_child_port()
File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/fiftyone/core/service.py", line 170, in _wait_for_child_port
return find_port()
File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/retrying.py", line 56, in wrapped_f
return Retrying(*dargs, **dkw).call(f, *args, **kw)
File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/retrying.py", line 266, in call
raise attempt.get()
File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/retrying.py", line 301, in get
six.reraise(self.value[0], self.value[1], self.value[2])
File "/usr/local/lib/python3.8/dist-packages/six.py", line 719, in reraise
raise value
File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/retrying.py", line 251, in call
attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/fiftyone/core/service.py", line 168, in find_port
raise ServiceListenTimeout(etau.get_class_name(self), port)
fiftyone.core.service.ServiceListenTimeout: fiftyone.core.service.DatabaseService failed to bind to port
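Editor's note: the traceback above ends with the database service failing to bind to a port, and a later reply in this thread suggests trying '--network host'. A minimal sketch of a generic probe you could run inside the container to see whether a local TCP port can be bound at all. This is not fiftyone's own check, and the helper name `can_bind_localhost` is made up for illustration:

```python
import socket


def can_bind_localhost(port: int = 0) -> bool:
    """Return True if a TCP socket can bind and listen on 127.0.0.1.

    port=0 asks the OS for any free port. This is only a generic
    connectivity probe, not a reproduction of fiftyone's service startup.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind(("127.0.0.1", port))
            sock.listen(1)
        return True
    except OSError:
        # Covers "address in use", permission errors, missing loopback, etc.
        return False


if __name__ == "__main__":
    print("can bind to localhost:", can_bind_localhost())
```

If this prints False inside the container, the problem is likely in the container's networking setup rather than in fiftyone itself. If it prints True, the cause is probably elsewhere (for example the database process not starting).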
It looks like the agent running inside your Docker container did not have a username/password for the git clone, so git's default behavior is to wait for keyboard input, which looks like a hang.
OK, I'll try that out. enable_git_ask_pass: true is not working.
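Editor's note: one way to confirm the "waiting for keyboard input" theory is to run git with prompting disabled, so a missing credential fails immediately instead of hanging. A minimal sketch, assuming git is installed in the container. `GIT_TERMINAL_PROMPT` is a standard git environment variable; the helper names and the timeout value are hypothetical:

```python
import os
import subprocess


def non_interactive_git_env(base=None):
    """Copy an environment and tell git never to prompt on the terminal."""
    env = dict(os.environ if base is None else base)
    env["GIT_TERMINAL_PROMPT"] = "0"  # git fails instead of asking for a username
    return env


def repo_is_reachable(url, timeout=30):
    """Return True if `git ls-remote` succeeds without any interactive input."""
    try:
        result = subprocess.run(
            ["git", "ls-remote", "--heads", url],
            env=non_interactive_git_env(),
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # Timed out, or git is not installed in this environment.
        return False
    return result.returncode == 0
```

Calling `repo_is_reachable(...)` with your repository URL from inside the container should return False quickly if no credentials are available there, rather than blocking the way the task log does.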
Hmm I see, add this for example
extra_docker_shell_script: ["rm ~/.bashrc", "echo removed bashrc"]
while we spin up the autoscaler instance
Just a follow-up on this issue, @<1523701087100473344:profile|SuccessfulKoala55> @<1523701205467926528:profile|AgitatedDove14> I would very much appreciate it if you could help me with this.
On the host machine, or inside the containers that are spun up on the host machine?
Inside the containers that are spun up on the host machine.
If you can let me know how to resolve this, @<1576381444509405184:profile|ManiacalLizard2> @<1523701087100473344:profile|SuccessfulKoala55>, that would be very helpful.
How did you provide credentials to clearml and git?
@<1610083503607648256:profile|DiminutiveToad80> try to turn on:
enable_git_ask_pass: true
Try to add '--network host' to the docker args on the task you are launching
While creating the autoscaler instance I did provide my git credentials, i.e. my username and Personal Access Token.
How exactly did you do that?
Hi @<1610083503607648256:profile|DiminutiveToad80> , can you perhaps include a more comprehensive log?
Let me know if this is enough information or not
I provided the credentials while setting up the autoscaler instance. Where can I look for the clearml.conf? When I ssh into the instance spun up by the autoscaler, I am not able to see the clearml.conf.
try:
docker_install_opencv_libs: true
OK, I was able to resolve the above issue, but now I am getting the following error while executing a task:
import cv2
File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/cv2/__init__.py", line 181, in <module>
bootstrap()
File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/cv2/__init__.py", line 153, in bootstrap
native_module = importlib.import_module("cv2")
File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
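Editor's note: the ImportError above means the system library libGL is missing from the container image, not a Python package. A minimal sketch of a pre-flight check that could run before `import cv2`, using the standard-library `ctypes.util.find_library`. The helper name `missing_shared_libs` is hypothetical, and `find_library` relies on system tools such as ldconfig, so treat a "missing" result as a hint rather than proof:

```python
from ctypes.util import find_library


def missing_shared_libs(names):
    """Return the subset of library names the system linker lookup cannot find.

    Names are given without the "lib" prefix or ".so" suffix,
    e.g. "GL" for libGL.so.1.
    """
    return [name for name in names if find_library(name) is None]


if __name__ == "__main__":
    missing = missing_shared_libs(["GL"])
    if missing:
        print("missing system libraries:", ", ".join(missing))
    else:
        print("libGL found; `import cv2` should get past this error")
```

If libGL is reported missing, the fix belongs at the image or apt level, which is what the docker_install_opencv_libs and extra_docker_shell_script suggestions elsewhere in this thread are aiming at.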
Note: switching to 'commit_id'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at commit_id
type: git
url: git_repo
branch: HEAD
commit: commit_id
root: root_dir
Ignoring pip: markers 'python_version >= "3.10"' don't match your environment
Collecting pip<20.2
Using cached pip-20.1.1-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 23.2.1
Uninstalling pip-23.2.1:
Successfully uninstalled pip-23.2.1
2023-10-12 11:49:23
Successfully installed pip-20.1.1
Collecting git+git_repo_name
Cloning git_repo
Running command git clone -q git_repo_name
Username for ' None ':
2023-10-12 12:19:36
User aborted: stopping task (1)
2023-10-12 12:19:36
Process aborted by user
And one more thing: is there a way to make changes to the .bashrc inside the Docker container?
I don't have it, so I don't know how things are set up or how to pass on credentials in this case.
I am not familiar with the autoscaler ... are you using the paid version of ClearML?
Because I think I need the following two lines in the .bashrc, along with GOOGLE_APPLICATION_CREDENTIALS:
git config --global user.email 'email'
git config --global user.name "user_name"
What does your clearml.conf look like?
Then try to add the missing apt packages:
extra_docker_shell_script: ["apt-get install -y ???", ]