Managed to fix it. I cloned the task again and it pulled the correct Docker image. However, it still tried to use the GPU. To fix this, I had to spin up the agent with the --cpu-only flag (--docker --cpu-only).
This morning I killed the agent and spun it up again with --foreground, and it worked flawlessly. Then I tried without --foreground and it still worked. I didn't change anything else, weird. When I set up another worker in the future, I will keep an eye on it.
BTW: is it better to post the long error message in a reply to avoid polluting the channel?
Maybe it's a problem regarding the arm64 architecture
It still didn't work. I created the SSH keys on the agent with ssh-keygen -C "<user>", added the public key to Bitbucket to allow access, and this time I didn't set SSH_AUTH_SOCK. Am I missing something? Do I need to set up an ssh-agent on the host before spinning up the agent?
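A quick way to see which SSH credentials the agent process (or its container) can actually use is to check for an ssh-agent socket and for key files. The helper below is a hypothetical diagnostic sketch, not part of ClearML or clearml-agent: it assumes the keys live in a ~/.ssh-style directory with OpenSSH's default file names, and it also reports whether known_hosts exists, since an unknown Bitbucket host key can also make a non-interactive git clone fail.

```python
import os
from pathlib import Path

# OpenSSH's default private-key file names (assumption: keys use the default names)
DEFAULT_KEY_NAMES = ("id_rsa", "id_ecdsa", "id_ecdsa_sk", "id_ed25519", "id_ed25519_sk")


def ssh_setup_report(ssh_dir=None, env=None):
    """Summarize the SSH credentials visible to this process (hypothetical helper)."""
    env = os.environ if env is None else env
    ssh_dir = Path(ssh_dir) if ssh_dir is not None else Path.home() / ".ssh"
    # Only report key files that actually exist in the directory
    keys = sorted(name for name in DEFAULT_KEY_NAMES if (ssh_dir / name).is_file())
    return {
        # None means no ssh-agent socket is advertised to this process
        "ssh_auth_sock": env.get("SSH_AUTH_SOCK"),
        "private_keys": keys,
        # Without a Bitbucket entry here, git over SSH may refuse to connect
        "known_hosts": (ssh_dir / "known_hosts").is_file(),
    }
```

Running `print(ssh_setup_report())` inside the environment that performs the clone (for example, inside the agent's container rather than on the host VM) shows whether it sees a key, an agent socket, or neither.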
Update: I just launched another instance with Ubuntu 20.04, but on amd64 (x86_64), using the same docker compose file, and it worked just fine. Were the Docker images built on amd64? Does running amd64 images on arm64 need any special attention?
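On the arm64 question: an image built only for amd64 does not run natively on an arm64 host. Docker can only run it through emulation (for example QEMU registered via binfmt_misc), or you need an image that also ships an arm64 variant. The sketch below simply compares the host architecture, taken from Python's `platform.machine()`, with an image's architecture; the alias table is an assumption covering the common Linux, macOS and Windows spellings.

```python
import platform

# Map common CPU-architecture spellings onto Docker's platform names (assumed table)
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def docker_arch(machine=None):
    """Return the Docker-style architecture name for `machine` (default: this host)."""
    machine = (machine or platform.machine()).lower()
    return _ARCH_ALIASES.get(machine, machine)


def needs_emulation(image_arch, machine=None):
    """True when an image built for `image_arch` cannot run natively on the host."""
    return docker_arch(image_arch) != docker_arch(machine)
```

On the arm64 instance, `needs_emulation("amd64")` should be true, which would be consistent with the same compose file working on the x86_64 machine but not on the arm64 one.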
When running the task locally:
# Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)]
clearml == 1.8.3

And when running on a remote clearml-agent:
attrs==22.1.0
certifi==2022.12.7
charset-normalizer==2.1.1
clearml==1.8.3
Cython==0.29.32
distlib==0.3.6
filelock==3.8.2
furl==2.1.3
idna==3.4
jsonschema==4.17.3
numpy==1.24.0
orderedmultidict==1.0.1
pathlib2==2.3.7.post1
Pillow==9.3.0
platformdirs==2.6.0
psutil==5.9.4
PyJWT==2.4.0
pyparsing==3.0.9
pyrsisten...
I found that the dependencies of the original main code are detected when it runs as a .py file. However, when it runs in a Jupyter notebook (.ipynb), they are not.
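Since detection works for the .py file but not for the notebook, one way to check what a notebook actually imports is to flatten its code cells into a script. The function below is a hypothetical sketch that reads the .ipynb JSON directly and comments out IPython magics and shell escapes (lines starting with `%` or `!`) so the result parses as plain Python; for a real export, `jupyter nbconvert --to script` does the same job more thoroughly.

```python
import json


def notebook_to_source(ipynb_text):
    """Concatenate the code cells of a Jupyter notebook into one Python source string."""
    nb = json.loads(ipynb_text)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        src = cell.get("source", "")
        # nbformat stores cell source either as one string or as a list of lines
        if isinstance(src, list):
            src = "".join(src)
        # Comment out IPython magics and shell escapes so the result is valid Python
        lines = [
            "# " + line if line.lstrip().startswith(("%", "!")) else line
            for line in src.splitlines()
        ]
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"
```

For example, `notebook_to_source(open("main.ipynb", encoding="utf-8").read())` returns source that can be saved as a .py file and inspected like any other script.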
I tried calling Task.force_requirements_env_freeze(requirements_file="requirements.txt") before Task.init, but it didn't work: the requirements didn't show up as installed packages. So I added Task.add_requirements("requirements.txt") and it worked fine. I think this is a proper workaround.
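The workaround above, written out as a sketch. `Task.add_requirements` (which accepts a package name or a path to a requirements file) and `Task.init` are ClearML SDK calls, and as far as I know `add_requirements` only affects the recorded packages when it runs before `Task.init`. The wrapper function, its file-existence check and the project/task names are hypothetical additions, not ClearML behavior.

```python
from pathlib import Path


def init_task_with_requirements(project_name, task_name, requirements_file="requirements.txt"):
    """Register a requirements file with ClearML, then create the task (hypothetical wrapper)."""
    req = Path(requirements_file)
    # Fail early instead of registering a path that does not exist
    if not req.is_file():
        raise FileNotFoundError(f"requirements file not found: {req}")

    # Imported lazily so the check above works even where clearml is not installed
    from clearml import Task

    # Must run before Task.init so the file ends up under "Installed Packages"
    Task.add_requirements(str(req))
    return Task.init(project_name=project_name, task_name=task_name)
```

In the notebook, this would go in the first cell, e.g. `task = init_task_with_requirements("my-project", "notebook-run")`, with placeholder names.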
Before asking the agent to run it, I also pushed the code and requirements to git, so the agent should see them.
No, I use an SSH key to give the agent access. I am sure the key is working and it is cloning properly.
One workaround is adding import pandas as pd and import numpy as np to the main file. This way the dependency is properly detected. But I don't know, it seems like this shouldn't be a problem. Did you manage to reproduce the error?
Yes, all the files are in the same git repo and on the same branch, including the requirements.txt.
I added a requirements.txt file at the same level as main.ipynb, but it still didn't detect the dependency, and the run failed with an ImportError for pandas.
The workaround of importing pandas and numpy is very limited, because once your code.py imports from other files (a utils.py, for example), it quickly becomes hard to keep track of which libraries are needed.
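To keep track of the libraries when the entry point imports local helpers such as utils.py, the imports can be collected statically. This is a hypothetical sketch using Python's `ast` module: it starts from one file, follows imports that resolve to a .py file in the same directory, and returns the remaining top-level module names, dropping standard-library ones via `sys.stdlib_module_names` (Python 3.10+; on older versions nothing is filtered). It skips relative imports and does not descend into local packages (directories), and import names do not always match pip package names (PIL is Pillow, sklearn is scikit-learn), so treat the result as a starting point for requirements.txt.

```python
import ast
import sys
from pathlib import Path

# Standard-library module names (Python 3.10+; empty on older versions, so nothing is filtered)
STDLIB = getattr(sys, "stdlib_module_names", frozenset())


def top_level_imports(source):
    """Return the top-level module names used by absolute imports in `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.add(node.module.split(".")[0])  # relative imports (level > 0) are skipped
    return names


def collect_imports(entry_file):
    """Follow local .py imports from `entry_file` and return external top-level modules."""
    entry = Path(entry_file).resolve()
    root = entry.parent
    seen, external = set(), set()
    pending = [entry]
    while pending:
        path = pending.pop()
        if path in seen:
            continue  # avoid rescanning files imported from several places
        seen.add(path)
        for top in top_level_imports(path.read_text(encoding="utf-8")):
            local = root / f"{top}.py"
            if local.is_file():
                pending.append(local.resolve())  # local module such as utils.py: scan it too
            elif top not in STDLIB:
                external.add(top)
    return sorted(external)
```

Calling `collect_imports("main.py")` returns a sorted list of module names that can be compared against requirements.txt before pushing.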
That's weird, even when passing an id, it still creates a new task id.
The error I'm getting is:
cloning:
Using SSH credentials - replacing https url '
' with ssh url '
' 2022-12-16 14:33:39 fatal: Could not read from remote repository. Please make sure you have the correct access rights and the repository exists.
However, I can successfully clone the repo on the host VM, so the SSH key should be giving me permission.
I think it would be nice to have a plot of the network layers. Maybe I can plot it using another library.