You can either set your user permissions to allow group write by default?
Or maybe create a dedicated user with group write permission and run the agent as that user?
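A minimal sketch of the first option, assuming a POSIX system: the process umask decides which permission bits newly created files lose, so a umask of `0o002` keeps group write. The helper `create_group_writable_file` and the file name are hypothetical, for illustration only.

```python
import os
import stat
import tempfile


def create_group_writable_file(directory: str, name: str) -> int:
    """Create a file under a 0o002 umask and return its permission bits."""
    # umask 0o002 removes only the "other write" bit, so group write survives
    previous = os.umask(0o002)
    try:
        path = os.path.join(directory, name)
        # Request rw for everyone; the umask then strips the "other write" bit
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
        os.close(fd)
    finally:
        # Restore the original umask so the rest of the process is unaffected
        os.umask(previous)
    return stat.S_IMODE(os.stat(path).st_mode)


with tempfile.TemporaryDirectory() as tmp:
    mode = create_group_writable_file(tmp, "example.txt")
    print(oct(mode))
    print(bool(mode & stat.S_IWGRP))
```

Child processes inherit the umask, so running `umask 002` in the shell (or profile) of the user that starts the agent should have the same effect on the files the agent creates.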
We first need to focus on why it takes minutes to reach "Using env".
In our case, we have a container that has all packages installed directly in the system, with no venv in the container. Thus we don't use CLEARML_AGENT_SKIP_PIP_VENV_INSTALL.
But then, when a task is pulled, I can see all the steps like git clone and a bunch of "Requirement already satisfied" .... There may be some odd package that needs to be installed because one of our DS is experimenting ... But all that we can see what is...
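To find the odd package that actually triggers an install (rather than reading through the "Requirement already satisfied" lines), one could compare the task's requirement names against what the container already has. This is a minimal sketch, not the agent's own logic; it assumes bare package names without version specifiers, and `split_requirements` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def split_requirements(names):
    """Split bare package names into (already installed, missing)."""
    installed, missing = {}, []
    for name in names:
        try:
            # version() raises PackageNotFoundError when the distribution is absent
            installed[name] = version(name)
        except PackageNotFoundError:
            missing.append(name)
    return installed, missing


# "pip" is usually present; the second name is a made-up package
installed, missing = split_requirements(["pip", "surely-not-an-installed-package-xyz"])
print(installed)
print(missing)
```

Anything that lands in `missing` is a candidate for the package that is slowing down the task setup.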
OK, I think I found the issue. I had to point the file server to Azure storage:
api {
# Notice: 'host' is the api server (default port 8008), not the web server.
api_server:
web_server:
files_server: "
"
credentials {"access_key": "REDACTED", "secret_key": "REDACTED"}
}
Clear. Thanks @<1523701070390366208:profile|CostlyOstrich36>!
@<1523701070390366208:profile|CostlyOstrich36> Is there a way to tell ClearML not to try to detect the installed packages?
But then how did the agent know where the venv it needs to use is?
Nevermind: None
By default, the File Server is not secured even if Web Login Authentication has been configured. Using an object storage solution that has built-in security is recommended.
My bad
--gpus 0,1: I believe this basically says that your code launched by the agent has access to both GPUs, and that is it. It is then up to your code to choose which GPU to use, and how ...
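A minimal sketch of "it is up to your code to choose", assuming the GPUs granted to the task are exposed through the `CUDA_VISIBLE_DEVICES` environment variable. The helpers `visible_gpus` and `pick_gpu` and the `worker_index` parameter are hypothetical.

```python
import os


def visible_gpus(env=None):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES (empty list if unset)."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    # Ids are comma separated, e.g. "0,1"; ignore empty pieces
    return [piece.strip() for piece in raw.split(",") if piece.strip()]


def pick_gpu(env=None, worker_index=0):
    """Choose a local device index for this worker, round-robin over visible GPUs.

    Inside the process, CUDA renumbers the visible GPUs from 0, so the
    returned value is a local index (e.g. for "cuda:<index>"), not the
    physical id listed in the variable.
    """
    gpus = visible_gpus(env)
    if not gpus:
        return None  # no GPU exposed: fall back to CPU
    return worker_index % len(gpus)


example_env = {"CUDA_VISIBLE_DEVICES": "0,1"}
print(visible_gpus(example_env))  # ['0', '1']
print(pick_gpu(example_env, 3))   # 1
print(pick_gpu({}))               # None
```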
I mean, it depends on what you want to report ... If you want to stick to tables, as I suggested earlier, gather your stats in table format ...
Otherwise, matplotlib seems to be the most user-friendly way.
With:
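import pandas as pd
import clearml

# Example data: one row per animal
df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])

# current_task() returns the task already running in this process
# (created by Task.init() or launched by the agent); it returns None otherwise
task = clearml.Task.current_task()
task.get_logger().report_table(title='table example', series='pandas DataFrame', iteration=0, table_plot=df)
# logger.report_table(title='table example',series='pandas DataFrame',iteration=0,tabl...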
This looks like the agent running inside your Docker container did not have any username/password to do the git clone, so the default behavior is to wait for keyboard input, which looks like hanging ....
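To check this without the apparent hang, one could run the same clone non-interactively inside the container. A minimal sketch: `GIT_TERMINAL_PROMPT=0` makes git fail instead of asking for a username/password, and `ssh -o BatchMode=yes` makes ssh fail instead of prompting. The helpers `noninteractive_clone_command` and `try_clone` are hypothetical.

```python
import os
import subprocess


def noninteractive_clone_command(repo_url, dest):
    """Build a git clone command plus environment that fails instead of prompting."""
    env = dict(os.environ)
    # Git exits with an error instead of asking for a username/password
    env["GIT_TERMINAL_PROMPT"] = "0"
    # BatchMode makes ssh fail rather than ask for a passphrase or host confirmation
    env["GIT_SSH_COMMAND"] = "ssh -o BatchMode=yes"
    return ["git", "clone", repo_url, dest], env


def try_clone(repo_url, dest, timeout=120):
    """Run the clone and return (ok, stderr).

    Prompts are disabled above; the timeout is only a safety net and
    raises subprocess.TimeoutExpired if it is hit.
    """
    cmd, env = noninteractive_clone_command(repo_url, dest)
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, timeout=timeout)
    return result.returncode == 0, result.stderr
```

Usage would be along the lines of `ok, err = try_clone("git@github.com:org/repo1", "repo1")`; when `ok` is False, `err` holds git's actual authentication error instead of a silent wait.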
You can upload the df as an artifact.
Or gather the statistics as a DataFrame and upload that as an artifact?
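A minimal sketch of gathering statistics in table form before uploading them, using only the standard library. The helper `stats_table` and the choice of columns are hypothetical; the numbers come from the falcon/dog/spider/fish example.

```python
import statistics


def stats_table(metrics):
    """Summarise each metric as one row: [name, count, mean, stdev, min, max]."""
    rows = [["metric", "count", "mean", "stdev", "min", "max"]]  # header row
    for name, values in metrics.items():
        # stdev needs at least two values; report 0.0 otherwise
        spread = statistics.stdev(values) if len(values) > 1 else 0.0
        rows.append([name, len(values), statistics.mean(values), spread, min(values), max(values)])
    return rows


table = stats_table({"num_legs": [2, 4, 8, 0], "num_specimen_seen": [10, 2, 1, 8]})
for row in table:
    print(row)
```

The rows can then become a DataFrame with `pd.DataFrame(table[1:], columns=table[0])` and be uploaded with `task.upload_artifact(name="stats", artifact_object=df)`; the artifact name here is only an example.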
Can you share the agent log, in the console tab, before the error?
I use an SSH public key to access our repo ... I never tried to provide credentials to ClearML itself (via clearml.conf), so I cannot help much here ...
(I never played with the pipeline feature, so I am not really sure it works as I imagined ...)
You should be able to use as many agents as you want.
On the same or different queues,
on the same or different machines!
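As a sketch of several agents on one machine, the commands below start one `clearml-agent daemon` per GPU, each on its own queue. The helper `agent_commands` and the queue names are hypothetical; pointing every agent at the same queue works the same way.

```python
def agent_commands(queues, gpus_per_agent):
    """Build one `clearml-agent daemon` command per (queue, GPU set) pair."""
    commands = []
    for queue, gpus in zip(queues, gpus_per_agent):
        cmd = ["clearml-agent", "daemon", "--queue", queue, "--detached"]
        if gpus:
            # e.g. "0" or "0,1": the GPUs this agent exposes to its tasks
            cmd += ["--gpus", gpus]
        commands.append(cmd)
    return commands


# Two agents on the same machine, one per GPU, on different queues
for cmd in agent_commands(["gpu_queue_a", "gpu_queue_b"], ["0", "1"]):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.Popen` (or typed in a shell) to actually start the agent; on another machine, the same commands apply with that machine's queues and GPUs.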
Not sure how that works with Docker and a machine that is not set up with an SSH public key ... We will go down that path sometime in the future, so I am quite interested too in how people do it without an SSH public key.
I need to do a git clone?
You only need to do it to test whether it works. clearml-agent will run it itself when it picks up a task.
With an SSH public key, if I can do a git clone from a terminal, then so can the clearml agent, since it runs on behalf of a local user. That applies to both local machines and VMs.
Do I need to make changes in clearml.conf so that it doesn't ask for my credentials, or is there another way around it?
You have 2 options:
- set the credentials inside clearml.conf: I am not familiar with this and have never tested it.
- or set up passwordless SSH with a public key None
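With the passwordless SSH option, the repository has to be cloned through an SSH URL rather than HTTPS. A minimal sketch of the rewrite, assuming GitHub-style URLs; `https_to_ssh` is a hypothetical helper. If I recall correctly, clearml-agent also offers a `force_git_ssh_protocol` setting in clearml.conf for this, but check the agent's own config reference before relying on it.

```python
import re

# Matches e.g. https://github.com/org/repo1.git (the ".git" suffix is optional).
# Credentials embedded in the URL (https://user@host/...) are not handled.
_HTTPS_RE = re.compile(r"^https://(?P<host>[^/]+)/(?P<path>.+?)(?:\.git)?/?$")


def https_to_ssh(url):
    """Rewrite an HTTPS git URL to the scp-like SSH form used with key auth."""
    match = _HTTPS_RE.match(url)
    if not match:
        return url  # already SSH (or unrecognised): leave untouched
    return f"git@{match.group('host')}:{match.group('path')}.git"


print(https_to_ssh("https://github.com/org/repo1.git"))  # git@github.com:org/repo1.git
print(https_to_ssh("git@github.com:org/repo1"))          # git@github.com:org/repo1
```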
Because when I was running both agents on my local machine, everything was working perfectly fine.
This is probably because you (or someone) set up an SSH public key with your git repo some time in the past.
Have you made sure that the agent inside the GCP VM has access to your repository? Can you SSH into that VM and try a git clone?
There is a whole discussion about it here: None
What command do you use to run clearml-agent?
You will need to provide more context than that if you don't want the answer "Have you tried turning it off and on again?"
I think a proper screenshot of the full log, with some information redacted, is the way to go. Otherwise we are just guessing in the dark.
So we have 3 Python packages, stored on github.com.
On the dev machine, the data scientist (DS) adds the local SSH key to his GitHub account as an authorized SSH key, at account level.
With that, the DS can run git clone git@github.com:org/repo1
and then install that Python package via pip install -e .
Do that for all 3 Python packages, each in its own repo: repo1, repo2 and repo3. All 3 can be cloned using the same key that the DS added to his account.
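The manual steps on the dev machine can be written down as a small script. A minimal sketch, assuming the repos live under `org` on github.com as in the example; `setup_commands` and `run` are hypothetical helpers, and the script only prints the commands unless `dry_run=False`.

```python
import subprocess

REPOS = ["repo1", "repo2", "repo3"]  # the three package repos under the same org


def setup_commands(org="org", host="github.com"):
    """List the clone and editable-install steps for each repo, in order."""
    steps = []
    for repo in REPOS:
        # All three clones authenticate with the same account-level SSH key
        steps.append((["git", "clone", f"git@{host}:{org}/{repo}"], None))
        # pip install -e . must run inside the freshly cloned directory
        steps.append((["pip", "install", "-e", "."], repo))
    return steps


def run(steps, dry_run=True):
    """Print each step, or execute it (stopping at the first failure)."""
    for cmd, cwd in steps:
        if dry_run:
            print(f"(cwd={cwd or '.'})", " ".join(cmd))
        else:
            subprocess.run(cmd, cwd=cwd, check=True)


run(setup_commands())
```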
The DS runs a tra...