Try to set CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=true
in the terminal, then start clearml-agent
See None
from what I understand, docker mode was designed for apt-based images and for running as root inside the container.
We have containers that are not apt-based and do not run as root
We also do some "start up" steps that fetch credentials from Key Vault prior to running the agent
have you made sure that the agent inside the GCP VM has access to your repository? Can you ssh into that VM and try to do a git clone?
Python libraries don't always use OS certificates ... typically, we have to set REQUESTS_CA_BUNDLE=/path/to/custom_ca_bundle_crt because requests ignores OS certificates
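A minimal sketch (the bundle path and URL are just placeholders) of the two ways to point requests at a custom CA bundle:

import os
import requests

# Option 1: the environment variable makes every requests call
# (and most libraries built on top of requests) use the custom bundle.
os.environ["REQUESTS_CA_BUNDLE"] = "/path/to/custom_ca_bundle_crt"

# Option 2: pass the bundle explicitly for a single call.
response = requests.get(
    "https://my-clearml-server.example.com",  # placeholder URL
    verify="/path/to/custom_ca_bundle_crt",
)
print(response.status_code)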
is this mongodb type of filtering?
Can you paste here what's inside "Installed packages" to double check?
if you are on github.com, you can use a fine-grained PAT token to limit access to the minimum. Although the token will be tied to an account, it's quite easy to switch to another one from another account.
(I never played with pipeline feature so I am not really sure that it works as I imagined ...)
Should I get all the workers None
Then go through them and count how many are in my queue of interest?
or which worker is in a queue ...
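Something like this might work with the APIClient (just a sketch: "my_queue" is an example name and the exact attributes on the returned worker objects are from memory, so double check):

from clearml.backend_api.session.client import APIClient

client = APIClient()
workers = client.workers.get_all()

# Count workers that are listening to the queue of interest.
queue_name = "my_queue"
count = sum(
    1
    for w in workers
    if any(q.name == queue_name for q in (getattr(w, "queues", None) or []))
)
print(f"{count} worker(s) listening to '{queue_name}'")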
or simply create a new venv on your local PC, then install your package with pip install from the repo URL and see if your file is deployed properly in that venv
got it
Thanks @<1523701070390366208:profile|CostlyOstrich36>
I saw that page ... but nothing about the number of workers for a queue ... or did I miss it?
you should be able to test your credentials first using something like rclone or azure-cli
Should I open a feature request?
I don't think ClearML is designed to handle secrets other than git and storage ...
normally, you should have an agent running behind a "services" queue, as part of your docker-compose. You just need to make sure that you populate the appropriate configuration on the Server (i.e. set the right environment variables for the docker services)
That agent will run as long as your self-hosted server is running
--gpus 0,1
: I believe this basically says that your code launched by the agent has access to both GPUs, and that is it. Now it is up to your code to choose which GPU to use and how ...
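For example with PyTorch (just a sketch, assuming that's what your training code uses), the selection happens in the code itself:

import torch

# Both GPUs exposed by --gpus 0,1 are visible to the process.
print(torch.cuda.device_count())  # expected: 2

# The code explicitly picks the second GPU.
device = torch.device("cuda:1")
model = torch.nn.Linear(10, 1).to(device)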
how did you deploy your clearml server ?
like for dataset_dir
I would expect a single path, not an array with the same path duplicated
my code looks like this :
parser = argparse.ArgumentParser()
parser.add_argument('-c', '--config-file', type=str, default='train_config.yaml',
                    help='train config file')
parser.add_argument('-t', '--train-times', type=int, default=1,
                    help='train the same model several times')
parser.add_argument('--dataset_dir', help='path to folder containing the prepped dataset.', required=True)
parser.add_argument('--backup', action='s...
Solved @<1533620191232004096:profile|NuttyLobster9> . In my case:
I need to from clearml import Task very early in the code (like the first line), before importing argparse,
and not call task.connect(parser)
so I guess it needs to be set inside the container
I also have the same issue. Default arguments are fine, but all arguments supplied on the command line become duplicated!
Found the issue: my bad practice for imports 😛
You need to import clearml before creating the argument parser. Bad way:
import argparse

def handleArgs():
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--config-file', type=str, default='train_config.yaml',
                        help='train config file')
    parser.add_argument('--device', type=int, default=0,
                        help='cuda device index to run the training')
    args = parser....
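Good way (a sketch of the same snippet with the import order fixed, as described above):

import clearml  # import clearml before argparse (this is what fixed the duplicated-arguments issue)
import argparse

def handleArgs():
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--config-file', type=str, default='train_config.yaml',
                        help='train config file')
    parser.add_argument('--device', type=int, default=0,
                        help='cuda device index to run the training')
    return parser.parse_args()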
please share your .service content too, as there are a lot of ways to "spawn" in systemd