I can only guess with so little information here. You'd better try to debug with print statements. Is this happening with uncommitted changes in a submodule?
If you have 2 agents serving the same queue and you then send 2 tasks to that queue, each agent should take one task.
But if you enqueue sequentially, one task, then wait until that task finishes before queuing the next: then it is random which agent will take the task. It can be the same one as for the previous task.
Are you saying that you have 1 agent running a task and 1 agent sitting idle while there is a task waiting in the queue and no one is processing it??
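Something like this is what I mean by sending 2 tasks to the queue at once (a rough sketch; the project, task and queue names are placeholders):
from clearml import Task

# clone an existing task twice and enqueue both copies on the shared queue;
# whichever agent is free first should pick each copy up
base = Task.get_task(project_name="my_project", task_name="train")  # placeholder names
for i in range(2):
    clone = Task.clone(source_task=base, name=f"train-run-{i}")
    Task.enqueue(clone, queue_name="dual_gpu_queue")  # placeholder queue name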
Like for dataset_dir:
I would expect a single path, not an array with the same path duplicated twice.
Depends on how the agent is launched ...
Can you paste here what is inside "Installed packages" to double-check?
I tried mounting an Azure storage account on that path and it worked: all files end up in the cloud storage.
something like this: None ?
Or simply create a new venv on your local PC, then install your package with pip install from the repo URL and see if your file is deployed properly in that venv.
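Then, as a rough check (the package name here is just a placeholder), you can list what actually got installed:
import importlib.util
import pathlib

spec = importlib.util.find_spec("my_package")  # placeholder package name
if spec is not None and spec.origin:
    pkg_dir = pathlib.Path(spec.origin).parent
    print(sorted(p.name for p in pkg_dir.iterdir()))  # is the missing file in the list?
else:
    print("package not found in this venv")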
In the web UI, in the queue/worker tab, you should see a service queue and a worker available in that queue. Otherwise the service agent is not running. Refer to John c above
Found the issue: my bad import practice 😛
You need to import clearml before doing the argument parsing. Bad way:
import argparse

def handleArgs():
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--config-file', type=str, default='train_config.yaml',
                        help='train config file')
    parser.add_argument('--device', type=int, default=0,
                        help='cuda device index to run the training')
    args = parser.parse_args()
    return args
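Good way (a sketch of the ordering only; the project/task names are placeholders): import clearml at the top, before any argparse code runs:
from clearml import Task  # clearml imported before the parser is created
import argparse

def handleArgs():
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--config-file', type=str, default='train_config.yaml',
                        help='train config file')
    parser.add_argument('--device', type=int, default=0,
                        help='cuda device index to run the training')
    return parser.parse_args()

task = Task.init(project_name='my_project', task_name='train')  # placeholder names
args = handleArgs()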
I also have the same issue. Default arguments are fine, but all arguments supplied on the command line become duplicated!
I really like how you make all this decoupled !! 🎉
If you care about the local destination then you may want to use this None
@<1523701087100473344:profile|SuccessfulKoala55> I managed to make this work by:
concatenating the existing OS CA bundle and the Zscaler certificate, and setting REQUESTS_CA_BUNDLE
to that bundle file.
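Roughly like this (all paths here are placeholders for my setup):
import os

os_bundle = "/etc/ssl/certs/ca-certificates.crt"  # typical Debian/Ubuntu bundle (placeholder)
zscaler_cert = "/path/to/zscaler_root_ca.pem"     # placeholder
combined = "/path/to/combined_ca_bundle.pem"      # placeholder

# write the OS bundle followed by the Zscaler certificate into one file
with open(combined, "w") as out:
    for src in (os_bundle, zscaler_cert):
        with open(src) as f:
            out.write(f.read().rstrip() + "\n")

# must be set before the clearml SDK / requests makes any HTTPS call
os.environ["REQUESTS_CA_BUNDLE"] = combined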
--gpus 0,1: I believe this basically says that the code launched by the agent has access to both GPUs, and that is it. It is then up to your code to choose which GPU(s) to use, for what, and how ...
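For example, assuming PyTorch, picking the GPU in code could look like this (the index would come from something like the --device argument above):
import torch

device_index = 0  # e.g. taken from your --device argument
device = torch.device(f"cuda:{device_index}" if torch.cuda.is_available() else "cpu")
print(torch.cuda.device_count(), "GPUs visible, training on", device)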
What do you mean by a different script?
I understand for clearml-agent.
What I mean is that I have 2 self-deployed servers. I want to switch between the 2 configs when running the code locally, not inside the agent.
What about having 2 agents, one on each GPU, on the same machine, serving the same queue? So that when you enqueue, whichever agent (thus GPU) is available will take the new task.
Are you sure all the needed files are pushed to your git repo?
Go to another folder, git clone that exact branch/commit and check that the files are there.
I need to do a git clone
You need to do it to test if it works. Clearml-agent will run it itself when it takes on a task.
In my case, I set everything up inside the container, including the agent, and don't use docker mode at all.
When my container starts, it starts the agent inside it in "normal" mode.
OK, so if the git commit or the uncommitted changes differ from the previous run, then the cache is "invalidated" and the step will be run again?
What about the log around when it tries to actually clone your repo?
For #2: it's a pull rather than a push system: you need to have a script that pulls at a regular interval and keeps track of what is new and what is not.
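A very rough sketch of such a pulling script, assuming you poll ClearML itself for completed tasks (the project name and filter are placeholders):
import time
from clearml import Task

seen = set()
while True:
    # query only completed tasks in the project and remember which ones we already saw
    for t in Task.get_tasks(project_name="my_project", task_filter={"status": ["completed"]}):
        if t.id not in seen:
            seen.add(t.id)
            print("new completed task:", t.name, t.id)
    time.sleep(60)  # pull at a regular interval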
An artifact can be anything that you can upload to storage with the clearml SDK. Which storage is used is defined by your clearml.conf (with its credentials); the ClearML web and API servers do not store those files.
A model is a special kind of artifact: None
For example, you have the lineage feature: if you train model B using model A as a starting point (aka pre-trained), and model C from model B, ... the lineage will track that model C was built on...
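For the artifact part, a minimal SDK sketch (project/task/artifact names are placeholders; where the file ends up is decided by your clearml.conf / output_uri, not by the server):
from clearml import Task

task = Task.init(project_name="my_project", task_name="artifact-demo")  # placeholder names
# anything serializable (dict, DataFrame, file path, ...) can be uploaded as an artifact
task.upload_artifact(name="predictions", artifact_object={"accuracy": 0.93})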
Do I need to make changes to clearml.conf so that it doesn't ask for my credentials, or is there another way around it?
You have 2 options:
- set credentials inside clearml.conf: I am not familiar with this and never tested it.
- or set up passwordless SSH with a public key: None
From what I understand, docker mode was designed for apt-based images and for running as root inside the container.
We have containers that are not apt-based and that do not run as root.
We also do some "start up" steps that fetch credentials from Key Vault prior to running the agent.
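The kind of start-up step I mean, as a rough sketch (assumes the azure-identity and azure-keyvault-secrets packages; the vault URL and secret name are placeholders):
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

client = SecretClient(
    vault_url="https://my-vault.vault.azure.net",  # placeholder vault
    credential=DefaultAzureCredential(),
)
git_password = client.get_secret("clearml-git-pass").value  # placeholder secret name
# ... hand the secret to whatever the agent expects, then launch clearml-agent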
@<1523701087100473344:profile|SuccessfulKoala55> It's working !! Thank you very much !!! Clearml is awesome !!!!