Is it because your training code downloads the pre-trained model (from PyTorch or wherever) to local disk in /tmp/xxx and then trains from there? In that case ClearML will just reference the local path.
I think you need to manually download the pre-trained model, then wrap it with the ClearML `InputModel` (e.g. here )
And then use that `InputModel` as the pre-trained model?
Maybe the ClearML staff have a better approach? @<152370107039036...
That `--docker_args` seems to be for `clearml-task`, as described here, while you are using `clearml-agent`, which is a different thing
the agent inside the docker compose is just a handy one to serve a services queue, where you can queue all your "clean-up" tasks that are not deep-learning related and only use a bit of CPU
so it's not supposed to say "illegal output destination ..." ?
Found the issue: my bad import practice 😛
You need to import clearml before doing argument parsing. Bad way:

```python
import argparse

def handleArgs():
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--config-file', type=str, default='train_config.yaml',
                        help='train config file')
    parser.add_argument('--device', type=int, default=0,
                        help='cuda device index to run the training')
    args = parser.parse_args()
    ...
```
what about having 2 agents, one on each GPU, on the same machine, serving the same queue? So that when you enqueue, whichever agent (and thus GPU) is available will take the new task
But then how does the agent know where the venv that it needs to use is located?
If you are using multiple storage places, I don't see any other choice than putting multiple credentials in the conf file ... Free or paid ClearML Server ...
the underlying code was written with this assumption
That means that you want to make things work not in a standard Python way ... In which case you need to do "non-standard" things to make it work.
You can do this, for example, at the beginning of your run.py:

```python
import sys
import os

sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
```
In this way, you are not relying on a non-standard feature being implemented by your tool, like PyCharm or `cle...
you should know where your latest model is located, so just call `task.upload_artifact` on that file?
the weird thing is that GPU 0 seems to be in use, as reported by nvtop on the host. But it is 50% slower than when running directly instead of through the clearml-agent ...
What exactly are you trying to achieve ?
Let's assume that you have `Task.init()` in run.py, and run.py is inside /foo/bar/
If you run:

```
cd /foo
python bar/run.py
```

then the Task will have the working folder /foo
If you run:

```
cd /foo/bar
python run.py
```

then your Task will have the working folder /foo/bar
I think ES uses a greedy strategy where it allocates first, then uses it from there ...
because when I was running both agents on my local machine everything was working perfectly fine
This is probably because you (or someone) set up an SSH public key with your git repo sometime in the past
How are you using the function `update_output_model` ?
Can you share the agent log, in the console tab, before the error?
kind of ....
Now that I think about it, the best approach would be to:
- Clone a task
@<1523701087100473344:profile|SuccessfulKoala55> Is it even possible to have the server store files to a given blob storage?
Clear. Thanks @<1523701070390366208:profile|CostlyOstrich36> !
An artifact can be anything that you can upload to storage using the ClearML SDK. Which storage is used is defined by your clearml.conf (with its credentials); the ClearML web and API servers do not store those files
A model is a special kind of artifact.
For example, there is the lineage feature: if you train model B using model A as the starting point (aka pre-trained), and model C from model B, ... the lineage will track that model C was built on...
(wrong tab sorry :P)
`CLEARML_AGENT_SKIP_PIP_VENV_INSTALL` needs to be a path
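For example, something like this (the venv path below is a placeholder; point it at the python binary of the environment you prepared):

```shell
# Skip the agent's venv creation and reuse an existing interpreter instead.
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/opt/envs/train/bin/python
```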
1.12.2, because some bug makes fastai lag 2x
1.8.1rc2, because it fixes an annoying git clone bug
not sure how, for debug samples and scalars ....
But theoretically, with the above, one should be able to fully reproduce a run
if you want to replace MLflow with ClearML: do it !! It's like asking "Should I use sandals or running shoes for my next marathon ..."
Let your users try ClearML, and I am pretty sure all of them will want to swap over !!!