In summary:
- Spin down the local server
- Back up the data folder
- In the cloud, extract the data backup
- Spin up the cloud server
My code looks like this:
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-c', '--config-file', type=str, default='train_config.yaml',
                    help='train config file')
parser.add_argument('-t', '--train-times', type=int, default=1,
                    help='train the same model several times')
parser.add_argument('--dataset_dir', help='path to folder containing the prepped dataset.', required=True)
parser.add_argument('--backup', action='s...
Sounds like your docker image is missing some package. This is unrelated to ClearML.
As for what package is missing, see here
What exactly are you trying to achieve?
Let's assume that you have Task.init() in run.py
And run.py
is inside /foo/bar/
If you run:
cd /foo
python bar/run.py
Then the Task will have the working folder /foo
If you run:
cd /foo/bar
python run.py
Then your Task will have the working folder /foo/bar
The underlying code was written with this assumption in mind.
That means you want to make things work in a non-standard Python way ... in which case you need to do "non-standard" things to make it work.
You can, for example, do this at the beginning of your run.py:
import sys
import os

# make the parent folder importable regardless of the current working directory
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
This way you are not relying on a non-standard feature being implemented by your tool, like PyCharm
or `cle...
We need to focus first on why it is taking minutes to reach "Using env".
In our case, we have a container that has all packages installed directly in the system, with no venv in the container. Thus we don't use CLEARML_AGENT_SKIP_PIP_VENV_INSTALL
But then when a task is pulled, I can see all the steps like git clone, a bunch of "Requirement already satisfied"
.... There may be some odd package that needs to be installed because one of our DS is experimenting ... But all that we can see what is...
something like this: None?
I think ES uses a greedy strategy where it allocates first and then uses it from there ...
@<1523701868901961728:profile|ReassuredTiger98> I found that you can set the file_server
in your local clearml.conf
to your own cloud storage. In our case, we use something like this in our clearml.conf:
api {
    file_server: "azure://<account>..../container"
}
All non-artifact models are then stored in our Azure storage. In our self-hosted ClearML setup, we don't even have a file server running at all.
If you are using multiple storage places, I don't see any other choice than putting multiple credentials in the conf file ... whether Free or Paid ClearML Server ...
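For reference, multiple credentials in clearml.conf look roughly like this (account, bucket and key values are placeholders, and the exact keys are worth double-checking against the clearml.conf reference):
sdk {
    azure.storage {
        containers: [
            {
                account_name: "<account>"
                account_key: "<key>"
                container_name: "<container>"
            },
        ]
    }
    aws {
        s3 {
            credentials: [
                {
                    bucket: "<bucket>"
                    key: "<access_key>"
                    secret: "<secret_key>"
                },
            ]
        }
    }
}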
The config that I mentioned above is the clearml.conf for each agent
but afaik this only works locally and not if you run your task on a clearml-agent!
Isn't the agent using the same clearml.conf?
We have our agents running tasks and uploading everything to cloud storage. As I said, we don't even have a file server running
No. I set api.file_server to the None in both the remote agent's clearml.conf and my local clearml.conf
In that case, whether the code is run locally or remotely, metrics are stored in the cloud storage
Right, in which case you want to change it dynamically from your code, not with the config file. This is where the Logger.set_default_output_upload comes in
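A minimal sketch of doing it from code (in the SDK version I have, the logger method is called set_default_upload_destination; project/task names and the URI below are placeholders):
from clearml import Task

# output_uri controls where models/artifacts of this task are uploaded
task = Task.init(project_name="demo", task_name="dynamic output",
                 output_uri="azure://<account>/<container>")

# debug samples / media reported through the logger go to this destination
task.get_logger().set_default_upload_destination("azure://<account>/<container>")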
Just a +1 here. When we use the same name for 3 different images, the thumbnails show 3 different images, but when clicking on any of them, only one is displayed. There is no way to display the others
You can use a docker image that already has those packages and dependencies, then have clearml-agent running inside it or launching the docker container
Python libraries don't always use the OS certificates ... typically, we have to set REQUESTS_CA_BUNDLE=/path/to/custom_ca_bundle.crt
because requests
ignores OS certificates
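For example, setting it from Python before requests is used (the bundle path and URL below are placeholders):
import os

# point requests at the custom CA bundle before any HTTPS call is made
os.environ["REQUESTS_CA_BUNDLE"] = "/path/to/custom_ca_bundle.crt"

import requests
requests.get("https://my-internal-server.example")  # now verified against the custom bundle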
I saw that page ... but nothing about the number of workers in a queue .... or did I miss it?
or which workers are in a queue ...
got it
Thanks @<1523701070390366208:profile|CostlyOstrich36>
nice !! That is exactly what I am looking for !!
Thanks @<1523701087100473344:profile|SuccessfulKoala55> I missed that one.
I have been playing with exporting a task, modifying the "diff" part and importing it back as a new task. Seems to work as desired. But set_script
seems cleaner.
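Roughly what I mean, as a sketch (argument and field names are how I remember them from the SDK, so verify against your version; IDs and diffs are placeholders):
from clearml import Task

src = Task.get_task(task_id="<source_task_id>")

# Option A: export, patch the uncommitted diff, import back as a new task
data = src.export_task()
data["script"]["diff"] = "<patched diff>"
new_task = Task.import_task(data)

# Option B: clone, then overwrite the script info directly
clone = Task.clone(source_task=src, name="patched copy")
clone.set_script(diff="<patched diff>", entry_point="run.py", working_dir=".")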
Love how flexible ClearML is !!!
You can either:
- Build an image from your Dockerfile and, when running the task/experiment, tell it to use that docker image
- If the steps to install the dependencies required by your repository are not too complicated, use agent.extra_docker_shell_script in the clearml.conf to install all the dependencies inside the docker container launched by clearml in docker mode (see the sketch below)
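For reference, those agent settings in clearml.conf look roughly like this (the image name and packages are placeholders):
agent {
    # default docker image used when the task does not specify one
    default_docker: {
        image: "nvidia/cuda:11.8.0-runtime-ubuntu22.04"
    }
    # shell commands executed inside the container before the task starts
    extra_docker_shell_script: ["apt-get install -y libsndfile1", "pip install <my-internal-package>"]
}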
Meanwhile, the SDK supports CLEARML_CONFIG_FILE=/path
Not sure what your use case is, but if you want it to be dynamic, you can create the config file on the fly in /tmp,
for example, and point to it in your code with
import os
os.environ['CLEARML_CONFIG_FILE']="/path"
import clearml
note: you will need to set the env var very early, before the first import clearml
in your code
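Putting it together, a rough sketch of creating the config file on the fly (server URLs and credentials below are placeholders for your own values):
import os
import tempfile

# minimal clearml.conf written on the fly
conf_text = """
api {
    web_server: "https://app.clearml.example"
    api_server: "https://api.clearml.example"
    files_server: "https://files.clearml.example"
    credentials {
        access_key: "<ACCESS_KEY>"
        secret_key: "<SECRET_KEY>"
    }
}
"""
conf_path = os.path.join(tempfile.gettempdir(), "clearml_dynamic.conf")
with open(conf_path, "w") as f:
    f.write(conf_text)

# must be set before the first clearml import
os.environ["CLEARML_CONFIG_FILE"] = conf_path

import clearml  # the first clearml import happens only after the env var is set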
Maybe create a feature request on GitHub?
Yup, you have the flexibility and options, that's what's so nice about ClearML
Please share your .service
content too, as there are a lot of ways to "spawn" in systemd