So if I want to train with a remote agent on a remote machine, I have to:
- spin up clearml-agent on the remote
- create a dataset using clearml-data and populate it with data…
- from my local machine, use clearml-data to upload the data to a Google gs:// bucket
- modify my code so it accesses the data from the dataset, as described here: https://clear.ml/docs/latest/docs/clearml_data/clearml_data_sdk#accessing-datasets

Am I understanding this right?
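For reference, a minimal sketch of the dataset steps using the ClearML Python SDK rather than the `clearml-data` CLI. It assumes the `clearml` package is installed and configured; the folder, project and dataset names, and the `gs://` URL are placeholders, not values from this thread.

```python
def publish_dataset(local_dir, project, name, output_url):
    """Create a dataset version from local_dir and upload it to output_url (e.g. a gs:// bucket)."""
    # Imported here so the module can be loaded without clearml installed.
    from clearml import Dataset

    ds = Dataset.create(dataset_name=name, dataset_project=project)
    ds.add_files(path=local_dir)      # register the local files in this version
    ds.upload(output_url=output_url)  # push the file contents to the bucket
    ds.finalize()                     # close the version so it can be consumed
    return ds.id


def fetch_dataset(project, name):
    """Inside the training code: resolve the dataset and return a local cached copy."""
    from clearml import Dataset

    ds = Dataset.get(dataset_project=project, dataset_name=name)
    return ds.get_local_copy()  # path to a local folder holding the dataset files
```

Usage (placeholder values): run `publish_dataset("./data", "my_project", "my_dataset", "gs://my-bucket/datasets")` once from the local machine, then call `fetch_dataset("my_project", "my_dataset")` in the training script so the agent pulls the data on the remote.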
(and a way to specify which remote server)
I think I am missing one part: which command do I use on my local machine to indicate that the job needs to run remotely? I’m imagining something like `clearml-remote run python3 my_train.py`
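One CLI for this is `clearml-task`, which sends a local script to a ClearML queue for an agent to pick up. A sketch that builds and runs such a command from Python, assuming `clearml-task` is installed and on PATH; the project, task, script, and queue names are placeholders.

```python
import shlex
import subprocess


def clearml_task_cmd(project, name, script, queue):
    """Build the argv that sends a local script to a ClearML queue via clearml-task."""
    return [
        "clearml-task",
        "--project", project,
        "--name", name,
        "--script", script,
        "--queue", queue,  # an agent listening on this queue executes the task
    ]


def launch(cmd):
    """Echo the command, then run it; raises CalledProcessError on a non-zero exit."""
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Usage (placeholder names): `launch(clearml_task_cmd("my_project", "my_train", "my_train.py", "default"))`.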
got it, nice, thanks
I see, so there’s no way to launch a variant of my last run (with, say, some config/code tweaks) via CLI and have it reuse the cached venv?
I mean it is in pip mode, and the agent installs deps from the git repo that it pulls.
Yes, after installing, it listed the installed packages in the console, with the version of each.
But “cloning” via the UI runs an exact copy of the code/config, not a variant, unless I edit those via the UI (which is not ideal). So it looks like the following workflow, which is trivial to do locally, is not possible via remote agents:
- run an experiment
- tweak code/configs in the IDE, or tweak configs via CLI
- have it re-run in the exact same venv (with no install overhead etc.)
So maybe the remote agents are meant more for enqueuing a whole collection of settings (via code) and checking back in a few hours (in which ...
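The enqueue-a-batch pattern can be sketched with the ClearML SDK: clone a template task once per setting, override its parameters, and enqueue each clone. This assumes `clearml` is installed and configured; the template task ID, the `Args/...` parameter names (the section where ClearML records argparse arguments), and the queue name are placeholders. It uses `set_parameter` per key because `set_parameters` replaces the whole parameter set.

```python
def enqueue_sweep(template_task_id, settings, queue_name):
    """Clone a template task once per settings dict and enqueue each clone.

    settings: list of dicts such as {"Args/lr": 0.01} (hypothetical parameter names).
    Returns the IDs of the enqueued clones.
    """
    # Imported here so the module can be loaded without clearml installed.
    from clearml import Task

    template = Task.get_task(task_id=template_task_id)
    ids = []
    for i, params in enumerate(settings):
        clone = Task.clone(source_task=template, name=f"{template.name} variant {i}")
        for key, value in params.items():
            clone.set_parameter(key, value)  # override one parameter, keep the rest
        Task.enqueue(clone, queue_name=queue_name)
        ids.append(clone.id)
    return ids
```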
Actually, with `base-task-id` it uses the cached venv, thanks for this suggestion! This seems equivalent to cloning via the UI.
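A sketch of launching a variant of an earlier run from the command line: clone it with `--base-task-id` and override arguments with `--args` as `name=value` tokens. My understanding from `clearml-task --help` is that `--args` targets argparse arguments, so check this against your installed version; the task ID, task name, queue, and parameter values are placeholders.

```python
import shlex


def variant_cmd(base_task_id, name, queue, overrides=None):
    """Build a clearml-task command that clones an existing task and enqueues the clone.

    overrides: dict of argument name -> new value (hypothetical example values).
    """
    cmd = [
        "clearml-task",
        "--base-task-id", base_task_id,  # reuse the code/environment of an existing task
        "--name", name,
        "--queue", queue,
    ]
    if overrides:
        # Placed last: each override becomes one "name=value" token after --args.
        cmd += ["--args"] + [f"{k}={v}" for k, v in overrides.items()]
    return cmd


print(shlex.join(variant_cmd("abc123", "lr-sweep-1", "default", {"lr": 0.003, "batch_size": 64})))
# clearml-task --base-task-id abc123 --name lr-sweep-1 --queue default --args lr=0.003 batch_size=64
```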
And I will look into the non-cli workflow you’re suggesting.
I have a strong attachment to a CLI-based workflow: nice zsh auto-suggestions, Hydra and the like. That’s why I moved away from DVC 🙂
… but I have a feeling they will not give me the “instant venv activation” behavior I’m looking for.
I used `task.execute_remotely(queue_name=..., clone=True)`
and indeed it instantly activates the venv on the remote. I assume clone=True is fine
I use a CLI arg `remote=True`, so depending on it the script runs locally or remotely.
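The local-or-remote switch can be sketched as follows, using `argparse` with a `--remote` flag as a stand-in for the Hydra-style `remote=True` override; the project, task, and queue names are placeholders, and `clone=True` mirrors the `execute_remotely` call quoted above.

```python
import argparse


def setup_task(remote, queue_name="default"):
    """Create the ClearML task; if remote is True, hand execution over to an agent.

    With remote=True, execute_remotely() enqueues a clone of this task and, with the
    default exit_process=True, ends the local process. When the agent later runs the
    script, execute_remotely() is ignored, so training continues on the remote.
    """
    # Imported here so the module can be loaded without clearml installed.
    from clearml import Task

    task = Task.init(project_name="my_project", task_name="my_train")  # placeholder names
    if remote:
        task.execute_remotely(queue_name=queue_name, clone=True)
    return task


def parse_args(argv=None):
    """Parse the hypothetical --remote and --queue flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--remote", action="store_true", help="run on a ClearML agent")
    parser.add_argument("--queue", default="default", help="queue the agent listens on")
    return parser.parse_args(argv)
```

Usage: `args = parse_args(); task = setup_task(args.remote, args.queue)` at the top of the training script, followed by the normal training code.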
Oh, I think I know what I missed. The names I set with `--project … --name …` did not match the ones I passed to `Task.init()` in my code.
Thanks for the quick response. Will look into this later; I think I understand.
A quick note for others who may visit this… it looks like you have to call `Task.force_requirements_env_freeze(force=True, requirements_file="requirements.txt")` to ensure any changes in requirements.txt are reflected in the remote venv.
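According to the ClearML docs, `Task.force_requirements_env_freeze()` must be called before `Task.init()`, so the ordering matters. A minimal sketch; the wrapper function and the project/task names are placeholders.

```python
def init_task_with_requirements(project_name, task_name, requirements_file="requirements.txt"):
    """Start a ClearML task whose recorded packages come from requirements_file.

    The freeze call has to come before Task.init(); the task then stores the packages
    listed in requirements_file instead of the automatically detected imports, and an
    agent installs from that list when it builds the remote venv.
    """
    # Imported here so the module can be loaded without clearml installed.
    from clearml import Task

    Task.force_requirements_env_freeze(force=True, requirements_file=requirements_file)
    return Task.init(project_name=project_name, task_name=task_name)
```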
AgitatedDove14 thanks, yes, I assume I would follow these instructions:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_gcp
Thanks, so I got clearml-task working: I sent a job to a queue and the agent on GCP picked it up. One question: for a job that runs on the order of minutes, it’s not worth re-creating the whole Python virtual env from scratch on the remote (that alone takes 5 mins). So is the `--folder` option meant for running it in an existing folder, in an existing virtual env?
I guess I follow these steps on a GCP instance?
https://clear.ml/docs/latest/docs/clearml_agent
I would also be interested in a GCP autoscaler; I did not know it was possible/available yet.
So net-net, does this mean it’s behaving as expected, or is there something I need to do to enable “full venv cache”? It spends nearly 2 mins, starting from
`created virtual environment CPython3.8.10.final.0-64 in 97ms creator CPython3Posix(dest=/home/pchalasani/.clearml/venvs-builds/3.8, clear=False, global=False)`
and then printing several lines like these:
`Successfully installed pip-20.1.1`
`Collecting Cython`
`Using cached Cython-0.29.30-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86...`
Thanks, I guess I need to have a bucket under Cloud Storage?