I guess I follow these steps on a GCP instance?
https://clear.ml/docs/latest/docs/clearml_agent
thanks, so I got clearml-task working, sent it to a queue, and the agent on GCP picked it up. I had a question: for a job that runs on the order of minutes, it's not worth re-creating the whole Python virtual env from scratch on the remote (that itself takes 5 mins). So is the `--folder` option meant for running it in an existing folder in an existing virtual env?
AgitatedDove14 thanks yes I assume I would follow these instructions:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_gcp
... indicate the job needs to be run remotely? I’m imagining something like
`clearml-task`, and you need to specify the queue to push your Task into.
See here: https://clear.ml/docs/latest/docs/apps/clearml_task
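As a rough illustration of the reply above, here is a minimal Python sketch that assembles a `clearml-task` call with a target queue. The project, task, and queue names (`examples`, `remote training`, `default`) are placeholders and not values from this thread; `my_train.py` is the script name from the question. It assumes `clearml-task` is installed and configured on the local machine, and it only prints the command unless `dry_run=False`.

```python
import shlex
import subprocess


def build_clearml_task_cmd(project, name, script, queue):
    """Return the argv list for a clearml-task call that enqueues a script."""
    return [
        "clearml-task",
        "--project", project,  # project the new Task is created in
        "--name", name,        # display name of the Task
        "--script", script,    # entry-point script to run remotely
        "--queue", queue,      # queue the remote agent is listening on
    ]


def submit(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Placeholder names, chosen for illustration only.
cmd = build_clearml_task_cmd("examples", "remote training", "my_train.py", "default")
submit(cmd)  # dry run: prints the command without contacting the server
```

Which remote machine runs the job is decided by the queue: whichever agent is serving that queue picks the Task up.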
(and a way to specify which remote server)
Actually, no. This is to spin up the clearml-server on GCP, not the agent
I think I am missing one part: which command do I use on my local machine to indicate the job needs to be run remotely? I'm imagining something like `clearml-remote run python3 my_train.py`
So if I want to train with a remote agent on a remote machine, I have to:
1. Spin up clearml-agent on the remote
2. Create a dataset using clearml-data, populate with data…
3. From my local machine, use clearml-data to upload data to a Google gs:// bucket
4. Modify my code so it accesses data from the dataset as here: https://clear.ml/docs/latest/docs/clearml_data/clearml_data_sdk#accessing-datasets
Am I understanding right?
Yes, which looks like a lot, but you only need to do that once.
Auto scheduler would make (1) redundant (as it would spin the instance up/down based on the jobs in the queue)
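For the dataset steps, a minimal sketch using the ClearML Python SDK (the `Dataset` class) instead of the `clearml-data` CLI could look like the following. The project name, dataset name, local folder, and bucket URL are all hypothetical placeholders, and it assumes Google Cloud Storage credentials are already configured for ClearML. The `clearml` import is done inside the functions so the file loads even where the package is not installed.

```python
# All names below are placeholders for illustration, not values from this thread.
DATASET_PROJECT = "my_project"
DATASET_NAME = "training_data"
BUCKET_URL = "gs://my-bucket/clearml-datasets"


def upload_dataset(local_dir, project=DATASET_PROJECT, name=DATASET_NAME,
                   bucket_url=BUCKET_URL):
    """Run once from the local machine: register files and upload them."""
    from clearml import Dataset  # lazy import; requires the clearml package

    ds = Dataset.create(dataset_project=project, dataset_name=name)
    ds.add_files(path=local_dir)      # register the local files in the dataset
    ds.upload(output_url=bucket_url)  # push the file contents to the bucket
    ds.finalize()                     # close this dataset version
    return ds.id


def fetch_dataset(project=DATASET_PROJECT, name=DATASET_NAME):
    """Call from the training script: returns a local path to a cached copy."""
    from clearml import Dataset  # lazy import; requires the clearml package

    ds = Dataset.get(dataset_project=project, dataset_name=name)
    return ds.get_local_copy()


# Example usage inside my_train.py (commented out, needs a configured server):
# data_dir = fetch_dataset()
```

The training code then only depends on the dataset's project and name, so the same script works on any machine where an agent runs it.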
HurtWoodpecker30 currently, in the open source version, only AWS is supported. I know the SaaS Pro version supports GCP (I'm assuming Enterprise as well).
You can, however, manually spin up an instance on GCP and launch an agent on the instance (like you would on any machine)
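A minimal sketch of the agent launch on the instance, again as a Python helper that only builds and prints the command: the queue name `default` is a placeholder, and it assumes `clearml-agent` is installed on the instance and has already been configured with server credentials (for example via `clearml-agent init`).

```python
import shlex


def build_agent_cmd(queue, detached=False):
    """Return the argv list that starts a clearml-agent daemon on a queue."""
    cmd = ["clearml-agent", "daemon", "--queue", queue]
    if detached:
        cmd.append("--detached")  # keep the agent running in the background
    return cmd


# Placeholder queue name; it must match the queue the Task is pushed into.
agent_cmd = build_agent_cmd("default", detached=True)
print(shlex.join(agent_cmd))  # prints: clearml-agent daemon --queue default --detached
```

The printed command is what you would run in a shell on the GCP instance.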