Hi @<1773158059758063616:profile|PanickyParrot17> ,
You can do that with ClearML pipelines: step 1 pulls the data, step 2 stores it, and step 3 creates the dataset.
The pipeline controller can expose parameters such as which data to pull and the name of the created dataset, so you can rerun it from the UI and just change the data source.
All the steps can run with the clearml-agent, and you can also restrict the machines to CPU only (search for the --cpu-only param).
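A rough sketch of what such a three-step pipeline could look like with `PipelineController`, in case it helps. The project, pipeline, queue and dataset names and the example URL are placeholders I made up, and the sketch assumes all steps can see the same filesystem (same machine or a shared mount), since only the paths are passed between steps.

```python
# Sketch only: names, queue and URL below are hypothetical placeholders.
# Assumes the steps share a filesystem, because only paths are passed along.


def pull_data(source_url, workdir="raw"):
    """Step 1: download the source file and return its local path."""
    import os
    import urllib.request

    os.makedirs(workdir, exist_ok=True)
    filename = os.path.basename(source_url.rstrip("/")) or "data.bin"
    raw_path = os.path.join(workdir, filename)
    urllib.request.urlretrieve(source_url, raw_path)
    return raw_path


def store_data(raw_path, storage_dir="stored"):
    """Step 2: copy the pulled file into the storage folder."""
    import os
    import shutil

    os.makedirs(storage_dir, exist_ok=True)
    shutil.copy2(raw_path, storage_dir)
    return storage_dir


def create_dataset(storage_dir, dataset_name, dataset_project):
    """Step 3: register the stored files as a ClearML dataset."""
    from clearml import Dataset

    ds = Dataset.create(dataset_name=dataset_name, dataset_project=dataset_project)
    ds.add_files(path=storage_dir)
    ds.upload()
    ds.finalize()
    return ds.id


def build_pipeline():
    """Wire the three steps together; parameters are editable from the UI."""
    from clearml import PipelineController  # imported lazily, needs clearml installed

    pipe = PipelineController(name="data-ingest", project="examples", version="1.0.0")
    pipe.add_parameter("source_url", "https://example.com/data.csv")
    pipe.add_parameter("dataset_name", "my-dataset")
    pipe.add_parameter("dataset_project", "examples")
    # Queue served by an agent, e.g. one started with the --cpu-only flag.
    pipe.set_default_execution_queue("default")

    pipe.add_function_step(
        name="pull_data",
        function=pull_data,
        function_kwargs=dict(source_url="${pipeline.source_url}"),
        function_return=["raw_path"],
    )
    pipe.add_function_step(
        name="store_data",
        function=store_data,
        function_kwargs=dict(raw_path="${pull_data.raw_path}"),
        function_return=["storage_dir"],
        parents=["pull_data"],
    )
    pipe.add_function_step(
        name="create_dataset",
        function=create_dataset,
        function_kwargs=dict(
            storage_dir="${store_data.storage_dir}",
            dataset_name="${pipeline.dataset_name}",
            dataset_project="${pipeline.dataset_project}",
        ),
        function_return=["dataset_id"],
        parents=["store_data"],
    )
    return pipe
```

Calling `build_pipeline().start()` would enqueue the controller, and `build_pipeline().start_locally(run_pipeline_steps_locally=True)` should let you debug everything on your own machine first. If the steps end up on different machines, you would need to pass the data itself (or a shared storage URI) between steps instead of local paths.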
Thanks, I'll have a read through the docs again. I'm just trying to think through the dev and staging setup to develop this efficiently.