I don’t mean a serving endpoint, just the equivalent of “cloning an experiment” and running it on a different (larger) dataset.
If I want to retrain the same model at a certain cadence on some streaming data
Do you mean a serving endpoint?
SuccessfulKoala55, you’re saying there’s a built-in scheduler?
If so, where can I find it?
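For reference, recent ClearML SDK versions ship a `TaskScheduler` class in `clearml.automation` that clones and enqueues a template experiment on a cron-like cadence, which covers the "retrain the same model on a schedule" case. The sketch below assumes that class is available in your installed version. The template task ID and queue names are placeholders. `schedule_daily_retrain` is a hypothetical helper, and the scheduler object is passed in so the wiring can be checked without a server.

```python
"""Sketch: re-run a template experiment once a day via ClearML's TaskScheduler.

Assumes clearml.automation.TaskScheduler exists in your clearml version.
TEMPLATE_TASK_ID and the queue names are placeholders, not real values.
"""

TEMPLATE_TASK_ID = "<template-task-id>"  # placeholder: ID of the experiment to re-run


def schedule_daily_retrain(scheduler, task_id, queue="default", hour=2, minute=0):
    """Register a recurring job that clones `task_id` and enqueues the clone.

    `scheduler` is expected to be a clearml.automation.TaskScheduler instance;
    it is passed in (hypothetical helper) so the call can be tested offline.
    """
    scheduler.add_task(
        schedule_task_id=task_id,  # template task that gets cloned on every run
        queue=queue,               # execution queue an agent is listening on
        day=1,                     # every 1 day ...
        hour=hour,                 # ... at hour:minute
        minute=minute,
        recurring=True,            # keep firing, not just once
    )
    return scheduler

# Real usage (requires clearml and a configured server; not executed here):
#   from clearml.automation import TaskScheduler
#   scheduler = schedule_daily_retrain(TaskScheduler(), TEMPLATE_TASK_ID)
#   scheduler.start_remotely(queue="services")  # run the scheduler loop as a service task
```

Running the scheduler itself on a long-lived "services" queue keeps it off your laptop; `scheduler.start()` would run the same loop locally instead.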
Hi LazyTurkey38 !
Thanks for your kind words 🙏
My understanding is that the commit hash is picked up and any diff from the remote branch is applied. Is that correct?
Correct 🙂 - you can get some control over this process, or override it if you'd like, but that's the default behavior.
Are the only two options for setting up the right environment for a Task either docker or git+pip?
You can have your ClearML Agent run the code in Docker, based on an image you choose (or a default image, or even a complete, standalone prebuilt image you can build using the Agent), or you can have it run the code in a Python virtual environment. In both cases (unless you use a standalone image), the task's requirements and code are installed in the execution sandbox (a venv inside the container, or just the venv) and executed there.
Do you support caching of git that evolves with the code base to speed it up?
Yes 🙂 - ClearML Agent has both a venv cache and a VCS (git) cache, so you can get a speedup rather quickly 🙂
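To make the venv-cache idea concrete: an environment can only be reused when what gets installed into it is the same, so a cache like this is naturally keyed on a fingerprint of the Python version and the requirements. The sketch below is a simplified, hypothetical illustration of such a key (`venv_cache_key` is not a ClearML function, and the agent's real key may include more inputs, such as the docker image).

```python
import hashlib


def venv_cache_key(python_version, requirements):
    """Hypothetical cache key for a reusable virtual environment.

    Strips whitespace, drops blank lines and comments, and sorts the
    requirement lines, so the same set of pinned packages yields the same
    key regardless of the order in which they were listed.
    """
    normalized = sorted(
        line.strip()
        for line in requirements
        if line.strip() and not line.strip().startswith("#")
    )
    payload = "\n".join([f"python={python_version}", *normalized])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With a key like this, a cache hit means environment creation and package installation can be skipped, which is where most of the time goes. A VCS cache plays the same role for the repository: a local clone is kept so later runs only need to fetch new commits.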
In fact, if there is a good python API to list/duplicate/edit/run experiments by ID, it seems straightforward to do that from Airflow (or any other job scheduler). I’m just wondering if there is some built-in scheduler.
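The duplicate/edit/run loop asked about here maps onto classmethods of the ClearML SDK's `Task` class (`get_task`, `clone`, `enqueue`) plus `set_parameter` on the clone, so it can be driven from Airflow or any other scheduler. A minimal sketch, assuming the script exposes its dataset as a hyperparameter; `rerun_on_dataset`, the parameter name `"Args/dataset"`, the dataset URI and the queue name are all hypothetical placeholders.

```python
"""Sketch: duplicate an experiment by ID, point it at another dataset, enqueue it.

Uses clearml.Task classmethods (get_task, clone, enqueue) and Task.set_parameter.
"Args/dataset" is a placeholder: use the hyperparameter your script really exposes.
"""


def rerun_on_dataset(task_cls, template_id, dataset, queue="default",
                     param_name="Args/dataset"):
    """Clone `template_id`, override one hyperparameter, and enqueue the clone.

    `task_cls` is expected to be clearml.Task; it is passed in so the logic
    can be exercised offline with a stand-in (hypothetical helper).
    """
    template = task_cls.get_task(task_id=template_id)  # look up the experiment by ID
    clone = task_cls.clone(
        source_task=template,
        name=f"{template.name} [{dataset}]",
    )  # a fresh clone is a draft, so it can still be edited
    clone.set_parameter(param_name, dataset)   # "edit": swap in the larger dataset
    task_cls.enqueue(clone, queue_name=queue)  # "run": an agent on `queue` picks it up
    return clone

# Real usage, e.g. inside an Airflow PythonOperator (needs clearml + a server):
#   from clearml import Task
#   rerun_on_dataset(Task, "<template-task-id>", "<larger-dataset-uri>")
```

For the "list" part, `Task.get_tasks(project_name=...)` can be used first to find the template experiment's ID.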