We are planning on using Airflow as the orchestrator, but that may not fit your needs. I would say that the tool of choice is highly context-specific.
We will be using Airflow to trigger ClearML-defined pipelines based on 'trigger' events, such as degradation in model performance, error alerts (e.g. at the data transformation task), etc.
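Roughly something like this, a minimal sketch assuming an existing ClearML pipeline task to clone and enqueue (the task id, queue names and DAG id are placeholders):

```
# Sketch: an Airflow DAG that clones and enqueues an existing ClearML
# pipeline task; meant to be fired by an alert, not a fixed schedule.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from clearml import Task

PIPELINE_TASK_ID = "<clearml-pipeline-task-id>"  # placeholder id

def trigger_clearml_pipeline(**context):
    # Clone the template pipeline task and enqueue the clone for execution
    template = Task.get_task(task_id=PIPELINE_TASK_ID)
    cloned = Task.clone(source_task=template, name="retrain-on-degradation")
    Task.enqueue(cloned, queue_name="pipelines")  # placeholder queue

with DAG(
    dag_id="clearml_retrain_on_alert",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,  # triggered externally, e.g. by an alert webhook
    catchup=False,
) as dag:
    PythonOperator(
        task_id="trigger_clearml_pipeline",
        python_callable=trigger_clearml_pipeline,
    )
```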
Sure, with pleasure. However, we're using a self-hosted (on-premise) version of ClearML...
So, like a UI for creating pipelines that do different things on the different solutions?
Yeah, within ClearML we use the PipelineController. We are now mainly looking for a single tool to stitch together the other products.
But of course, we will give first precedence to tools which work best with ClearML. Hence asking if anyone has had similar experience setting up such systems.
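For reference, a minimal PipelineController sketch (project and task names are placeholders; each step clones an existing ClearML task and runs them in order):

```
from clearml import PipelineController

pipe = PipelineController(
    name="train-pipeline",
    project="examples",  # placeholder project
    version="1.0.0",
)
pipe.add_step(
    name="preprocess",
    base_task_project="examples",
    base_task_name="preprocess task",  # existing task to clone
)
pipe.add_step(
    name="train",
    parents=["preprocess"],
    base_task_project="examples",
    base_task_name="train task",
)
pipe.start(queue="services")  # run the controller itself on a queue
```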
AgitatedDove14 We too self-host (on-prem) the Helm charts in our local k8s ecosystem.
Triggering - will be a nice feature indeed; currently we are using clearml monitors to address this (see the sketch after this message).
Is it the UI presenting the entire workflow? - This portion will also be nice. (Let's say someone uses 1) clearml-dataset -> 2) PipelineController (contains preprocessing, training, hyperparameter tuning) -> 3) clearml-serving.) If they could see the entire thing in one flow.
We are using Seldon for other reasons, thus can't use clearml-serving, but at least the first 2 components should be common for most folks.
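The monitor-based triggering mentioned above would look roughly like this, modeled on clearml.automation.monitor.Monitor and the Slack-alerts example in the clearml repo (the query and alert logic are placeholders):

```
from clearml.automation.monitor import Monitor

class FailureMonitor(Monitor):
    def get_query_parameters(self):
        # Placeholder query: recently updated tasks that failed
        return dict(order_by=["-last_update"], status=["failed"])

    def process_task(self, task):
        # Placeholder: raise an alert / kick off retraining here
        print("task {} ({}) failed".format(task.id, task.name))

monitor = FailureMonitor()
monitor.set_projects(project_names=["examples"])  # placeholder project
monitor.monitor(pool_period=60.0)  # poll every minute
```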
I'm also currently in a similar process, and giving http://DAGster.io a shot
Hi DeliciousBluewhale87
When you say "workflow orchestration", do you mean something like pipeline automation?
Still figuring out what the best orchestration tool is that can run this end-to-end.
DeliciousBluewhale87 / PleasantGiraffe85 based on the scenario above, what is the missing step that you need to cover? Is it the UI presenting the entire workflow? Or maybe a start trigger that can be configured?
AgitatedDove14 Not creating, but more for orchestrating...
Currently, we manually push a dataset to clearml-dataset.
We have a pipeline controller Task which takes in data from clearml-dataset, runs preprocessing, runs training, and publishes a model (if a certain threshold is met).
We have a clearml monitor which watches all published models. It pushes the URI of the published model to RabbitMQ.
We have a subscriber (Python code) listening to RabbitMQ. This takes the URI from the queue and creates a Seldon deployment (see the sketch after this message).
Here the vast majority of the work gets done in ClearML. Still figuring out what the best orchestration tool is that can run this end-to-end.
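The subscriber piece is roughly this shape, a sketch assuming pika for RabbitMQ and the kubernetes client for the SeldonDeployment custom resource (queue name, namespace, message format and the deployment spec are placeholders):

```
import json

import pika
from kubernetes import client, config

def create_seldon_deployment(model_uri):
    config.load_incluster_config()  # or load_kube_config() outside the cluster
    body = {
        "apiVersion": "machinelearning.seldon.io/v1",
        "kind": "SeldonDeployment",
        "metadata": {"name": "clearml-model"},  # placeholder name
        "spec": {
            "predictors": [{
                "name": "default",
                "replicas": 1,
                "graph": {
                    "name": "classifier",
                    "implementation": "SKLEARN_SERVER",  # placeholder server
                    "modelUri": model_uri,
                },
            }],
        },
    }
    client.CustomObjectsApi().create_namespaced_custom_object(
        group="machinelearning.seldon.io",
        version="v1",
        namespace="seldon",  # placeholder namespace
        plural="seldondeployments",
        body=body,
    )

def on_message(channel, method, properties, body):
    # Assumes the monitor publishes a JSON payload like {"uri": "..."}
    model_uri = json.loads(body)["uri"]
    create_seldon_deployment(model_uri)
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = connection.channel()
channel.queue_declare(queue="published-models", durable=True)
channel.basic_consume(queue="published-models", on_message_callback=on_message)
channel.start_consuming()
```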
TenseOstrich47 / PleasantGiraffe85
The next version (I think releasing today) will already contain scheduling, and the next one (probably an RC right after) will include triggering. That said, currently the UI wizard for both (i.e. creating the triggers) is only available in the community-hosted service. Still, I think that creating the triggers/schedule from code actually makes a lot of sense (see the sketch below).
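Something along these lines, assuming the TaskScheduler interface that ships with the scheduling support (task id and queue names are placeholders):

```
from clearml.automation import TaskScheduler

scheduler = TaskScheduler()
scheduler.add_task(
    schedule_task_id="<pipeline-task-id>",  # existing task to clone + enqueue
    queue="pipelines",                      # placeholder execution queue
    hour=19, minute=0,                      # e.g. every evening at 19:00
)
# Run the scheduler itself as a long-lived task on the services queue
scheduler.start_remotely(queue="services")
```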
pipeline presented in a clear UI,
This is actually being actively worked on. I think AnxiousSeal95 would love to get your feedback on such things, your opinion actually matters a lot!
For us it is both - having the process/pipeline presented in a clear UI, and the ability to trigger it, e.g. every evening.
In addition, tools like Dagster offer code organization, and a separation of the code itself from the data and the configuration, so that we can use the same data/ML pipeline for different use-cases.
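For example, roughly (illustrative names, using Dagster's op/job API): the op's logic stays fixed while the dataset name comes in through run config, so the same job can serve different use-cases.

```
from dagster import job, op

@op(config_schema={"dataset_name": str})
def load_dataset(context):
    # The dataset to use is configuration, not code
    name = context.op_config["dataset_name"]
    context.log.info("loading dataset {}".format(name))
    return name

@op
def train(context, dataset):
    context.log.info("training on {}".format(dataset))

@job
def training_job():
    train(load_dataset())

if __name__ == "__main__":
    # Same code, different configuration per use-case
    training_job.execute_in_process(
        run_config={"ops": {"load_dataset": {"config": {"dataset_name": "churn_v2"}}}}
    )
```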