We are planning to use Airflow as an extension of ClearML itself, for several tasks:
- Isolating the data validation steps from the general training pipeline. Validation will combine some base logic with more advanced checks using something like Great Expectations. The training data will be a snapshot of the most recent 2 weeks, and this snapshot will be reused across multiple tasks.
- Automating the scheduling and execution of training pipelines.
- Periodically executing the ETL pipelines that write training logs to a more query-friendly DWH.
- Using the DWH to automate the decision on which model to deploy to production.
- Post-live deployment: Airflow will run the ETL for the feedback data we collect. This ETL includes a validation step to measure data drift, live model performance, etc.
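As a rough sketch of the "base logic" validation layer, the snippet below shows two checks we could run as standalone Airflow tasks before handing data to the training pipeline: enforcing the 2-week snapshot window, and a simple Population Stability Index (PSI) as a first-pass drift score on the feedback ETL. All names here (`validate_snapshot`, `psi`, the row schema) are hypothetical placeholders, not an existing API; the real advanced checks would live in Great Expectations suites instead.

```python
from datetime import datetime, timedelta
import math

def snapshot_window(now: datetime, days: int = 14):
    """Bounds of the training snapshot: the most recent `days` days."""
    return now - timedelta(days=days), now

def validate_snapshot(rows, now: datetime):
    """Base validation: keep rows inside the 2-week window, fail fast
    on an empty snapshot or future-dated rows. `rows` is assumed to be
    an iterable of dicts with a `ts` datetime field."""
    start, end = snapshot_window(now)
    if any(r["ts"] > now for r in rows):
        raise ValueError("future-dated rows in snapshot")
    in_window = [r for r in rows if start <= r["ts"] <= end]
    if not in_window:
        raise ValueError("empty training snapshot")
    return in_window

def psi(expected, actual, bins: int = 10):
    """Population Stability Index between the training distribution
    (`expected`) and live feedback data (`actual`) for one numeric
    feature. Larger values indicate more drift; 0 means identical
    binned distributions."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against constant data

    def fractions(xs):
        counts = [0] * bins
        for x in xs:
            i = min(int((x - lo) / width), bins - 1)
            counts[i] += 1
        # Smooth empty bins so the log term stays defined.
        return [(c or 0.5) / len(xs) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

In the DAG, each check would be its own task so a failed validation blocks the downstream training (or deployment-decision) tasks, while a drift score above some threshold could branch into an alerting path rather than hard-failing the run.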