Pipelines! 😄
ClearML allows you to create pipelines, with each step created either from code or from a pre-existing task. Each task can also have its own custom Docker container to run inside of, so it should fit nicely with your workflow!
YouTube videos:
https://www.youtube.com/watch?v=prZ_eiv_y3c
https://www.youtube.com/watch?v=UVBk337xzZo
Relevant Documentation:
https://clear.ml/docs/latest/docs/pipelines/
Custom Docker container per task:
https://clear.ml/docs/latest/docs/references/sdk/task#set_base_docker
You can also override the Docker container a step runs in directly from the pipeline controller.
Re-reading your question, though, cross-process communication might be difficult. If you want the preprocessing to run at the same time as the training, with the training pulling data from the preprocessing on the fly, that could be trickier. Is this your use case?
yes, this is the use case. I think we can use something like Redis for this communication
As long as your clearml-agents have access to the Redis instance, it should work! Cool use case, though; interested to see how well it works 🙂
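A minimal sketch of the Redis idea, assuming both pipeline steps share a Redis list used as a FIFO queue: the preprocessing step `RPUSH`es JSON-encoded batches and the training step `BLPOP`s them until an end-of-stream marker shows up. The key name `preprocessed_batches`, the `__done__` sentinel and the JSON encoding are hypothetical choices for illustration, not anything ClearML provides. `InMemoryListClient` is a non-blocking, single-process stand-in so the sketch runs without a server; in a real setup you'd pass a `redis.Redis(host=..., port=6379)` client from redis-py instead, reachable from every agent.

```python
import json
from collections import deque
from typing import Iterable, Iterator

QUEUE_KEY = "preprocessed_batches"  # hypothetical Redis key shared by both steps
SENTINEL = "__done__"               # hypothetical end-of-stream marker


def produce(client, batches: Iterable[list], key: str = QUEUE_KEY) -> None:
    """Preprocessing side: append each batch to the tail of the list, then the sentinel."""
    for batch in batches:
        client.rpush(key, json.dumps(batch))
    client.rpush(key, SENTINEL)


def consume(client, key: str = QUEUE_KEY, timeout: int = 30) -> Iterator[list]:
    """Training side: pop batches from the head of the list until the sentinel arrives."""
    while True:
        item = client.blpop([key], timeout=timeout)  # (key, value), or None on timeout
        if item is None:
            raise TimeoutError(f"no data on {key!r} within {timeout}s")
        _, raw = item
        if isinstance(raw, bytes):  # redis-py returns bytes unless decode_responses=True
            raw = raw.decode()
        if raw == SENTINEL:
            return
        yield json.loads(raw)


class InMemoryListClient:
    """Single-process stand-in for redis.Redis; blpop never blocks, so an empty list acts as a timeout."""

    def __init__(self) -> None:
        self._lists: dict = {}

    def rpush(self, name: str, *values: str) -> int:
        lst = self._lists.setdefault(name, deque())
        lst.extend(v.encode() for v in values)  # store bytes, like redis-py hands back
        return len(lst)

    def blpop(self, keys, timeout: int = 0):
        for key in keys:
            lst = self._lists.get(key)
            if lst:
                return key.encode(), lst.popleft()
        return None


if __name__ == "__main__":
    client = InMemoryListClient()  # swap for redis.Redis(host=..., port=6379) in a real setup
    produce(client, [[1, 2], [3, 4]])
    print(list(consume(client)))  # [[1, 2], [3, 4]]
```

The sentinel lets the training step tell "preprocessing finished" apart from "preprocessing is slow", and the timeout keeps a crashed producer from hanging the training step forever.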
yeah, we've used pipelines in other scenarios. might be a good fit here. thanks!