AgitatedDove14, we are also in the same boat. We tried Kedro and found the organizational aspect to be really clean and would love to stick with it. We also like how each node of the pipeline is an independent, reusable block.
ClearML is definitely more comprehensive (especially the concepts of Tasks, Data and agents) and has its special place in our project. Now, we are trying to figure out how to run our Kedro pipelines in ClearML.
After playing with both for a few days, we still can't wrap our heads around integrating the two.
We have the following simple use case right now:
1. Download images from an S3 bucket
2. Pre-process the images (add new objects in the foreground, etc.)
3. Run existing ML models on these images to generate annotations (.txt files)
4. Override labels for some of the images (since we know what kind of images they are)
5. Create the directory structure required by https://github.com/ultralytics/yolov5
6. Start the training (python train.py)
Each of these steps, [2], [3], [4], [5 & 6], can be thought of as an independent Kedro node that can be reused in the future. What is unclear to us is how to integrate this with ClearML.
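To make the use case concrete, here is a minimal sketch of how steps [2], [3], [4] and [5 & 6] could be laid out as Kedro nodes. The function bodies are placeholders, and the dataset names (raw_images, processed_images, annotations, final_annotations, train_command) and the params:label_overrides entry are hypothetical names made up for illustration, not something from our actual project.

```python
from pathlib import Path
from typing import Dict, List


def preprocess_images(raw_images: List[str]) -> List[str]:
    # Step 2 (placeholder): the real node would add foreground objects etc.
    return [f"processed/{Path(p).name}" for p in raw_images]


def generate_annotations(images: List[str]) -> Dict[str, str]:
    # Step 3 (placeholder): the real node would run the existing models
    # and write one .txt annotation file per image.
    return {img: "unknown" for img in images}


def override_labels(annotations: Dict[str, str], overrides: Dict[str, str]) -> Dict[str, str]:
    # Step 4: replace the label of every image listed in `overrides`.
    return {img: overrides.get(img, label) for img, label in annotations.items()}


def prepare_and_train(images: List[str], annotations: Dict[str, str]) -> List[str]:
    # Steps 5 & 6 (placeholder): the real node would build the yolov5
    # directory layout and launch training; here we only return the command.
    return ["python", "train.py"]


def create_pipeline(**kwargs):
    # Kedro is imported here so the node functions above stay plain Python
    # that can be imported and tested without Kedro installed.
    from kedro.pipeline import Pipeline, node

    return Pipeline(
        [
            node(preprocess_images, inputs="raw_images",
                 outputs="processed_images", name="preprocess_images"),
            node(generate_annotations, inputs="processed_images",
                 outputs="annotations", name="generate_annotations"),
            node(override_labels, inputs=["annotations", "params:label_overrides"],
                 outputs="final_annotations", name="override_labels"),
            node(prepare_and_train, inputs=["processed_images", "final_annotations"],
                 outputs="train_command", name="prepare_and_train"),
        ]
    )
```

The point of the sketch is that every node is an ordinary function with named inputs and outputs, which is the part of Kedro we would like to keep.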
What we tried so far:
We found that someone in this community has already tried this. We took https://github.com/noklam/allegro_test/ and added Task.init() to each of the nodes (see https://github.com/noklam/allegro_test/blob/main/src/allegro_test/pipelines/data_engineering/nodes.py#L41 ). We also added Task.execute_remotely() so that the node would not be executed immediately. Then we added one more Task.init() to https://github.com/noklam/allegro_test/blob/main/src/allegro_test/pipelines/data_engineering/pipeline.py#L39 as well. However, running kedro run afterwards did not run the pipeline, and we did not get any logs in the ClearML UI.
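For reference, a simplified sketch of the per-node pattern described above. It is not the exact code from the linked repository: the function name preprocess_node and the project, task and queue names are placeholders chosen for illustration.

```python
def preprocess_node(raw_images):
    # clearml is imported lazily so this sketch can be read (and the
    # function defined) on a machine without clearml installed.
    from clearml import Task

    # One ClearML task per Kedro node (placeholder project/task names).
    task = Task.init(project_name="kedro_demo", task_name="preprocess_node")

    # Enqueue the task instead of running it locally ("default" is a
    # placeholder queue name). Note: execute_remotely() defaults to
    # exit_process=True, so the local process stops here once the task
    # has been enqueued.
    task.execute_remotely(queue_name="default")

    # Placeholder for the actual node logic.
    return raw_images
```

With the default exit_process=True, the first node that reaches execute_remotely() ends the local process, which may be related to kedro run not getting through the rest of the pipeline.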
Even if we fix the logging issue, we are not confident that this design approach is the right one.
We also have our doubts about whether each small independent node should actually call Task.init().
Any help would be greatly appreciated!
TL;DR: We are confused about how to incorporate the "Authoring pipelines" goal of Kedro (which we really like) into ClearML.