Hi JealousParrot68
ClearML tracking of experiments run through Kedro (similar to tracking with MLflow)
That's definitely very easy. I'm still not sure how Kedro scales on clusters, though. From what I saw (and I might have missed it), it seems more like a single instance with sub-processes, with no real ability to set up different environments for the different steps in the pipeline. Is that correct?
I think the challenge here is picking the right abstraction mapping. E.g. should a node in Kedro (which is usually one function but can also be more involved) be equivalent to a Task, or should a whole pipeline be a Task?
This actually ties in well with the next version of pipelines we are working on. Basically, like Kubeflow, you add a decorator to a function, making the function a step in the pipeline (and a Task in ClearML); rough sketch below.
My thinking was to somehow separate short/simple steps (i.e. plain functions) from complicated steps (e.g. training with specific requirements).
Maybe Kedro could launch the "simple steps"? What do you think?
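To make that concrete, here is a minimal sketch of the idea. Since this is still in the works, the decorator names and import path are assumptions, not a final API:
```python
from clearml import PipelineDecorator  # assumed import; the final API may differ

# Each decorated function becomes a step in the pipeline (and a Task in ClearML)
@PipelineDecorator.component(return_values=["cleaned"])
def clean(raw_path: str):
    # a "simple step": a plain function, something Kedro could launch
    cleaned = raw_path.upper()  # stand-in for real cleaning logic
    return cleaned

@PipelineDecorator.component(return_values=["model"])
def train(cleaned):
    # a "complicated step": could be routed to a dedicated machine/queue
    model = f"model-trained-on-{cleaned}"
    return model

# The pipeline logic itself is just a function wiring the steps together
@PipelineDecorator.pipeline(name="demo", project="examples", version="0.1")
def run_pipeline(raw_path: str):
    cleaned = clean(raw_path)
    return train(cleaned)

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # debug mode: execute locally, not on agents
    run_pipeline("data/raw.csv")
```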
I am writing a small Kedro/ClearML plugin atm that tries to link the two up. Would be interesting to share experiences and get input from the ClearML people at some point.
YES! Please share, that sounds great!
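To give a rough idea of the direction, the core of the plugin hooks into Kedro's lifecycle roughly like this (a minimal sketch assuming Kedro's hook API and plain Task.init; the project/task names are placeholders):
```python
from clearml import Task
from kedro.framework.hooks import hook_impl

class ClearMLHooks:
    """Kedro lifecycle hooks that mirror a pipeline run into a ClearML Task."""

    @hook_impl
    def before_pipeline_run(self, run_params):
        # One Task per pipeline run; project/task names are placeholders
        self.task = Task.init(
            project_name="kedro-demo",
            task_name=run_params.get("pipeline_name") or "__default__",
        )
        self.task.connect(run_params)  # log the Kedro run parameters

    @hook_impl
    def after_pipeline_run(self, run_params):
        self.task.close()
```
If I remember the Kedro convention right, it would be registered via HOOKS = (ClearMLHooks(),) in the project's settings.py.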
Also, is it good practice to reuse task_ids when running the same job twice during debugging, or should one always create a new one?
Hmm, good point. This is why you can configure the behavior in clearml.conf (or disable it altogether). Currently we assume that if no artifacts/models were used, and the last time you executed the Task was under 72h ago, the same Task ID will be reused (assuming you are running from the same machine).
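You can also control it per-run straight from code; a minimal sketch (the clearml.conf key mentioned in the comment is from memory, so worth double-checking against the docs):
```python
from clearml import Task

# Always create a fresh Task while debugging, instead of reusing a recent ID
task = Task.init(
    project_name="examples",
    task_name="debug-run",
    reuse_last_task_id=False,  # default True: may reuse an "empty" Task < 72h old
)
# The 72h window itself should be tunable in clearml.conf under
# sdk.development.task_reuse_time_window_in_hours
```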