"using your method you may not reach the best set of hyperparameters."
Of course you are right. It is a trade-off between speed and effectiveness, and whether it is worth it depends on the use case. Here it is worth it, because model performance is not sensitive to the parameter we search for first; being in the right ballpark is enough. The second set of parameters, however, needs a full grid search (the parameters are booleans and strings), so repeating that grid for every candidate value of the first parameter would drive the cost up considerably.
cleanly split codebase into components with clear responsibilities
I agree, and it was my first instinct as well. However, I am not sure this type of separation of concerns should be done at the level of ClearML if speed is a consideration. ClearML has quite a bit of runtime overhead for each pipeline component. I have looked into Kedro for implementing separation of concerns, but I am not yet sure how to combine Kedro with ClearML, as neither officially supports the other.
What do you think?
@<1537605940121964544:profile|EnthusiasticShrimp49> : The biggest advantage I see in splitting your code into pipeline components is caching. It also structures your code a little, but I was told by the staff that this should not be one's main aim with ClearML components. What is your main takeaway from splitting your code into components?
My HPO on top of the pipeline is already working 🙂 I am currently experimenting with using the HPO in another pipeline that creates two HPO steps (from the same function!) to first optimize along one direction of the parameter space and then along the others; the reason for this is to save time, because a full search would take forever.
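For readers following along, the two-step idea can be sketched in plain Python, without any ClearML calls. Everything here is an assumption made for illustration: the parameter names (lr, use_scaling, kernel), the candidate values, and the toy objective that stands in for a real pipeline run.

```python
from itertools import product

# Hypothetical search space: one numeric parameter plus boolean/string options.
NUMERIC_VALUES = [0.001, 0.01, 0.1, 1.0]
CATEGORICAL_GRID = {
    "use_scaling": [True, False],
    "kernel": ["linear", "rbf", "poly"],
}


def objective(params):
    """Toy stand-in for one pipeline run; lower is better."""
    score = abs(params["lr"] - 0.01)
    score += 0.0 if params["use_scaling"] else 0.5
    score += {"linear": 0.3, "rbf": 0.0, "poly": 0.2}[params["kernel"]]
    return score


def categorical_combinations(grid):
    """Yield every combination of the categorical options as a dict."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def two_stage_search(objective, numeric_values, grid, defaults):
    evaluations = 0

    # Stage 1: search the numeric parameter with the categoricals held at defaults.
    best_lr, best_score = None, float("inf")
    for lr in numeric_values:
        score = objective({"lr": lr, **defaults})
        evaluations += 1
        if score < best_score:
            best_lr, best_score = lr, score

    # Stage 2: full grid over the categoricals with the numeric parameter fixed.
    best_params, best_score = None, float("inf")
    for combo in categorical_combinations(grid):
        params = {"lr": best_lr, **combo}
        score = objective(params)
        evaluations += 1
        if score < best_score:
            best_params, best_score = params, score

    return best_params, best_score, evaluations


defaults = {"use_scaling": False, "kernel": "linear"}
best_params, best_score, n_evals = two_stage_search(
    objective, NUMERIC_VALUES, CATEGORICAL_GRID, defaults
)
full_grid_size = len(NUMERIC_VALUES) * len(list(categorical_combinations(CATEGORICAL_GRID)))
print(best_params, n_evals, full_grid_size)
```

With these made-up sizes the staged search costs the sum of the two stage sizes, while the full grid costs their product. The staged result is only reliable when the first parameter interacts weakly with the others, which is the assumption stated in this thread.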
Hey @<1523704157695905792:profile|VivaciousBadger56> , I was playing around with the Pipelines a while ago, and managed to create one where I have a few steps at the beginning creating ClearML datasets like users_dataset, sessions_dataset, prefferences_dataset, then I have a step which combines all 3, then an independent data quality step which runs in parallel with the model training. Also, if you want to have some fun, you can try to parametrize your pipelines and run HPO on an entire pipeline.
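The shape of that pipeline can be sketched as a dependency graph using only the Python standard library. This is not ClearML code: the step names combine_datasets, data_quality and train_model are hypothetical, and the sketch assumes that both the data quality step and the training step depend on the combined dataset.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps it depends on.
dependencies = {
    "users_dataset": set(),
    "sessions_dataset": set(),
    "prefferences_dataset": set(),
    "combine_datasets": {"users_dataset", "sessions_dataset", "prefferences_dataset"},
    "data_quality": {"combine_datasets"},  # hypothetical step name
    "train_model": {"combine_datasets"},   # hypothetical step name
}


def parallel_batches(dependencies):
    """Group steps into batches whose members could run at the same time."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all steps whose inputs are finished
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = parallel_batches(dependencies)
for index, batch in enumerate(batches, start=1):
    print(index, batch)
```

The three dataset steps land in the first batch, the combining step in the second, and data quality and training share the last batch, which is what lets them run in parallel.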
Sounds interesting. But my main concern with this kind of approach is if the surface of the (hparam1, hparam2, objective_fn_score) is non-convex, using your method you may not reach the best set of hyperparameters. Maybe try using smarter search algorithms, like BOHB or TPE if you have a large search space, otherwise, you can try to do a few rounds of manual random search, reducing the search space around the region of most-likely best hyperparameters after every round.
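The "few rounds of random search with a shrinking search space" suggestion could look roughly like the sketch below. It is plain Python rather than BOHB or TPE, and the objective, the bounds, the number of rounds and the shrink factor are all assumptions picked for illustration.

```python
import math
import random


def objective(x, y):
    """Toy non-convex surface (hypothetical); lower is better."""
    return (x - 0.3) ** 2 + (y + 0.2) ** 2 + 0.1 * math.sin(15 * x) * math.cos(15 * y)


def shrinking_random_search(objective, bounds, rounds=4, samples_per_round=30,
                            shrink=0.5, seed=0):
    rng = random.Random(seed)
    best_point, best_score = None, float("inf")
    for _ in range(rounds):
        # Sample uniformly inside the current bounds and keep the best point seen.
        for _ in range(samples_per_round):
            point = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
            score = objective(*point)
            if score < best_score:
                best_point, best_score = point, score

        # Shrink every interval around the best point, clipped to the current bounds.
        new_bounds = []
        for (lo, hi), centre in zip(bounds, best_point):
            half = (hi - lo) * shrink / 2
            new_bounds.append((max(lo, centre - half), min(hi, centre + half)))
        bounds = new_bounds
    return best_point, best_score, bounds


initial_bounds = [(-1.0, 1.0), (-1.0, 1.0)]
best_point, best_score, final_bounds = shrinking_random_search(objective, initial_bounds)
print(best_point, best_score, final_bounds)
```

Shrinking around the best point is a greedy choice, so on a strongly non-convex surface it can still lock onto a local optimum; more samples in the first round or a gentler shrink factor reduce that risk at the cost of more runs.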
As for why structure your code using pipelines, I come from a somewhat heavy software engineering background, so for me a cleanly split codebase into components with clear responsibilities is the best thing, and caching is just a nice addition 🙂