LudicrousParrot69 I would advise the following:
Put all the experiments in a new project Filter based on the HPO tag, and sort the experiments based on the metric we are optimizing (see adding custom columns to the experiment table) And select + archive the experiments that are not usedBTW: I think someone already suggested we do the auto archiving inside the HPO process itself. Thoughts ?
Yeah I was imagining the artifact, id, link to the child task, etc, would all be saved out. I have the HPO experiment open in the UI at the moment, and yup, I can see in the Results>Plots a table summary, but that wasnt the issue, it was trying to clean up the project wide experiments view without making a large number of projects. Are tagging / archiving available in the API for a task? Also, thanks for the help so far š
... grab the model artifacts for each, put them into the parent HPO model as its artifacts, and then go through the archive everything.
Nice. wouldn't it make more sense to "store" a link to the "winning" experiment. So you know how to reproduce it, and the set of HP that were chosen?
No that the model is bad, but how would I know how to reproduce it, or retrain when I have more data etc..
+1 for autoarchiving. Right now the interface feels incredibly clunky to use once the number of HPO trials starts to increase. I currently have a demo project and have different algos to make predictions (a simple keras model, a RF, etc). Ideally Iād want to see the HPO execution just once with all the trials underneath it, or just the top (few) models. At the moment, I have pages and pages of models, 99% of them I dont care about. Is it possible to archive models and set tags in the code rather than the UI?
LudicrousParrot69 we are working on adding nested project which should help with the humongous mass the HPO can create. This is a more generic solution for the nesting issue. (since nesting inside a table is probably not the best UX solution š )
Ah yup, found it, I was in the server Tasks doco and not the clearml Task doco, oops!
To go off the online example, it finds the top 3 performing models and prints out the ID. What would be better would be to take those 3 IDs and - in the python code - grab the model artifacts for each, put them into the parent HPO model as its artifacts, and then go through the archive everything. Doesnt solve the issue if a HPO run is going to take a few days (in which the UI would become unusable), but once its done then the auto archiving would clean it up a lot. Is that possible at all, until nesting becomes bakes in?
Are tagging / archiving available in the API for a task?
Everything that the UI can do you can do programmatically š
Tags:
task.add_tags / set_tags / get_tags
Archive:
task.set_system_tags(task.get_system_tags() + ['archived'])
Doesnt solve the issue if a HPO run is going to take a few days
The HPO Task has a table of the top performing experiments, so when you go to the "Plot" tab you get a summary of all the runs, with the Task ID of the top performing one.
No need to run through the details of the entire experiments, just look at the summary on the HPO Task.