What a turn of events 😉 So let's summarize again:
upkeep script - for each task, find out if there are several models created by it with the same name. If so:
- make a log so that devops can erase the files
- DESTRUCTIVELY delete all the models from the trains-server that are in DRAFT mode, except the last one
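A rough sketch of how that upkeep pass could be planned, just to make the logic concrete. It does not call the trains API: the model records are hypothetical plain dicts (keys `id`, `task`, `name`, `uri`, `created`, `draft` are my assumption), and "the last one" is assumed to mean the most recently created model in each group.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def plan_upkeep(models: List[dict]) -> Tuple[List[str], List[str]]:
    """Plan the cleanup of duplicate same-name models per task.

    Each model is a hypothetical dict with keys:
    id, task, name, uri, created (sortable), draft (bool).
    Returns (ids_to_delete, log_lines_for_devops).
    """
    # Group models by the task that created them and by model name.
    groups: Dict[Tuple[str, str], List[dict]] = defaultdict(list)
    for model in models:
        groups[(model["task"], model["name"])].append(model)

    to_delete: List[str] = []
    log_lines: List[str] = []
    for (task, name), group in groups.items():
        if len(group) < 2:
            continue  # no duplicates for this task/name pair
        # Assumption: "the last one" is the most recently created model.
        ordered = sorted(group, key=lambda m: m["created"])
        older = ordered[:-1]
        for model in older:
            if model["draft"]:  # only DRAFT models are deleted
                to_delete.append(model["id"])
                # Log the file location so devops can erase the data.
                log_lines.append(
                    f"task={task} name={name} model={model['id']} uri={model['uri']}"
                )
    return to_delete, log_lines
```

The function only builds the plan (ids plus log lines); the actual destructive delete on the server would be a separate step, so the plan can be reviewed first.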
if you want something that could work in either case, then maybe the second option is better
can we delete the models and then upload them again?
In the image above, if you look at the 3rd and 7th entries, their model and task are the same
same name == same path, assuming no upload is taking place? *just making sure
this is ok but in the path if they have changed the model then
so what I am describing is exactly this - once you try to create an output model from the same task, if the name already exists - do not create a new model, just update the timestamp on the old one
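To illustrate the behaviour I mean, here is a minimal sketch using an in-memory dict as a stand-in for the server. Everything in it is hypothetical (`register_output_model`, the registry layout, the `updated` field); it is not how trains implements model registration, only the "reuse instead of duplicate" rule.

```python
import time
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for the server: (task_id, model_name) -> model record.
Registry = Dict[Tuple[str, str], dict]


def register_output_model(
    registry: Registry,
    task_id: str,
    name: str,
    uri: str,
    now: Optional[float] = None,
) -> dict:
    """Create a model for (task_id, name), or refresh the existing one."""
    now = time.time() if now is None else now
    key = (task_id, name)
    existing = registry.get(key)
    if existing is not None:
        # Same task, same name: do not create a new entry,
        # just update the timestamp on the old one.
        existing["updated"] = now
        return existing
    model = {"task": task_id, "name": name, "uri": uri, "created": now, "updated": now}
    registry[key] = model
    return model
```

Registering the same name twice from the same task leaves a single entry whose `updated` time moves forward, so there are no duplicate entries to show in the UI.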
https://github.com/allegroai/trains/issues/193 for future reference (I will update later)
but there will be duplicate entries in the UI
Fine. Can I open a feature request on our GitHub for you, referencing this conversation?
I will dig around to see how all of this could be accomplished.
Right now I see it done in two ways:
- a function that you must remember to call each time, which would do the upkeep
- a script that you can run once in a while to do the cleanup
Which one would you prefer that I pursue?
with upload I would strongly recommend against doing this
then your devops can delete the data and then delete the models pointing to that data
The script runs and tries to register 4 models; each one of them is found at exactly the same path, but the size/timestamp is different. It will then update the 4 old models with the new details and erase all the other fields
And should it work across your workspace, i.e. it does not matter if the task ID changed, just always keep a single model with the same filename? I'm really worried this could break a lot of the reproducible/repeatable flow for your process.
isn't it better if it just updates the timestamp on the old models?