Hello everyone,
I have a question regarding model deletion. To remove some "bad" checkpoints during training, I currently just call `Model.remove(model=model_id)`.
Curiously, this function seems to remove the checkpoint from the fileserver but not from the database.
Here is an example:
- During a 2000-epoch training run on an instance with 2 GPUs, I continuously delete the second-to-last checkpoint so that only 2 checkpoints are kept: the "best" checkpoint overall and the very last one. This means that at any time I have only 2 models under "ARTIFACTS > OUTPUT MODELS".
- At the end of training, only my best and last checkpoints are indeed listed under "ARTIFACTS > OUTPUT MODELS" (see picture 1).
- However, when looking at the "MODELS" tab, all my checkpoints are still listed, but only 2 of them, namely the last and the best checkpoints, have a valid URL pointing to an existing checkpoint on the fileserver. All other references point to a nonexistent model: "Not Found - The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again." (see pictures 2 and 3).
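For illustration, the pruning described above could be sketched roughly as follows. This is only a sketch under assumptions: the names `CheckpointPruner`, `register`, `delete_fn` and `clearml_remove` are hypothetical, a higher score is assumed to mean a better checkpoint, and the only ClearML call is the `Model.remove(model=model_id)` quoted earlier.

```python
from typing import Callable, Optional


def clearml_remove(model_id: str) -> None:
    """Delete one checkpoint using the call quoted in this post."""
    # Imported lazily so the sketch can be loaded without ClearML installed.
    from clearml import Model

    Model.remove(model=model_id)


class CheckpointPruner:
    """Hypothetical helper: keep only the best and the most recent checkpoint."""

    def __init__(self, delete_fn: Callable[[str], None] = clearml_remove) -> None:
        self.delete_fn = delete_fn
        self.best_id: Optional[str] = None
        self.best_score = float("-inf")  # assumption: higher score is better
        self.last_id: Optional[str] = None

    def register(self, model_id: str, score: float) -> None:
        """Record a new checkpoint and delete the ones that became stale."""
        previous_last = self.last_id
        previous_best = self.best_id

        self.last_id = model_id
        if score > self.best_score:
            self.best_id, self.best_score = model_id, score

        keep = {self.best_id, self.last_id}
        # dict.fromkeys removes duplicates while preserving order.
        for candidate in dict.fromkeys([previous_last, previous_best]):
            if candidate is not None and candidate not in keep:
                self.delete_fn(candidate)
```

Calling `pruner.register(model_id, score)` after each checkpoint is saved would then leave at most two models alive at any time. Passing a different `delete_fn` makes it possible to try the logic without touching the server.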
My goal is to cleanly remove those stale references pointing to "ghost models".
Does anyone know a method for completely deleting a ClearML model from both the fileserver and the database?
Thank you very much in advance for your help! 🙏