Answered
Um, Is There A Way To Delete An Artifact From A Task That Is Running?

Um, is there a way to delete an artifact from a task that is running?

  
  
Posted 2 years ago

Answers 25


VexedCat68

So the checkpoints just added up. I've stopped the training for now. I need to delete all of those checkpoints before I start training again.

Are you uploading the checkpoints manually as artifacts, or are they auto-logged and uploaded?
Also, why not reuse and overwrite the older checkpoints?

  
  
Posted 2 years ago

The storage is basically the machine the ClearML server is on; I'm not using S3 or anything.

  
  
Posted 2 years ago

Currently a checkpoint is saved every 2000 iterations; that's just part of the code. Since output_uri=True, it gets uploaded to the ClearML server.
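For reference, this is roughly how that flag is set (project and task names here are placeholders):

    from clearml import Task

    # output_uri=True sends saved checkpoints to the ClearML server's file store;
    # an object-store URI (e.g. "s3://bucket/path") would redirect them there instead
    task = Task.init(
        project_name="my_project",  # placeholder
        task_name="training",       # placeholder
        output_uri=True,
    )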

  
  
Posted 2 years ago

How do I go about uploading those registered artifacts? Would I just pass artifacts[i] and the name for the artifact?

  
  
Posted 2 years ago

Hmmmm, I couldn't find anything in the SDK; however, you can use the API to do it.
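Something along these lines — a rough sketch, assuming the server exposes the tasks.delete_artifacts endpoint from the REST reference (the task ID is a placeholder):

    from clearml.backend_api.session.client import APIClient

    client = APIClient()
    # assumption: tasks.delete_artifacts takes the task ID and a list of
    # {key, mode} descriptors identifying the artifacts to drop
    client.tasks.delete_artifacts(
        task="<task-id>",  # placeholder
        artifacts=[{"key": "my_artifact", "mode": "output"}],
    )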

  
  
Posted 2 years ago

I ran training code from a GitHub repo. It saves checkpoints every 2000 iterations. The only problem is I'm training it for 3200 epochs and there are more than 37,000 iterations in each epoch, so the checkpoints just added up. I've stopped the training for now. I need to delete all of those checkpoints before I start training again.

  
  
Posted 2 years ago

VexedCat68
Delete the uploaded file, or the artifact from the Task?

  
  
Posted 2 years ago

Will using Model.remove completely delete from storage as well?

Correct, see the argument delete_weights_file=True
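For example, a sketch (the task lookup is a placeholder):

    from clearml import Task, Model

    task = Task.get_task(task_id="<task-id>")  # placeholder ID
    last_model = task.models["output"][-1]     # most recent output model
    # delete_weights_file=True removes the stored weights file as well
    Model.remove(last_model, delete_weights_file=True)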

  
  
Posted 2 years ago

VexedCat68, I was about to mention it myself. Maybe keeping only the last few or the best checkpoints would be best in this case. I think the SDK also supports this quite well 🙂

  
  
Posted 2 years ago

Also, I need to modify the code to keep only the N best checkpoints as artifacts and remove the others.

  
  
Posted 2 years ago

Shouldn't checkpoints be uploaded immediately? That's the purpose of checkpointing, isn't it?

  
  
Posted 2 years ago

Can you point me to where I should look?

  
  
Posted 2 years ago

Given a situation where I want to delete an uploaded artifact from both the UI and the storage, how would I go about doing that?

  
  
Posted 2 years ago

And given that I have artifacts = task.get_registered_artifacts()

  
  
Posted 2 years ago

Since that is an immediate concern for me as well.

  
  
Posted 2 years ago

AgitatedDove14 CostlyOstrich36 I think that's the approach that'll work for me. I just need to be able to remove checkpoints I don't need, given I know their names, from the UI and storage.

  
  
Posted 2 years ago

I plan to append each checkpoint to a list; when len(list) > N, I'll just pop out the one with the highest loss and delete that file from ClearML and storage. That's how I plan to work with it.
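Roughly like this — a sketch, with N and the file handling as placeholders:

    import os

    N = 5      # keep the N best checkpoints (placeholder value)
    best = []  # list of (loss, path) tuples

    def register_checkpoint(loss, path):
        """Track a checkpoint; once more than N are kept, drop the worst."""
        best.append((loss, path))
        if len(best) > N:
            best.sort(key=lambda t: t[0])  # lowest loss first
            _, worst_path = best.pop()     # highest loss is last
            if os.path.exists(worst_path):
                os.remove(worst_path)      # delete the local file
            # the remote copy would be removed with the artifact-deletion
            # snippet shown later in this thread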

  
  
Posted 2 years ago

Basically, I don't want the storage on the ClearML Server machine to fill up.

  
  
Posted 2 years ago

Hmm, you can delete the artifact with:

    task._delete_artifacts(artifact_names=['my_artifact'])

However, this will not delete the file itself. To delete the file I would do:

    from clearml.storage.helper import StorageHelper

    remote_file = task.artifacts['delete_me'].url
    h = StorageHelper.get(remote_file)
    h.delete(remote_file)
    task._delete_artifacts(artifact_names=['delete_me'])

Maybe we should have a proper interface for that? wdyt? What's the actual use case?
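If this comes up often, it could be wrapped in a small helper — a sketch, leaning on the same private Task._delete_artifacts() call used above:

    from clearml.storage.helper import StorageHelper

    def delete_task_artifact(task, name):
        """Delete an artifact's remote file, then drop it from the task."""
        remote_file = task.artifacts[name].url
        helper = StorageHelper.get(remote_file)
        helper.delete(remote_file)                     # remove the stored file
        task._delete_artifacts(artifact_names=[name])  # remove the artifact entry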

  
  
Posted 2 years ago

VexedCat68 the remote checkpoints (i.e. Models) represent the local storage, so if you internally overwrite the files, that is exactly what will happen in the backend. So the following should work (and store only the last 5 checkpoints):

    epochs += 1
    torch.save(model.state_dict(), "model_{}.pt".format(epochs % 5))

Regarding deleting / getting models:

    Model.remove(task.models['output'][-1])
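In context, the rotation would look roughly like this (the model and epoch count are placeholders):

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 2)  # stand-in model for the sketch
    num_epochs = 10          # placeholder
    for epoch in range(1, num_epochs + 1):
        # ... one epoch of training would go here ...
        # reusing the same 5 filenames overwrites old checkpoints,
        # locally and (via ClearML auto-logging) in the backend
        torch.save(model.state_dict(), "model_{}.pt".format(epoch % 5))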

  
  
Posted 2 years ago

AgitatedDove14 Alright, I think I understand: changes made in storage will be visible in the front end directly.

Will using Model.remove completely delete from storage as well?

  
  
Posted 2 years ago

I need to remove the artifact from both the UI and the storage.

  
  
Posted 2 years ago

Is there a difference? I mean, my use case is pretty simple. I have a training run and it basically creates a lot of checkpoints. I just want to keep the N best checkpoints, and whenever there are more than N, I'll delete the worst-performing one, both locally and from the task artifacts.

  
  
Posted 2 years ago

I think it depends on your implementation. How are you currently implementing the top-X checkpoints logic?

  
  
Posted 2 years ago