Um, Is There A Way To Delete An Artifact From A Task That Is Running?


  
  
Posted one year ago

Answers 25


VexedCat68 the remote checkpoints (i.e. Models) represent the local storage, so if you overwrite the files locally, exactly the same thing happens in the backend. So the following should work (and store only the last 5 checkpoints):
epochs += 1
torch.save(model.state_dict(), "model_{}.pt".format(epochs % 5))
Regarding deleting / getting models:
Model.remove(task.models['output'][-1])
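To make the overwrite idea above concrete, here is a minimal sketch, assuming model is a torch.nn.Module (the variable and function names are illustrative, not from the original answer):

    import torch

    epochs = 0

    def save_rolling_checkpoint(model):
        # Reusing the same five file names keeps only the last 5 checkpoints;
        # since the backend mirrors local storage, the remote entries get
        # overwritten the same way.
        global epochs
        epochs += 1
        torch.save(model.state_dict(), "model_{}.pt".format(epochs % 5))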

  
  
Posted one year ago

Is there a difference? I mean my use case is pretty simple. I have a training run and it basically creates a lot of checkpoints. I just want to keep the N best checkpoints, and whenever there are more than N checkpoints, I'll delete the worst performing one, both locally and from the task artifacts.

  
  
Posted one year ago

Hmmm, I couldn't find anything in the SDK; however, you can use the API to do it.

  
  
Posted one year ago

VexedCat68
Delete the uploaded file, or the artifact from the Task?

  
  
Posted one year ago

Hmm, you can delete the artifact with:
task._delete_artifacts(artifact_names=['my_artifact'])
However, this will not delete the file itself.
To delete the file I would do:
remote_file = task.artifacts['delete_me'].url
h = StorageHelper.get(remote_file)
h.delete(remote_file)
task._delete_artifacts(artifact_names=['delete_me'])
Maybe we should have a proper interface for that? wdyt? What's the actual use case?
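Putting both steps together, a rough end-to-end sketch; the StorageHelper import path, the task id placeholder, and the delete_me artifact name are assumptions here, not confirmed API guidance:

    from clearml import Task
    from clearml.storage.helper import StorageHelper  # import path may vary between SDK versions

    task = Task.get_task(task_id='<your-task-id>')  # or Task.current_task() inside the run

    remote_file = task.artifacts['delete_me'].url          # URL of the uploaded file
    helper = StorageHelper.get(remote_file)                # storage driver matching that URL
    helper.delete(remote_file)                             # delete the file from storage
    task._delete_artifacts(artifact_names=['delete_me'])   # drop the artifact entry from the task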

  
  
Posted one year ago

I ran training code from a GitHub repo. It saves checkpoints every 2000 iterations. The only problem is I'm training it for 3200 epochs and there are more than 37,000 iterations in each epoch, so the checkpoints just added up. I've stopped the training for now. I need to delete all of those checkpoints before I start training again.

  
  
Posted one year ago

VexedCat68, I was about to mention it myself. Maybe only keeping the last few or the best checkpoints would be best in this case. I think the SDK also supports this quite well 🙂

  
  
Posted one year ago

Also I need to modify the code to only keep the N best checkpoints as artifacts and remove others.

  
  
Posted one year ago

I need to both remove the artifact from the UI and the storage.

  
  
Posted one year ago

can you point me to where I should look?

  
  
Posted one year ago

The storage is basically the machine the ClearML server is on; I'm not using S3 or anything.

  
  
Posted one year ago

Shouldn't checkpoints be uploaded immediately? That's the purpose of checkpointing, isn't it?

  
  
Posted one year ago

Given a situation where I want to delete an uploaded artifact from both the UI and the storage, how would I go about doing that?

  
  
Posted one year ago

Since that is an immediate concern for me as well.

  
  
Posted one year ago

And given that I have artifacts = task.get_registered_artifacts()

  
  
Posted one year ago

How do I go about uploading those registered artifacts? Would I just pass artifacts[i] and the name for the artifact?

  
  
Posted one year ago

I think it depends on your implementation. How are you currently implementing top X checkpoints logic?

  
  
Posted one year ago

VexedCat68

So the checkpoints just added up. I've stopped the training for now. I need to delete all of those checkpoints before I start training again.

Are you uploading the checkpoints manually as artifacts, or are they auto-logged & uploaded?
Also, why not reuse and overwrite older checkpoints?

  
  
Posted one year ago

Currently a checkpoint is saved every 2000 iterations; that's just part of the code. Since output_uri=True, it gets uploaded to the ClearML server.
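For context, that auto-upload comes from the output_uri flag at task creation; a minimal sketch with placeholder project/task names:

    from clearml import Task

    # With output_uri=True, checkpoints saved by the training code are also
    # uploaded to the ClearML server (or whichever files server is configured).
    task = Task.init(project_name='examples', task_name='training', output_uri=True)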

  
  
Posted one year ago

AgitatedDove14 CostlyOstrich36 I think that is the approach that'll work for me. I just need to be able to remove checkpoints I don't need, given I know their names, from both the UI and storage.

  
  
Posted one year ago

I plan to append each checkpoint to a list; when len(list) > N, I'll pop out the one with the highest loss and delete that file from ClearML and from storage. That's how I plan to work with it.
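A rough sketch of that keep-N-best loop, stitched together from the calls mentioned in this thread; the function name, N, and the StorageHelper import path are assumptions, and it assumes the artifact upload has finished before its URL is looked up:

    import os
    from clearml.storage.helper import StorageHelper  # import path may vary between SDK versions

    N = 5
    checkpoints = []  # list of (loss, artifact_name, local_path)

    def keep_n_best(task, loss, artifact_name, local_path):
        task.upload_artifact(artifact_name, artifact_object=local_path)
        checkpoints.append((loss, artifact_name, local_path))
        if len(checkpoints) > N:
            worst = max(checkpoints, key=lambda c: c[0])  # highest loss
            checkpoints.remove(worst)
            _, name, path = worst
            remote_file = task.artifacts[name].url           # remote copy of the checkpoint
            helper = StorageHelper.get(remote_file)
            helper.delete(remote_file)                       # delete it from storage
            task._delete_artifacts(artifact_names=[name])    # and from the task (UI)
            if os.path.exists(path):                         # and locally
                os.remove(path)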

  
  
Posted one year ago

AgitatedDove14 Alright, I think I understand: changes made in storage will be visible in the frontend directly.

Will using Model.remove completely delete from storage as well?

  
  
Posted one year ago

Will using Model.remove completely delete from storage as well?

Correct, see the delete_weights_file=True argument.
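In other words, something along these lines (a sketch, assuming the task has output models registered):

    from clearml import Task, Model

    task = Task.current_task()
    # Removes the last output model from the ClearML server and, with
    # delete_weights_file=True, deletes its weights file from storage as well.
    Model.remove(task.models['output'][-1], delete_weights_file=True)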

  
  
Posted one year ago

Basically I don't want the storage on the ClearML Server machine to fill up.

  
  
Posted one year ago