
CostlyOstrich36 on a remote agent, yes, running the task from the interface
This way I would need to keep track of 3 `OutputModel`s and call `update_weights` 3 times on every update, and probably do 2 redundant uploads
is there some sort of `OutputModel.remove` method? Docs say there isn't
if the loss is lower than the best stored loss so far, add the new checkpoint and remove the one that falls to 4th place
Strictly speaking, there is only one training task, but I want to keep top-3 best checkpoints for it all the time
If I keep track of 3 `OutputModel`s simultaneously, the weights would need to shift between them every epoch (i.e. the updated weights go to top-1, then the old top-1 becomes top-2, top-2 becomes top-3, etc.)
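The top-3 rule described above (accept a better checkpoint, evict whatever falls to 4th place) can be sketched without any ClearML calls. This is a minimal sketch, not ClearML functionality: `TopKCheckpoints` and `offer` are hypothetical names, and it assumes a lower loss is better. It only does the bookkeeping and reports which file was evicted; uploading and deleting are left to the caller.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TopKCheckpoints:
    """Tracks the k lowest-loss checkpoints (hypothetical helper, not a ClearML class)."""
    k: int = 3
    # Sorted ascending by loss, so index 0 is the current top-1.
    entries: List[Tuple[float, str]] = field(default_factory=list)

    def offer(self, loss: float, path: str) -> Tuple[bool, Optional[str]]:
        """Return (accepted, evicted_path) for a new checkpoint."""
        # Reject if the list is full and the candidate is no better than the worst kept entry.
        if len(self.entries) >= self.k and loss >= self.entries[-1][0]:
            return False, None
        # Inserting in sorted order shifts the lower-ranked entries down by one place.
        bisect.insort(self.entries, (loss, path))
        evicted = None
        if len(self.entries) > self.k:
            evicted = self.entries.pop()[1]  # the entry that fell to rank k+1
        return True, evicted


tracker = TopKCheckpoints(k=3)
for epoch, loss in enumerate([0.9, 0.7, 0.8, 0.6, 0.95]):
    accepted, evicted = tracker.offer(loss, f"epoch_{epoch}.pt")
    # Only when `accepted` is True does the new file need uploading;
    # `evicted`, if not None, is the one file that can be deleted.
```

With this shape, each epoch costs at most one upload and one delete, rather than re-uploading weights to shift them between three model objects.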
this is how I implemented it myself. It looks like ClearML functionality is quite opinionated and requires some tweaks every time I try to replace my own stuff with it
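```python
import os
from clearml import OutputModel

# `task`, `params` and `save_path` come from the surrounding training script
clearml_name = os.path.basename(save_path)
output_model_best = OutputModel(
    task=task,
    name=clearml_name,
    tags=['running-best'])
output_model_best.update_weights(
    save_path,
    upload_uri=params.clearml_aws_checkpoints,
    target_filename=clearml_name
)
```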
if I just use plain boto3 to sync weights to/from S3, I just check how many files are stored in the location, and clear up the old ones
e.g. if I want to store only top-3 running best checkpoints
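The "count the files and clear up the old ones" approach can be sketched as a small pure function. This is an illustration under assumptions, not the poster's actual code: `keys_to_prune` is a hypothetical name, and the input is assumed to look like the `Contents` entries returned by boto3's `list_objects_v2` (dicts with `Key` and `LastModified`).

```python
from datetime import datetime, timezone
from typing import Dict, List


def keys_to_prune(objects: List[Dict], keep: int = 3) -> List[str]:
    """Return the keys of everything except the `keep` most recently modified objects."""
    newest_first = sorted(objects, key=lambda o: o["LastModified"], reverse=True)
    return [o["Key"] for o in newest_first[keep:]]


# Possible usage with boto3 (bucket and prefix are placeholders):
#   s3 = boto3.client("s3")
#   listing = s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", [])
#   for key in keys_to_prune(listing, keep=3):
#       s3.delete_object(Bucket=bucket, Key=key)
```

Sorting by modification time only matches "top-3 best" if a checkpoint is uploaded solely when it improves on the previous best; otherwise the ranking would have to come from the stored loss instead.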
CostlyOstrich36 thank you for the answer! Maybe I can just delete old models along with the corresponding tasks, that seems to be easier
hm, not quite clear how it is implemented. For example, this is how I do it now (explicitly)
What exactly do we need to copy? I believe we have already copied everything, but it keeps throwing a "Fetch experiment failed" error
SparklingElephant70 then use the `task_overrides` argument, like this: `task_overrides={'script.branch': 'main', 'script.version_num': '', 'script.diff': '', 'project': Task.get_project_id(project_name=cfg[name]['base_project']), 'name': 'task-name', 'container.image': 'registry.gitlab.com/image:tag'}`
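For readability, the override dictionary can be built by a small helper. This is only a sketch: `build_task_overrides` is a hypothetical name, the keys are the ones quoted in the message above, and passing the result to something like `PipelineController.add_step(..., task_overrides=...)` is an assumption about the context, since the thread does not say which call is being used.

```python
from typing import Dict


def build_task_overrides(project_id: str, task_name: str, image: str,
                         branch: str = "main") -> Dict[str, str]:
    """Build a task_overrides dict using the keys quoted in the thread."""
    return {
        "script.branch": branch,
        # Empty strings clear the pinned commit and the uncommitted diff,
        # so the agent checks out the head of `branch`.
        "script.version_num": "",
        "script.diff": "",
        "project": project_id,
        "name": task_name,
        "container.image": image,
    }


overrides = build_task_overrides(
    project_id="<project-id>",  # e.g. the result of Task.get_project_id(project_name=...)
    task_name="task-name",
    image="registry.gitlab.com/image:tag",
)
```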
there must be some schema to change script name as well
SparklingElephant70 Try specifying the full path to the script (relative to the working dir)
yeah, I mean I need to get the model to get its ID, but I need to get ID to get the model
Searching by model ID is a good idea, but how do I fetch it from the code? In principle, InputModels are rarely defined automatically, so I could look up the ID manually...
AgitatedDove14 I ran into this problem again. Are there any known issues about it? I don't remember what helped the last time
SparklingElephant70 in WebUI Execution/SCRIPT PATH
AgitatedDove14 thank you. Maybe you know about an `OutputModel.remove` method or something like that?
Is there a way to simplify it with ClearML, not make it more complicated?