What’s the general pattern for running a pipeline: train a model, evaluate metrics, and publish the model if satisfactory (based on a threshold, for example)?
Basically I would do:
parameters for pipeline:
TaskA = Training model Task (think of it as our template Task)
Metric = title/series/sign we want to choose based on, where sign is max/min
Project = Project to compare the performance so that we could decide to publish based on the best Metric.
Pipeline:
1. Clone TaskA
2. Change TaskA arguments (if needed)
3. Launch and wait until completed
4. Get TaskA's instance Metric value: `value = Task.get_task(task_id='instance_id_111').get_last_scalar_metrics()[Metric.title][Metric.series][Metric.sign]`
5. Get all Tasks with metric above/below this one: `tasks = Task.get_tasks(project_name=..., task_name=...)` etc., then `tasks = sorted(tasks, key=lambda x: x.get_last_scalar_metrics()[Metric.title][Metric.series][Metric.sign])`
6. Pick the best one; if the best task is us, publish: `if tasks[-1].id == 'instance_id_111': tasks[-1].publish()` (with an ascending sort, `tasks[-1]` is the best for a max metric; for a min metric it would be `tasks[0]`)
wdyt?
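The steps above could be sketched roughly as follows. This is only a minimal sketch, not a definitive implementation: the helper names `metric_value`, `pick_best` and `run_pipeline`, the queue name `"default"`, and the clone name are all made up for illustration, and it assumes the cloned task lands in the same project that is being compared. The ClearML calls live inside `run_pipeline`, so the two small helpers can be tried on plain dicts without a server.

```python
from typing import Dict, Optional


def metric_value(scalars: dict, title: str, series: str, sign: str) -> Optional[float]:
    """Read one value from a dict shaped like {title: {series: {'min'|'max'|'last': v}}}."""
    return scalars.get(title, {}).get(series, {}).get(sign)


def pick_best(values: Dict[str, Optional[float]], sign: str) -> Optional[str]:
    """Return the task id with the highest ('max') or lowest (anything else) value."""
    scored = {tid: v for tid, v in values.items() if v is not None}
    if not scored:
        return None
    chooser = max if sign == "max" else min
    return chooser(scored, key=scored.get)


def run_pipeline(template_id, project, title, series, sign, queue="default", overrides=None):
    """Clone the template, run it, compare metrics, publish if it is the best.

    Hypothetical helper; needs a configured ClearML setup and an agent on `queue`.
    """
    from clearml import Task  # imported lazily so the helpers above work without it

    new_task = Task.clone(source_task=template_id, name="pipeline candidate")
    for name, value in (overrides or {}).items():
        new_task.set_parameter(name, value)  # change arguments if needed
    Task.enqueue(new_task, queue_name=queue)
    new_task.wait_for_status()  # blocks until the run finishes; raises if it failed

    # Collect the chosen metric for every task in the project (including ours)
    values = {
        t.id: metric_value(t.get_last_scalar_metrics(), title, series, sign)
        for t in Task.get_tasks(project_name=project)
    }
    if pick_best(values, sign) == new_task.id:
        new_task.publish()
        return True
    return False
```

Tasks that never reported the metric are skipped by `pick_best` rather than raising a `KeyError`, which the plain `sorted(...)` one-liner would do.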
AgitatedDove14 I'm making some progress on this. Currently my training run saved all of these files, and `Task.get_task(param['TaskA']).models['output'][-1]` gets me just one of them, training_args.bin. Then `-2` gets me another, rng_state.pth. If I just get `Task.get_task(param['TaskA']).models['output']`, I end up getting a huge list of, like,
[<clearml.model.Model object at 0x7fec2841c880>, <clearml.model.Model object at 0x7fec2841c8e0>, <clearml.model.Model object at 0x7fec2841c820>...
So I think I have a solution here, which is to just loop backwards through the list until I find the right file I want to load.
But I just noticed that for some reason pytorch_model.bin isn't there. I'm not sure why that wasn't saved. huh
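The "loop backwards" idea might look something like this. It is a rough sketch under one assumption worth checking on your own data: that each model object's `url` ends with the saved file name. `find_model_by_filename` is a made-up helper name, and the fake models below are stand-ins so the snippet runs without a ClearML server.

```python
from types import SimpleNamespace


def find_model_by_filename(models, filename):
    """Walk the list from last to first and return the first model whose
    url ends with the given file name, or None if nothing matches."""
    for model in reversed(models):
        url = getattr(model, "url", "") or ""
        if url.rstrip("/").rsplit("/", 1)[-1] == filename:
            return model
    return None


# Stand-in objects (hypothetical URLs) in place of real clearml Model objects
fake_models = [
    SimpleNamespace(name="old", url="s3://bucket/run/rng_state.pth"),
    SimpleNamespace(name="new", url="s3://bucket/run/rng_state.pth"),
    SimpleNamespace(name="args", url="s3://bucket/run/training_args.bin"),
]

match = find_model_by_filename(fake_models, "rng_state.pth")

# With real data this would be called roughly as:
# find_model_by_filename(Task.get_task(param['TaskA']).models['output'], 'pytorch_model.bin')
```

Returning `None` instead of raising makes the missing `pytorch_model.bin` case easy to detect before calling `get_local_copy()`.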
That's cool AgitatedDove14 , will try it out and pester you a bit more. 🙂
Very interesting, thanks! I'll look into it!
Interesting, I wasn't aware of the possibilities you outline there at the end, where you, like, programmatically pull all the results down for all the tasks. Neat!
A more complex version of this which I'm trying to figure out:
I trained a model using TaskA. I now need to pull that model down from the saved artifacts of TaskA and fine-tune it in TaskB. That fine-tuning in TaskB spits out a metric.
Is there a way to do this all elegantly? Currently my process is to manually download the models from the UI, then manually upload them to S3, then manually pull them down from S3, and then start the code to fine-tune TaskB.
Is there a way to do this all elegantly?
Oh yes there is, this is how the TaskB code will look:
` from clearml import Task
import torch

task = Task.init(..., 'task b')
param = {'TaskA': 'TaskAs ID HERE'}
task.connect(param)
# take the last output model registered by TaskA
taska_model = Task.get_task(param['TaskA']).models['output'][-1]
model = torch.load(taska_model.get_local_copy())
# ... train (fine-tune) the model here ...
torch.save(model, 'modelb') `
I might have missed something there, but generally speaking this will let you:
1. Select TaskA as a parameter of the TaskB training process
2. Automagically register TaskA's model as the input model of TaskB
3. Store TaskB's model in the model repository
So basically full lineage with the ability to automate. wdyt?