It is fixed with a single-task workflow (abort, then enqueue), but within a pipeline with retry_on_failure
I have the same offset (it appears after each failure in my scalars). Yes, we have clearml==1.11.0.
Done here: None. Thanks in advance for your help!
Yes, I am saying that no matter whether we call set_initial_iteration(0)
and also pass continue_last_task=0
on the task init, if I requeue the task the last iteration is not reset and I still have a gap in my scalars.
Let me know if you need more information about my env/workflow, or if I should open a dedicated issue.
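To make the offset/gap symptom concrete, the toy model below may help. It is a hypothetical illustration, not ClearML's implementation: the function name reported_iterations is made up, and it assumes each attempt restarts from local step 0 and that, unless the offset is reset, the plotted iteration continues right after the last iteration reported by the previous attempt.

```python
from typing import List


def reported_iterations(attempt_lengths: List[int], reset_offset: bool) -> List[List[int]]:
    """Toy model (hypothetical, NOT ClearML code) of the scalar offset on requeue/retry.

    Each attempt restarts training and logs local steps 0..n-1.
    The iteration drawn on the plot is local_step + offset.
    Assumption: when the offset is not reset, it starts right after the
    last iteration reported by the previous attempts.
    """
    plotted: List[List[int]] = []
    next_free = 0  # first iteration after the last one reported so far
    for n_steps in attempt_lengths:
        offset = 0 if reset_offset else next_free
        attempt = [offset + step for step in range(n_steps)]
        plotted.append(attempt)
        if attempt:
            next_free = max(next_free, attempt[-1] + 1)
    return plotted


# First attempt fails after 3 steps, the retry runs 4 steps.
print(reported_iterations([3, 4], reset_offset=False))  # [[0, 1, 2], [3, 4, 5, 6]]
print(reported_iterations([3, 4], reset_offset=True))   # [[0, 1, 2], [0, 1, 2, 3]]
```

With reset_offset=False the retried run is drawn shifted to the right by the length of the failed attempt, which is the kind of offset described for requeued tasks and retried pipeline steps; with the offset reset, both attempts start at iteration 0.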
Hello @<1523701087100473344:profile|SuccessfulKoala55>, I just tried it and still no success, same error as before: Error executing job with overrides: []
even though the overrides under Configuration/Hyperparameters/Args are correctly filled.
Yes, it is reproducible; do you want a snippet? How do you patch TensorBoard plots to decide iterations,
and where does it use last_iteration?
Thank you very much, it works perfectly with the rc!
Hello, thanks. Do not hesitate to tag me on the PR; my GitHub username is MaximeChurin.
Once I have tested it I will let you know.
Yes, no problem. I will try to explain it clearly, but do not hesitate to add to it or ask for more information.
Hi!
I do not see it in the repo; am I missing something?
Yes, let me know if it is enough, because I had to remove a lot of internal stuff (the logs correspond to the snippet above):
Not sure, but kind of. I also want to clone the pipeline and change the hyperparameters, not for one specific task but globally for my pipeline, and I reach the same conclusion as him (it runs with the default parameters given when the pipeline was first run). I am using from_functions.
It is not clear to me what you mean by running the cloned pipeline with an agent (maybe the answer is in my snippet above)?
Hello,
I had the same problem again, but within a remote pipeline setup. The task launching the pipeline has continue_last_task=0,
but I guess this argument is not passed on to the nodes/steps it launches, because when the retry_on_failure
of add_function_step
is triggered, we start to see the offset in the node's scalars again. Is there a way for the steps to inherit the arguments of the base task, or something like that?
Hello @<1523701205467926528:profile|AgitatedDove14>, do you have any update or ETA on this rc?
I am using:
ClearML SDK Version: clearml==1.12.0
ClearML Server Version (Only for self hosted): 1.9.2-317
Thank you, Eugen, I tested your recommendations and now it works!
Have a nice day :hugging_face: