It is fixed with a single task workflow (abort then enqueue), but within a pipeline with retry_on_failure I have the same offset (that appears after each fail on my scalars). Yes, we have clearml==1.11.0
I have the same offset (that appears after each fail on my scalars).
Hmm, I actually would think this is the "correct" behavior, but I see your point:
Any chance you can open a GH issue ?
I had again the same problem but within a remote pipeline setup.
Are you saying the issue is not fixed? Can you verify the pipeline & pipeline components are using at least version 1.10.4rc0?
Yes, I am saying that no matter if we call set_initial_iteration(0) and also set continue_last_iteration=0 on the task init, if I requeue the task the last iteration is not reset and I still have a gap in my scalars
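To make the reported gap concrete, here is a toy model in plain Python of how an iteration offset carried over from a previous run can shift scalars after a requeue. It is only a sketch and not ClearML's actual code: ToyTask, its fields, and the continue_from_last flag are hypothetical names, and it assumes the training script resumes from its own checkpoint and keeps counting steps from where it stopped.

```python
# Toy model only: these names are hypothetical and do not mirror ClearML internals.
class ToyTask:
    def __init__(self):
        self.last_iteration = 0   # highest iteration reported so far
        self.offset = 0           # added to every step the script reports
        self.scalars = []         # (reported_iteration, run_id) pairs

    def start_run(self, continue_from_last):
        # On a requeue, either continue after the last iteration or restart at 0.
        self.offset = self.last_iteration + 1 if continue_from_last else 0

    def report(self, step, run_id):
        iteration = self.offset + step
        self.scalars.append((iteration, run_id))
        self.last_iteration = max(self.last_iteration, iteration)


def run_then_requeue(continue_from_last, steps_first=3, steps_second=2):
    """Return the iterations reported by the second (requeued) run."""
    task = ToyTask()

    # First run: steps 0 .. steps_first - 1, then the task is aborted.
    task.start_run(continue_from_last=False)
    for step in range(steps_first):
        task.report(step, run_id=1)

    # Requeue: the script resumes from its checkpoint, so its own step
    # counter already continues at steps_first.
    task.start_run(continue_from_last)
    for step in range(steps_first, steps_first + steps_second):
        task.report(step, run_id=2)

    return [it for it, run_id in task.scalars if run_id == 2]
```

In this model, run_then_requeue(True) reports the second run at iterations 6 and 7 (a gap after iteration 2), while run_then_requeue(False) reports it at 3 and 4, which is the continuous curve expected when the offset is forced to 0.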
Let me know if you need more information on my env/workflow or need a dedicated issue
Yes it is reproducible do you want a snippet?
Already fixed 🙂 please ping tomorrow, I think an RC should be out soon with the fix
Hello, thanks. Do not hesitate to tag me on the PR, my GitHub username is MaximeChurin. Once I have tested it I will let you know
Hi !
I do not see it in the repo, am I missing something?
done here: None , thanks in advance for your help
Yes it is reproducible, do you want a snippet? How do you patch tensorboard plots to decide iterations, and where does it use last_iteration?
Hello @<1523701205467926528:profile|AgitatedDove14> , do you have any update or ETA on this rc?
Thank you very much, it works perfectly with the rc!
Hi @<1558986821491232768:profile|FunnyAlligator17>
What do you mean by "We are able to set_initial_iteration to 0 but not get_last_iteration"?
Are you saying that if your code looks like:
Task.set_initial_iteration(0)
task = Task.init(...)
and you abort and re-enqueue, you still have a gap in the scalars ?
Hello,
I had the same problem again, but within a remote pipeline setup. The task launching the pipeline has continue_last_task=0, but I guess this argument is not shared with the node/step that it launches, because when the retry_on_failure of add_function_step is triggered we start to see the offset again inside the scalars of the node. Is there a way to inherit args from the base task or something like that?
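The pipeline case can be pictured with a small retry loop: each automatic retry of a step behaves like a requeue, so if the step never receives the "start from 0" setting, the offset grows again after every failure. This is a toy simulation in plain Python under that assumption; simulate_step_with_retries and its parameters are hypothetical names, not part of the ClearML pipeline API.

```python
# Toy simulation only; hypothetical names, not the ClearML pipeline API.
def simulate_step_with_retries(fail_at_steps, total_steps, reset_offset):
    """Return the reported iterations of a step that is retried after each failure.

    fail_at_steps: steps at which successive attempts crash.
    total_steps:   number of steps the script needs to finish.
    reset_offset:  if True, every attempt reports with offset 0.
    """
    reported = []
    last_iteration = -1      # nothing reported yet
    resume_from = 0          # the script resumes from its own checkpoint
    failures = list(fail_at_steps)

    while True:
        # Without the reset, each retry continues after the last reported iteration.
        offset = 0 if reset_offset or last_iteration < 0 else last_iteration + 1
        crash_at = failures.pop(0) if failures else None

        for step in range(resume_from, total_steps):
            if step == crash_at:
                resume_from = step   # next attempt restarts at the crashed step
                break
            iteration = offset + step
            reported.append(iteration)
            last_iteration = max(last_iteration, iteration)
        else:
            return reported          # attempt finished without crashing
```

With failures at steps 2 and 4 out of 6 steps, the version without the reset reports iterations 0, 1, 4, 5, 10, 11, so the gap widens after each failure, while the version with the reset reports 0 through 5.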
last iteration is not reset and I still have a gap in my scalars
Hmm is this reproducible ? can you check with the latest clearml version (1.10.3) ?
btw: I'm assuming continue_last_task=0
I think I found the issue, the fact the agent is launching it causes it to ignore the "overridden" set_initial_iteration
yes no problem, i will try to explain it correctly but do not hesitate to complete or ask for more information
Hi @<1558986821491232768:profile|FunnyAlligator17> , apologies, I think v1.10.4rc0 which is already out contains this fix...