Answered

Dear ClearML Community,
I am looking for a way to properly resume a training such that the initial scalars get reused and extended. ClearML's feature for reusing the same Task works fine (when using `continue_last_task=True` and `reuse_last_task_id=<my-clearml-task-id>`), and my training orchestrator automatically retrieves my latest checkpoint, so that part is alright!
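For reference, the resume setup described above could be wired up roughly like this (a sketch; the project name, task name, and task ID are placeholders, and a configured ClearML environment is assumed):

```python
# Sketch of resuming a training on the same ClearML Task.
# Project/task names and the task ID below are placeholders.
def build_resume_init_kwargs(previous_task_id):
    """Arguments for Task.init() to resume an existing Task."""
    return {
        "project_name": "my-project",            # placeholder
        "task_name": "my-training",              # placeholder
        "reuse_last_task_id": previous_task_id,  # resume this specific Task
        "continue_last_task": True,              # continue instead of creating a new Task
    }

# Actual usage (requires the `clearml` package and a configured server):
# from clearml import Task
# task = Task.init(**build_resume_init_kwargs("<my-clearml-task-id>"))
```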
However, I systematically notice a jump of some number of "ghost iterations" when resuming my trainings...
As you can see in the picture below, I stopped my training at 4610 iterations using ClearML's "Abort" button. You can't see it from the scalar, but my checkpoint was saved at iteration 3354.
What I observe is that, strangely, the number of iterations it took to reach my checkpoint (i.e., 3354) corresponds to the number of "ghost iterations" added before the scalar plot continues when resuming the training.
Has anyone of you ever encountered such a skip in the number of iterations when resuming a training reusing the same preceding Task? 🤔
Thank you so much in advance for your support! 🙏
[screenshot: scalar plot of the training, aborted at iteration 4610]

  
  
Posted 3 months ago

Answers 10


Yeah I think this kind of makes sense to me, any chance you can open a GH issue on this feature request?

  
  
Posted 2 months ago

Hi @<1663354518726774784:profile|CrookedSeal85>

However, I systematically notice a jump of some number of "ghost iterations" when resuming my trainings...

Try the following:

task = Task.init(..., continue_last_task=0)

from the Task.init docstring (notice this value can be both a boolean and an integer):

        :param bool continue_last_task: Continue the execution of a 
...
          - An integer - Specify the initial iteration offset (overrides the automatic last_iteration_offset). Pass 0 to disable the automatic last_iteration_offset, or specify a different initial offset. You can specify a Task ID to be used with `reuse_last_task_id='task_id_here'`
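Concretely, the fix could look like this (a sketch; the task ID is a placeholder, and a configured ClearML environment is assumed):

```python
# Passing an int (0) for continue_last_task disables the automatic
# last_iteration offset, so resumed scalars continue from the
# iteration you actually report, with no "ghost iterations" jump.
def resume_with_offset_kwargs(previous_task_id, initial_offset=0):
    """Arguments for Task.init() to resume a Task with an explicit iteration offset."""
    return {
        "reuse_last_task_id": previous_task_id,  # placeholder Task ID goes here
        "continue_last_task": initial_offset,    # int: initial iteration offset
    }

# Actual usage (requires the `clearml` package and a configured server):
# from clearml import Task
# task = Task.init(..., **resume_with_offset_kwargs("<my-clearml-task-id>"))
```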
  
  
Posted 3 months ago

Hi @<1523701205467926528:profile|AgitatedDove14> ,

That was exactly it! Thank you for the hint! ✅ My scalars now continue as they should when resuming a training from the latest checkpoint! 🤩

Just one more question: do you have any idea how I could change the x-axis label from "Iterations" to "Epochs" for some specific scalars only? I saw in " ClearML Doc > ClearML Fundamentals > Logger > Types of Logged Results " that this should indeed be possible:

Scalars - Time series data. X-axis is always a sequential number, usually iterations but can be epochs or others.

I have checked the ClearML code, among others the Reporter and Logger classes, but I can't find where this is done.

Thank you very much in advance for your help again! 🙏

  
  
Posted 3 months ago

Yes, this is it indeed 😉 , to be able to freely choose the x-axis title depending on whether we intend to log data by iterations or by epochs 😃 .

  
  
Posted 2 months ago

Hi @<1523701205467926528:profile|AgitatedDove14> , that's a very good point!
I found the issue " Possibility to choose any scalar for horizontal x-axis #1186 ", opened one month ago, which is pretty close to what I am suggesting. I will complement it with my graph screenshots to illustrate the need!
Thank you for your recommendation 🙇

  
  
Posted 2 months ago

Do you think such a feature exists in ClearML?

Currently this is "fixed" to iterations (which is actually just a monotonic integer value) or the timestamp.
But I cannot see any reason why we could not allow users to control the x-axis title and set it in code; I'm assuming this is what you have in mind?

  
  
Posted 2 months ago

Hi @<1523701205467926528:profile|AgitatedDove14> , this is it yes!
A solution for freely choosing the x-axis title in the UI depending on the scalar (e.g., as in the screenshot above, but with "Epochs" instead of "Iterations" for the plot on the left 😉 ).
Do you think such a feature exists in ClearML?

  
  
Posted 3 months ago

Yes, I mean in the UI, just for the title of the x-axis.
For instance, in the graphs below, I am reporting the "mIoU" metric by epoch. It's fine to leave the x-axis title as "Iterations" for "time", for instance, but for "mIoU" I was wondering if it would be possible to change "Iterations" to "Epochs" for clarity 🙄 .
Thank you again for your responsiveness and support! 🙏
[screenshot: "mIoU" and "time" scalar plots, both with the x-axis labeled "Iterations"]

  
  
Posted 3 months ago

Just one more question, do you have any idea about how I could change the x-axis label from "Iterations" to "Epochs"

You mean in the UI (i.e., just the title)? Or are you actually reporting iterations instead of epochs? And if so, is this auto-connected from TensorBoard, or reported manually?
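For context, when reporting manually, the integer passed as `iteration` to `Logger.report_scalar` is the x value of the point, so epoch numbers can be passed directly; only the axis *title* in the UI stays "Iterations". A minimal sketch (metric and series names are placeholders):

```python
# Manual scalar reporting: the x value is whatever integer you pass
# as `iteration`, so epochs can be used directly. Only the x-axis
# *title* in the UI remains "Iterations".
def report_epoch_metric(logger, epoch, miou):
    # `logger` is a clearml Logger, e.g. Task.current_task().get_logger()
    logger.report_scalar(title="mIoU", series="val", value=miou, iteration=epoch)
```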

  
  
Posted 3 months ago

Oh I see, basically a UI feature.
I'm assuming this is not just about changing the x-axis title in the UI, but about somehow storing the x-axis as part of the reported scalars?

  
  
Posted 3 months ago