
Hi guys, does anybody have the same issue as me? Is there any workaround? https://github.com/allegroai/clearml/issues/762

  
  
Posted 2 years ago

Answers 12


The question is: are there any workarounds to set the last iteration to the correct value? Preferably in a simple way (i.e. not setting it manually).

  
  
Posted 2 years ago

I tried it, but unfortunately this way it only sets the last iteration to 0 instead of using the last iteration from TensorBoard, and it simply overwrites the logs. The expected behaviour is that it reads the last iteration correctly; at least that is what the docs state.

  
  
Posted 2 years ago

Thanks Martin. I tried to rerun everything from scratch using continue_last_task=0 and it looks like it helped a lot, but not completely. You can see in the attached screenshot that the gaps in the iteration axis are still a little bigger than expected. I've rerun it twice.

  
  
Posted 2 years ago

No, I don't need the last iteration set to zero. All I need is for ClearML to correctly initialize it from TensorBoard (or from wherever it initializes it). When I train a model, stop training and then resume it, ClearML doubles (I guess) the last iteration instead of using it. This can be seen in the screenshot attached to the GitHub issue.

  
  
Posted 2 years ago

VivaciousWalrus21 I took a look at your example from the GitHub issue:
https://github.com/allegroai/clearml/issues/762#issuecomment-1237353476
It seems to do exactly what you expect, and it stores its own last iteration as part of the checkpoint. When running the example with continue_last_task=int(0) you get exactly what you expect.
(Do notice that TB visualizes these graphs in a very odd way, and it took me a few clicks to verify it...)

  
  
Posted 2 years ago

Oh sorry, from the docstring, this will work:
` :param bool continue_last_task: Continue the execution of a previously executed Task (experiment)

.. note::
    When continuing the execution of a previously executed Task,
    all previous artifacts / models / logs are intact.
    New logs will continue iteration/step based on the previous-execution maximum iteration value.
    For example:
    The last train/loss scalar reported was iteration 100, the next report will be iteration 101.

The values are:

- ``True`` - Continue the last Task ID.
    Specified explicitly by reuse_last_task_id, or implicitly with the same logic as reuse_last_task_id.
- ``False`` - Overwrite the execution of the previous Task (default).
- A string - You can also specify a Task ID (string) to be continued.
    This is equivalent to `continue_last_task=True` and `reuse_last_task_id=a_task_id_string`.
- An integer - Specify an initial iteration offset (overrides the automatic last_iteration_offset).
    Pass 0 to disable the automatic last_iteration_offset, or specify a different initial offset.
    You can specify a Task ID to be used with `reuse_last_task_id='task_id_here'` `

Notice we are actually setting the last iteration manually at initialization time; that should do the trick:
task = Task.init(project_name='OCR/CRNN', task_type='training', task_name='CRNN from scratch', reuse_last_task_id=True, continue_last_task=int(0))
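For context, this is roughly how it fits together end to end. A minimal sketch: only the Task.init call mirrors the one above, while the SummaryWriter log directory, the starting step and the loop are illustrative assumptions. With reuse_last_task_id=True the run continues the previous task, and continue_last_task=int(0) disables ClearML's automatic iteration offset, so the global_step values coming from TensorBoard are used as-is:

from clearml import Task
from torch.utils.tensorboard import SummaryWriter

# Continue the previous task, but with a zero iteration offset so the
# TensorBoard steps are not shifted a second time by ClearML.
task = Task.init(
    project_name='OCR/CRNN',
    task_type='training',
    task_name='CRNN from scratch',
    reuse_last_task_id=True,
    continue_last_task=int(0),
)

writer = SummaryWriter(log_dir='runs/crnn')   # illustrative log dir
start_step = 100                              # e.g. restored from a checkpoint

for step in range(start_step, start_step + 50):
    loss = 1.0 / (step + 1)                   # placeholder metric
    writer.add_scalar('train/loss', loss, global_step=step)

writer.close()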

  
  
Posted 2 years ago

Hi VivaciousWalrus21

After restarting training huge gaps appear in iteration axis (see the screenshot).

Task.init actually tries to figure out what the last reported iteration was and continue from that iteration. I'm assuming your code does that as well, which creates the "double shift" that you see as the jump. I think the next version will try to be "smarter" about it and detect this double gap.
In the meantime, you can do:
task = Task.init(...)
task.set_initial_iteration(0)

wdyt?
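Spelled out a bit more, a sketch of that workaround: the project/task names are just the ones from the Task.init call quoted elsewhere in this thread, and continue_last_task=True is an assumption for the "continue the previous run" scenario:

from clearml import Task

# Continue the previous task, then explicitly reset the initial iteration
# so ClearML does not add its own offset on top of the step counter the
# training code already restores from its checkpoint.
task = Task.init(
    project_name='OCR/CRNN',
    task_type='training',
    task_name='CRNN from scratch',
    reuse_last_task_id=True,
    continue_last_task=True,
)
task.set_initial_iteration(0)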

  
  
Posted 2 years ago

Hi Martin, thanks for the response! Nope, setting the initial iteration didn't solve the problem.

  
  
Posted 2 years ago

My pleasure 🙂

  
  
Posted 2 years ago

Thanks a lot for your help!

  
  
Posted 2 years ago

Hi VivaciousWalrus21, I tested the sample code, and the gap was evident in TensorBoard as well. It is not ClearML generating this jump; it is internal to the code base (like the automatic de/serialization and continuation).
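If it helps to double-check this, one way (not ClearML-specific, just a suggested way you could verify it) is to read the raw global_step values straight out of the TensorBoard event files; if the jump is already present there, it was written by the training code itself rather than added by ClearML. The log directory and tag name below are placeholders:

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Point this at the directory containing the events.out.tfevents.* files.
ea = EventAccumulator('runs/crnn')
ea.Reload()

# Print the recorded step for every point of one scalar tag.
for event in ea.Scalars('train/loss'):
    print(event.step, event.value)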

  
  
Posted 2 years ago

The expected behaviour is that it reads the last iteration correctly; at least that is what the docs state.

This is exactly what should happen. Are you saying that for some reason it fails?

  
  
Posted 2 years ago