
Hello,
We have a self-hosted ClearML server connected to different queues and use it to launch remote experiments (clearml==1.9.3, clearml-agent==1.5.2rc0). It is working really well for us except for one workflow :)
We would like to abort an experiment and enqueue it in another queue with a lower priority using the interface (continue same task). We use TensorFlow and TensorBoard with the Keras compile/fit wrapper. The TensorBoard plots look fine after enqueue (they restart at the last completed epoch/iteration) thanks to initial_epoch, but the ClearML patch over these functions creates an offset like the one mentioned in this issue. We are able to set_initial_iteration to 0 but not get_last_iteration. As I understand it, we could set continue_last_task=0 in the task init, but I do not see how I could set that when the enqueue is triggered from the interface. Or is there another solution?
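
For context, a minimal sketch of the kind of setup I mean (model, data, and project/task names are illustrative, not our exact code):

from clearml import Task
import numpy as np
from tensorflow import keras

# continue_last_task=0 continues the previous run of this task but resets
# the reported-iteration offset to 0, so new scalars are not shifted past
# the last reported iteration.
task = Task.init(
    project_name="demo",          # illustrative
    task_name="resume-example",   # illustrative
    continue_last_task=0,
)

model = keras.Sequential([keras.layers.Input(shape=(4,)), keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

x = np.random.rand(64, 4).astype("float32")
y = np.random.rand(64, 1).astype("float32")
# After an abort/re-enqueue we pass the last completed epoch as initial_epoch
# so the TensorBoard curves restart where they stopped.
model.fit(x, y, initial_epoch=0, epochs=5)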

  
  
Posted one year ago

18 Answers


Yes, I am saying that no matter if we set_initial_iteration(0) and also continue_last_task=0 on the task init, if I re-enqueue the task the last iteration is not reset and I still have a gap in my scalars.
Let me know if you need more information about my env/workflow, or if you need a dedicated issue.

  
  
Posted one year ago

I ran into the same problem again, but within a remote pipeline setup.

Are you saying the issue is not fixed? Can you verify the pipeline & pipeline components are using at least version 1.10.4rc0?

  
  
Posted one year ago

Done here: None, thanks in advance for your help

  
  
Posted one year ago

Woot woot, will do!

  
  
Posted one year ago

Yes, it is reproducible, do you want a snippet? How do you patch the TensorBoard plots to decide iterations, and where does it use last_iteration?

  
  
Posted one year ago

Hello @<1523701205467926528:profile|AgitatedDove14> , do you have any update or ETA on this rc?

  
  
Posted one year ago

Hi!
I do not see it in the repo, am I missing something?

  
  
Posted one year ago

Yes, it is reproducible, do you want a snippet?

Already fixed 🙂 Please ping tomorrow, I think an RC should be out soon with the fix.

  
  
Posted one year ago

It's available on PyPI: None

  
  
Posted one year ago

It is fixed with a single-task workflow (abort then enqueue), but within a pipeline with retry_on_failure I have the same offset (that appears after each failure in my scalars). Yes, we have clearml==1.11.0.

  
  
Posted one year ago

Yes, no problem, I will try to explain it correctly, but do not hesitate to add to it or ask for more information.

  
  
Posted one year ago

Hi @<1558986821491232768:profile|FunnyAlligator17>
What do you mean by:

We are able to set_initial_iteration to 0 but not get_last_iteration.

Are you saying that if your code looks like:

Task.set_initial_iteration(0)  # reset the reported-iteration offset before init
task = Task.init(...)

and you abort and re-enqueue, you still have a gap in the scalars?

  
  
Posted one year ago

last iteration is not reset and I still have a gap in my scalars

Hmm, is this reproducible? Can you check with the latest clearml version (1.10.3)?
btw: I'm assuming continue_last_task=0

I think I found the issue: the fact that the agent is launching it causes it to ignore the "overridden" set_initial_iteration.
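
In the meantime, a possible workaround is to force the offset on the task object itself after init (a sketch with illustrative names; I'm assuming the instance-level call takes precedence over whatever offset the agent restores):

from clearml import Task

task = Task.init(project_name="demo", task_name="resume-example")  # illustrative
# Explicitly reset the reporting offset after init, so that when the agent
# relaunches the task, new scalars start at iteration 0 instead of after
# the last reported iteration.
task.set_initial_iteration(0)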

  
  
Posted one year ago

Hello,
I ran into the same problem again, but within a remote pipeline setup. The task launching the pipeline has continue_last_task=0, but I guess this argument is not passed down to the node/step it launches, because when the retry_on_failure of add_function_step is triggered we start to see the offset in the node's scalars again. Is there a way to inherit args from the base task or something like that?
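
For reference, a minimal sketch of the pipeline shape I mean (names, queue, and the step body are placeholders, not our exact code):

from clearml import PipelineController

def train_step():
    # training code that reports scalars to the step's task
    pass

pipe = PipelineController(name="demo-pipeline", project="demo")  # placeholders
pipe.add_function_step(
    name="train",
    function=train_step,
    # retry the step on failure; after each retry the offset described
    # above shows up in the step's scalars
    retry_on_failure=2,
)
pipe.start(queue="default")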

  
  
Posted one year ago

Hi @<1558986821491232768:profile|FunnyAlligator17> , apologies, I think v1.10.4rc0, which is already out, contains this fix...

  
  
Posted one year ago

I have the same offset (that appears after each failure in my scalars).

Hmm, I actually would think this is the "correct" behavior, but I see your point.
Any chance you can open a GH issue?

  
  
Posted one year ago

Thank you very much, it works perfectly with the rc!

  
  
Posted one year ago

Hello, thanks. Do not hesitate to tag me on the PR, my GitHub username is MaximeChurin. Once I have tested it, I will let you know.

  
  
Posted one year ago