
Hello,
We have a self-hosted ClearML server connected to different queues and use it to launch remote experiments (clearml==1.9.3, clearml-agent==1.5.2rc0). It is working really well for us except for one workflow :)
We would like to abort an experiment and enqueue it in another queue with a lower priority using the interface (continuing the same task). We use TensorFlow and TensorBoard with the Keras compile/fit wrapper. The TensorBoard plots look fine after the enqueue (they restart at the last complete epoch/iteration) thanks to initial_epoch, but the ClearML patch over these functions creates an offset, as mentioned in this issue. We are able to set_initial_iteration to 0 but not get_last_iteration. As I understand it, we could do this in Task.init with continue_last_task=0, but I do not see how to set that when the enqueue is triggered from the interface. Or is there another solution?
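To make the offset described above concrete, here is a minimal pure-Python sketch. It does not use ClearML or Keras; it only assumes that a continued task shifts every reported step by `last_iteration + 1`, while the resumed training loop already starts counting at its `initial_epoch`. The helper name `reported_steps` and the epoch counts are made up for the illustration.

```python
def reported_steps(local_steps, offset):
    """Return the steps as they would be logged after adding a fixed offset."""
    return [offset + step for step in local_steps]


# First run: epochs 0..4 are logged with no offset.
first_run = reported_steps(range(0, 5), offset=0)
last_iteration = first_run[-1]

# Resumed run: with initial_epoch=5 the training loop already emits steps 5..9.
resumed_local_steps = range(5, 10)

# Assumed automatic offset on continuation: last_iteration + 1.
# The steps are shifted a second time, which leaves a gap in the scalars.
with_auto_offset = reported_steps(resumed_local_steps, offset=last_iteration + 1)

# Offset forced to 0: the resumed steps line up with the first run.
with_zero_offset = reported_steps(resumed_local_steps, offset=0)

print("auto offset:", first_run + with_auto_offset)
print("zero offset:", first_run + with_zero_offset)
```

With the automatic offset the resumed scalars are shifted past the steps that the training loop itself reports, which is the gap described in the question; with an offset of 0 the two runs join up.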

  
  
Posted one year ago

Answers 18


Hello @<1523701205467926528:profile|AgitatedDove14> , do you have any update or ETA on this rc?

  
  
Posted 12 months ago

It's available on PyPI.

  
  
Posted 11 months ago

Hi @<1558986821491232768:profile|FunnyAlligator17> , apologies, I think v1.10.4rc0 which is already out contains this fix...

  
  
Posted 11 months ago

Hi !
I do not see it in the repo, am I missing something?

  
  
Posted 11 months ago

Thank you very much, it works perfectly with the rc!

  
  
Posted 11 months ago

Hi @<1558986821491232768:profile|FunnyAlligator17>
What do you mean by:

"We are able to set_initial_iteration to 0 but not get_last_iteration."

Are you saying that if your code looks like:

Task.set_initial_iteration(0)
task = Task.init(...)

and you abort and re-enqueue, you still have a gap in the scalars ?

  
  
Posted one year ago

Yes, I am saying that no matter whether we call set_initial_iteration(0) and also pass continue_last_iteration=0 to the task init, if I requeue the task the last iteration is not reset and I still have a gap in my scalars.
Let me know if you need more information about my env/workflow or need a dedicated issue.
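For reference, the combination discussed above could look roughly like the sketch below. It assumes `continue_last_task=0` on `Task.init` and a `set_initial_iteration(0)` call on the returned task; the function name and the project/task names are placeholders, and the import sits inside the function only so the snippet can be loaded without ClearML installed. This is a sketch of the setup being described, not a confirmed fix.

```python
def init_task_without_offset(project_name, task_name):
    """Create or continue a task while asking for no automatic iteration offset."""
    # Lazy import: ClearML is only needed when the function is actually called.
    from clearml import Task

    # continue_last_task=0 is meant to disable the automatic last-iteration offset.
    task = Task.init(
        project_name=project_name,
        task_name=task_name,
        continue_last_task=0,
    )
    # Explicitly reset the initial iteration offset as well.
    task.set_initial_iteration(0)
    return task


# Example call (placeholder names):
# task = init_task_without_offset("examples", "keras-resume")
```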

  
  
Posted one year ago

last iteration is not reset and I still have a gap in my scalars

Hmm, is this reproducible? Can you check with the latest clearml version (1.10.3)?
btw: I'm assuming continue_last_task=0

I think I found the issue: the fact that the agent is launching it causes it to ignore the "overridden" set_initial_iteration

  
  
Posted one year ago

Yes it is reproducible do you want a snippet?

Already fixed 🙂 please ping tomorrow, I think an RC should be out soon with the fix

  
  
Posted one year ago

Yes, it is reproducible. Do you want a snippet? How do you patch the TensorBoard plots to decide iterations, and where does it use last_iteration?

  
  
Posted one year ago

Hello, thanks. Do not hesitate to tag me on the PR; my GitHub username is MaximeChurin. Once I have tested it I will let you know.

  
  
Posted one year ago

Woot woot, will do!

  
  
Posted one year ago

I had again the same problem but within a remote pipeline setup.

Are you saying the issue is not fixed? Can you verify that the pipeline and pipeline components are using at least version 1.10.4rc0?

  
  
Posted 10 months ago

It is fixed with a single-task workflow (abort then enqueue), but within a pipeline with retry_on_failure I have the same offset (it appears in my scalars after each failure). Yes, we have clearml==1.11.0

  
  
Posted 10 months ago

I have the same offset (it appears in my scalars after each failure).

Hmm, I actually would think this is the "correct" behavior, but I see your point:
Any chance you can open a GH issue ?

  
  
Posted 10 months ago

Yes, no problem. I will try to explain it correctly, but do not hesitate to complete it or ask for more information.

  
  
Posted 10 months ago

Done (GitHub issue opened), thanks in advance for your help

  
  
Posted 10 months ago

Hello,
I had the same problem again, but within a remote pipeline setup. The task launching the pipeline has continue_last_task=0, but I guess this argument is not shared with the node/step that it launches, because when the retry_on_failure of add_function_step is triggered we start to see the offset again in the scalars of the node. Is there a way to inherit the arguments of the base task, or something like that?
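One thing that might be worth trying, untested here and not confirmed anywhere in this thread, is to reset the offset from inside the step function itself, since the step runs as its own task. The sketch below assumes `Task.current_task()` returns the step's task and that `set_initial_iteration(0)` is honored there; `training_step`, `build_pipeline`, and the pipeline/project names are hypothetical.

```python
def training_step(epochs=3):
    """Hypothetical pipeline step that resets the iteration offset before training."""
    # Imports live inside the function because a function step runs as a standalone task.
    from clearml import Task

    task = Task.current_task()
    if task is not None:
        # Assumption: resetting the offset here applies to the retried step as well.
        task.set_initial_iteration(0)
    # ... build the model and call model.fit(...) here ...
    return epochs


def build_pipeline():
    """Register the step with retries enabled (placeholder names, not started here)."""
    from clearml import PipelineController

    pipe = PipelineController(name="example-pipeline", project="examples", version="0.0.1")
    pipe.add_function_step(
        name="train",
        function=training_step,
        function_kwargs={"epochs": 3},
        function_return=["epochs"],
        retry_on_failure=2,
    )
    return pipe
```

Whether the retried node then reports scalars without the offset would still need to be checked on a real pipeline run.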

  
  
Posted 10 months ago