
Hi,
Together with ElegantKangaroo44, we found two unexpected behaviors in task.models['output']:
- The input model of the task is included in the list.
- The best model is not included in the list.

We log models using Ignite TrainsSaver (pytorch_ignite == 0.4rc0.post1). Any idea?
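
For reference, the two behaviors described above can be checked mechanically once the output model names are available as plain strings. The following is only a minimal sketch: the helper `check_output_models` and all model names in it are hypothetical, and it assumes the names have already been extracted from task.models['output'] in whatever way your Trains version allows.

```python
from typing import Dict, List, Optional


def check_output_models(
    output_names: List[str],
    input_name: Optional[str],
    best_name: Optional[str],
) -> Dict[str, bool]:
    """Flag the two anomalies for a list of output model names (hypothetical helper)."""
    return {
        # True when the task's input model also appears among its outputs
        "input_in_outputs": input_name is not None and input_name in output_names,
        # True when the expected best model is absent from the outputs
        "best_missing": best_name is not None and best_name not in output_names,
    }


# Hypothetical names, for illustration only
report = check_output_models(
    output_names=["pretrained_backbone", "checkpoint_1", "checkpoint_2"],
    input_name="pretrained_backbone",
    best_name="best_model",
)
print(report)
```

Both flags coming back True would match what we observe: the input model listed as an output, and the best model missing.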

  
  
Posted 4 years ago

Answers 30


"Updates a few seconds ago"

That just means that the process is not dead.

Yes that seemed to be stuck 😞
Any chance you can verify with the RC version?
I'll try to dig into the commits, maybe I can come up with an explanation ...

  
  
Posted 4 years ago

BTW:
Just making sure, 74 was not supposed to be the last checkpoint (in other words it is not stuck on leaving the training process, but actually in the middle)

  
  
Posted 4 years ago

Yes, it is supposed to run for 200 epochs

  
  
Posted 4 years ago

Which commit corresponds to the RC version? So far we tested with the latest commit on master (9a7850b23d2b0e1f2098ab051de58ce806143fff)

  
  
Posted 4 years ago

(It would be nice to have all the Pypi releases tagged in github btw)

  
  
Posted 4 years ago

(It would be nice to have all the Pypi releases tagged in github btw)

I wanted to say "we listen ..." and point to the tag, but for some reason it was not pushed LOL.

  
  
Posted 4 years ago

Alright, I will try with that one

  
  
Posted 4 years ago

JitteryCoyote63 while it's running, could you give me a few details on the setup? Maybe I can reproduce it.
Is it using pytorch distributed?
Are all models uploaded to S3?
etc.

  
  
Posted 4 years ago

Not using pytorch distributed, all models are uploaded to s3 yes

  
  
Posted 4 years ago

using trains RC, trains-agent 0.15.0

  
  
Posted 4 years ago

Alright, the experiment finished properly (all models uploaded). I will restart it to check again, but it seems like the bug was introduced after that

  
  
Posted 4 years ago

The experiment finished completely this time again

  
  
Posted 4 years ago

The experiment finished completely this time again

With the RC version or the latest ?

  
  
Posted 4 years ago

with the RC version

  
  
Posted 4 years ago

I was unable to reproduce it, but I added a few safety checks. I'll make sure they are available on master in a few minutes. Could you maybe rerun after that?

  
  
Posted 4 years ago

Sure 🙂

  
  
Posted 4 years ago

I just tested the master with https://github.com/jkhenning/ignite/blob/fix_trains_checkpoint_n_saved/examples/contrib/mnist/mnist_with_trains_logger.py on the latest ignite master and Trains, it passed, but so did the previous commit...

  
  
Posted 4 years ago

JitteryCoyote63 fix pushed to master, let me know if it passes...

  
  
Posted 4 years ago

Thanks! Will test now

  
  
Posted 4 years ago

JitteryCoyote63 How is it so far ?

  
  
Posted 4 years ago

Seems to work, I started a last one to confirm!

  
  
Posted 4 years ago

To be honest, I'm not sure I have a good explanation for why ... (unless in some scenarios an exception was thrown and caught silently, and that caused it)
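
To illustrate the kind of failure I have in mind (this is only a generic sketch, not the actual Trains or Ignite code; `run_epochs` and `flaky_save` are hypothetical names), a save step that swallows exceptions can drop checkpoints without leaving any trace:

```python
import logging
from typing import Callable, List

logger = logging.getLogger("checkpoint_sketch")


def run_epochs(num_epochs: int, save: Callable[[int], None], silent: bool) -> List[int]:
    """Run a dummy training loop and return the epochs whose checkpoint was saved."""
    saved = []
    for epoch in range(1, num_epochs + 1):
        try:
            save(epoch)
            saved.append(epoch)
        except Exception:
            if not silent:
                # Surface the failure so it cannot go unnoticed
                logger.exception("checkpoint save failed at epoch %d", epoch)
                raise
            # silent=True: the error is swallowed and training simply continues
    return saved


def flaky_save(epoch: int) -> None:
    """Hypothetical save function that fails once, simulating an upload error."""
    if epoch == 3:
        raise IOError("simulated upload failure")


saved = run_epochs(5, flaky_save, silent=True)
print(saved)
```

With silent=True the loop completes but the checkpoint for epoch 3 is never recorded; with silent=False the same failure is logged and re-raised.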

  
  
Posted 4 years ago

I started a last one to confirm!

You mean a second run, just to make sure?

  
  
Posted 4 years ago

Exactly

  
  
Posted 4 years ago

JitteryCoyote63 passed ?

  
  
Posted 4 years ago

Just checked, it did pass, training finished and all 200 models saved 🙂

  
  
Posted 4 years ago

I'm happy to hear that! 😅

  
  
Posted 4 years ago

And thanks again, I really appreciate you testing it!

  
  
Posted 4 years ago

Thanks for the quick responses and support too! 🙂

  
  
Posted 4 years ago