Answered

I am using ClearML Pro, and fairly regularly when I restart an experiment nothing gets logged to ClearML. The experiment shows as running (for days), and it is running fine on the PC, but no scalars or debug samples are shown.
How do we troubleshoot this?

  
  
Posted 8 months ago

Answers 69


Thanks ThankfulClams64, having code that can reproduce it is exactly what we need.
One thing I might have missed, and it is very important: what is your tensorboard package version?
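One way to answer the version question is to read the installed versions from package metadata. This is a minimal sketch using only the standard library; the package list is an assumption based on the dependencies named in this thread, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(packages):
    """Return a dict mapping each package name to its version, or None if not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package list; adjust to whatever the training script actually imports.
    for name, version in installed_versions(["clearml", "tensorflow", "tensorboard"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Run it inside the same virtual environment as the training script so the reported versions match what the experiment actually uses.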

  
  
Posted 8 months ago

Yes, it is logging to the console. When the issue occurs, the script also hangs after it completes all the epochs.

  
  
Posted 8 months ago

Then we also connect two dictionaries for configs:

    task.connect(model_config)
    task.connect(DataAugConfig)
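For readers following along, here is a minimal sketch of how two config dictionaries might be connected to a task. The project name, task name, and dictionary contents are placeholders, not the poster's actual values, and `connect_configs` and `main` are hypothetical helper names. Passing `name` is optional (the snippet above omits it); it is used here to give each dictionary its own section.

```python
def connect_configs(task, model_config, data_aug_config):
    """Connect two config dicts to a ClearML task, each under its own section name."""
    # task.connect returns the connected object; keep using the returned dicts
    # so that any values overridden remotely are the ones the script sees.
    model_config = task.connect(model_config, name="model_config")
    data_aug_config = task.connect(data_aug_config, name="DataAugConfig")
    return model_config, data_aug_config


def main():
    # Imported lazily so the helper above can be used without clearml installed.
    from clearml import Task

    # Placeholder project/task names and config values (assumptions).
    task = Task.init(project_name="examples", task_name="connect-configs")
    model_cfg, aug_cfg = connect_configs(
        task,
        {"learning_rate": 1e-3, "epochs": 10},
        {"horizontal_flip": True},
    )
    print(model_cfg, aug_cfg)
```

Call `main()` on a machine where `clearml` is installed and configured for your server.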
  
  
Posted 8 months ago

Console logs

  
  
Posted 8 months ago

Sometimes I get no scalars, but the console logging always seems to be working.

  
  
Posted 8 months ago

I just created a new virtual environment and the problem persists. There are only two dependencies: clearml and tensorflow. CostlyOstrich36, what logs are you referring to?

  
  
Posted 8 months ago

ThankfulClams64, are logs showing up without issue on the 'problematic' machine?

  
  
Posted 8 months ago

I do have uncommitted code changes. I can try to check at some point whether the problem goes away without them. It seems like it could be reproduced just by making a git repo with that script and adding a very large file. If I can reproduce it, is it best to open an issue on GitHub?

  
  
Posted 8 months ago

Is there some way to kill all of a machine's connections to the ClearML server? This does seem to be related to restarting a task, or to running a new task quickly after a task fails or is aborted.

  
  
Posted 8 months ago
55K Views