I Am Using ClearML Pro And Pretty Regularly I Will Restart An Experiment And Nothing Will Get Logged To ClearML. It Shows The Experiment Running (For Days) And It's Running Fine On The PC But No Scalars Or Debug Samples Are Shown. How Do We Troubleshoot T

Answered

I am using ClearML Pro, and pretty regularly I restart an experiment and nothing gets logged to ClearML. It shows the experiment as running (for days), and it is running fine on the PC, but no scalars or debug samples are shown.
How do we troubleshoot this?

  				
Posted 8 months ago by ThankfulClams64


Answers 69

When the script hangs at the end, the experiment is marked as failed in ClearML.

  				
Posted 8 months ago by ThankfulClams64

Correct, so I get something like this:

    ClearML Task: created new task id=6ec57dcb007545aebc4ec51eb5b34c67
    ======> WARNING! Git diff too large to store (2536kb), skipping uncommitted changes <======
    ClearML results page:

but that is all.

  				
Posted 8 months ago by ThankfulClams64

Thanks ThankfulClams64, having code that can reproduce the issue is exactly what we need.
One thing I might have missed, and it is very important: what is your tensorboard package version?
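
A quick way to collect the versions in question is the standard-library `importlib.metadata` module. The following is only a sketch: the helper name `installed_versions` is made up for illustration, and the package list is an assumption based on the dependencies mentioned in this thread.

```python
from importlib import metadata


def installed_versions(packages):
    """Return a {package_name: version_or_None} mapping for the given names."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the active environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed list of packages relevant to this thread.
    for name, version in installed_versions(
        ["clearml", "tensorflow", "tensorboard"]
    ).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the same virtual environment as the training script should show which tensorboard version, if any, is actually picked up.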

  				
Posted 8 months ago by AgitatedDove14

Yes, it is logging to the console. When the issue occurs, the script also hangs after it completes all the epochs.

  				
Posted 8 months ago by ThankfulClams64

Then we also connect two dictionaries for configs:

    task.connect(model_config)
    task.connect(DataAugConfig)
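
For context, a stripped-down script along these lines might look like the sketch below. It is not the actual training code from this thread: the dictionary contents, the project and task names, and the `run_repro` helper are all hypothetical, and the explicit `report_scalar` loop is only an assumed way to check whether scalars reach the server independently of TensorBoard auto-logging.

```python
# Hypothetical placeholder configs; the real contents are not shown in the thread.
model_config = {"learning_rate": 1e-3, "epochs": 5}
DataAugConfig = {"horizontal_flip": True, "rotation_degrees": 10}


def run_repro():
    """Create a task, connect both config dicts, and report a few scalars."""
    # Imported lazily so the module can be loaded without clearml installed.
    from clearml import Task

    # Project and task names are placeholders.
    task = Task.init(project_name="debug", task_name="missing-scalars-repro")

    # Connect the two configuration dictionaries, as described above.
    task.connect(model_config, name="model_config")
    task.connect(DataAugConfig, name="DataAugConfig")

    # Report scalars explicitly, bypassing any framework auto-logging.
    logger = task.get_logger()
    for iteration in range(10):
        logger.report_scalar(
            title="debug", series="counter", value=iteration, iteration=iteration
        )

    # Wait for pending reports to be sent before closing the task.
    task.flush(wait_for_uploads=True)
    task.close()
```

Calling `run_repro()` on the affected machine and checking whether the "debug" scalar appears in the web UI may help narrow down whether the problem sits in the reporting path or in the TensorBoard integration.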

  				
Posted 8 months ago by ThankfulClams64

Console logs

  				
Posted 8 months ago by CostlyOstrich36

Sometimes I get no scalars, but the console logging always seems to be working.

  				
Posted 8 months ago by ThankfulClams64

I just created a new virtual environment and the problem persists. There are only two dependencies: clearml and tensorflow. CostlyOstrich36, what logs are you referring to?

  				
Posted 8 months ago by ThankfulClams64

ThankfulClams64, are logs showing up without issue on the 'problematic' machine?

  				
Posted 8 months ago by CostlyOstrich36

55K Views