Hi All, I'M Using Clearml 1.0.3 With Clearml-Server <1 (How Do I Get The Current Running Version?) In Pytorch-Lightning I Use Ddp And I See Multiple Tasks (As The Number Of Gpus) Being Created And Remaining In Draft Mode. Is It A Problem Running Clearml

Answered

Hi all,
I'm using clearml 1.0.3 with clearml-server <1 (how do I get the current running version?)
In Pytorch-Lightning I use DDP and I see multiple tasks (as the number of gpus) being created and remaining in draft mode.

Is it a problem running clearml > 1 with server < 1?
If not, what could be the problem?

  				
Posted 
	3 years ago

					More  		
  Report
		
					ExcitedFish86
				
					0
					 × 1

Votes Newest

Answers 18

Hi ExcitedFish86

In Pytorch-Lightning I use DDP

I think a fix for pytorch multi-node / process distribution was commited to 1.0.4rc1, could you verify it solves the issue ? (rc1 should fix this specific issue)
BTW: no problem working with cleaml-server < 1

  				
Posted 
	3 years ago

					More  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

I can't find it in PyPi

  				
Posted 
	3 years ago

					More  		
  Report
		
					ExcitedFish86
				
					0
					 × 1

pip install clearml==1.0.4rc1

  				
Posted 
	3 years ago

					More  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

got it

  				
Posted 
	3 years ago

					More  		
  Report
		
					ExcitedFish86
				
					0
					 × 1

still the same 😞

  				
Posted 
	3 years ago

					More  		
  Report
		
					ExcitedFish86
				
					0
					 × 1

Task.init should be called before pytorch distribution is called, then on each instance you need to call Task.current_task() to get the instance (and make sure the logs are tracked).

  				
Posted 
	3 years ago

					More  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

Is this you case?

  				
Posted 
	3 years ago

					More  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

oops. I used create instead of init 😳

  				
Posted 
	3 years ago

					More  		
  Report
		
					ExcitedFish86
				
					0
					 × 1

ohh that's why 🙂

  				
Posted 
	3 years ago

					More  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

Maybe we should rename it?! it actually creates a Task but will not auto connect it...

  				
Posted 
	3 years ago

					More  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

I think so. IMHO all API calls should maybe reside in a different module since they usually happen inside some control code

  				
Posted 
	3 years ago

					More  		
  Report
		
					ExcitedFish86
				
					0
					 × 1

An easier fix for now will probably be some kind of warning to the user that a task is created but not connected

  				
Posted 
	3 years ago

					More  		
  Report
		
					ExcitedFish86
				
					0
					 × 1

An easier fix for now will probably be some kind of warning to the user that a task is created but not connected

That is a good point, maybe if you do not have a "main" Task, then we print the warning (with some flag to disable the warning) ?

  				
Posted 
	3 years ago

					More  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

sounds great.
BTW the code is working now out-of-the-box. Just 2 magic line - import + Task.init

  				
Posted 
	3 years ago

					More  		
  Report
		
					ExcitedFish86
				
					0
					 × 1

LOL 😊

  				
Posted 
	3 years ago

					More  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

ExcitedFish86 regarding the <1 version - are you talking about ClearML Server or ClearML SDK?

  				
Posted 
	3 years ago

					More  		
  Report
		
					SuccessfulKoala55
				
					0
					 × 1

server

  				
Posted 
	3 years ago

					More  		
  Report
		
					ExcitedFish86
				
					0
					 × 1

ExcitedFish86 You came to ClearML because it's free, you stayed because of the magic 🎊 🎉

  				
Posted 
	3 years ago

					More  		
  Report
		
					AnxiousSeal95
				
					0
					 × 1

Write your answer

1K Views

18 Answers

3 years ago

2 years ago