Hey, looks like we found something. The parameter that 'controls' the slowdown is detect_repository. We think it may be caused by the large number of files in the repo (the data folder). Do you take the .gitignore file into account when detecting the repo?
That's fine 🙂 we haven't got to it yet, I'm afraid - I think the best way is to open a GitHub issue...
we’re using the latest ClearML server and client version (1.2.0)
it’s a pretty standard PyTorch train/eval loop, using the PyTorch DataLoader and https://docs.monai.io/en/stable/_modules/monai/data/dataset.html
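roughly something like this (a simplified sketch of the kind of loop I mean, with toy in-memory data standing in for the real images and model):
```python
import numpy as np
import torch
from torch.utils.data import DataLoader
from monai.data import Dataset

# toy in-memory items just to illustrate the structure; the real script
# loads medical images from disk through monai transforms
items = [{"image": np.random.rand(16).astype("float32"), "label": i % 2}
         for i in range(32)]
train_ds = Dataset(data=items, transform=None)
train_loader = DataLoader(train_ds, batch_size=4, shuffle=True)

model = torch.nn.Linear(16, 2)              # stand-in for the real network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(2):
    model.train()
    for batch in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(batch["image"]), batch["label"])
        loss.backward()
        optimizer.step()
```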
or should it be fixed in the pigar repo first?
Some info on the script (pseudo-code?) would be appreciated 🙂
The repo detection (I assume git?) uses the git command, so .gitignore should be taken into account, I think
In any case, there's a 10sec timeout for this process, and you can simply choose not to do the detection
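For example, something like this should work (a sketch, assuming the clearml Python SDK's Task.force_requirements_env_freeze classmethod, which records pip freeze instead of running the import-scan analysis):
```python
from clearml import Task

# Skip the import-analysis (pigar) step and record the current
# environment's `pip freeze` as the task requirements instead.
# Must be called before Task.init().
Task.force_requirements_env_freeze()

task = Task.init(project_name="examples", task_name="monai-train")
```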
Oh, I see. It's actually the pigar embedded in clearml
I found the place where the hang-up happens
DilapidatedDucks58, we have a hunch about what's wrong (we think we treat loading data like loading a model, and then register each file / files pickle as a model, which takes time). How are you loading the data? Is monai built into pytorch, or are you downloading it and loading it manually? If you can share the loading code, that might be helpful 🙂
SuccessfulKoala55 sorry for the bump, what's the status of the fix?
I suppose clearml does not take .gitignore into account
https://github.com/allegroai/clearml/blob/a47f127679ebf5912690f7c3e60791a2daa5c984/clearml/backend_interface/task/repo/scriptinfo.py#L47
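for illustration, this is roughly the difference I mean (a hypothetical sketch, not clearml code): asking git for the tracked files respects .gitignore, whereas walking the working tree scans everything, including a large data folder:
```python
import os
import subprocess

repo = "."  # path to the repository root

# Walking the working tree visits everything, including ignored data folders
walked = [os.path.join(root, f)
          for root, _, files in os.walk(repo)
          for f in files]

# `git ls-files` lists only tracked files, so .gitignore'd paths are skipped
tracked = subprocess.check_output(
    ["git", "ls-files"], cwd=repo, text=True
).splitlines()

print(f"walked {len(walked)} files, git tracks {len(tracked)}")
```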
Well, I'll take a look and get back to you 🙂
stack trace:
project_import_modules, reqs.py:46
extract_reqs, __main__.py:67
get_requirements, scriptinfo.py:49
_update_repository, task.py:298
_create_dev_task, task.py:2819
init, task.py:504
train, train_loop.py:41
<module>, train.py:88