ClearML FAQ

Answered

Hi AgitatedDove14, I upgraded clearml from 0.17.4 to 0.17.5rc2 and the change broke my code, as it seems clearml has started using multiprocessing. I get the following error:

  File "/opt/conda/lib/python3.8/site-packages/clearml-0.17.5rc2-py3.8.egg/clearml/task.py", line 593, in init
    BackgroundMonitor.start_all(task=task)
  File "/opt/conda/lib/python3.8/site-packages/clearml-0.17.5rc2-py3.8.egg/clearml/utilities/process/mp.py", line 209, in start_all
    BackgroundMonitor._main_process.start()
  File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 118, in start
    assert not _current_process._config.get('daemon'), \
AssertionError: daemonic processes are not allowed to have children

Since I am using multiprocessing myself to distribute training jobs, I run into the above error when clearml tries to use multiprocessing. Things worked fine with 0.17.4. Can you elaborate on where multiprocessing is used in clearml? I cannot remove multiprocessing from my process, so I would need to think about how to resolve this issue.

  				
Posted 4 years ago by SarcasticSparrow10


Answers 22

This is happening when running manually. I am not using the agent yet.

  				
Posted 4 years ago by SarcasticSparrow10

Sure, it will revert to the old behavior and run in threads

  				
Posted 4 years ago by AgitatedDove14

👍

  				
Posted 4 years ago by AgitatedDove14

Thanks for the tip with the config file. I have reverted back to 0.17.4 but will try this.

  				
Posted 4 years ago by SarcasticSparrow10

2. interesting error, maybe we can revert to "thread mode" if running under a daemon. (I have to admit, I'm not sure why python has this limitation, let me check it...)

Yes, I'm not sure either. I have banged my head against the wall trying to get multiple levels of subprocesses working, but it gets too complicated with Python. Let me know what you find out.

  				
Posted 4 years ago by SarcasticSparrow10

Okay, I was able to reproduce it. This only happens if you are running from a daemon process (as in the case of a process pool). Python is sometimes very picky when it comes to multi-threading/processes. I'll check what we can do 🙂
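
For anyone who wants to see the limitation without clearml involved, here is a minimal sketch using only the standard library. It assumes a POSIX system (it explicitly asks for the "fork" start method to stay self-contained), and the function names are made up for illustration. Pool workers are daemonic, so a worker that tries to start its own child process should hit the same assertion as in the traceback.

```python
import multiprocessing as mp

# Assumption: POSIX system. The "fork" start method is requested explicitly
# so the example does not depend on the platform default.
ctx = mp.get_context("fork")


def noop():
    """Target for the grandchild process; it never actually runs here."""


def try_to_spawn_child(_):
    """Runs inside a Pool worker (a daemonic process) and tries to start a child."""
    child = ctx.Process(target=noop)
    try:
        child.start()
    except AssertionError as exc:
        # Raised by multiprocessing because the current process is daemonic.
        return str(exc)
    child.join()
    return "child started"


def reproduce():
    """Ask a single Pool worker to start a child and return what happened."""
    with ctx.Pool(processes=1) as pool:
        return pool.map(try_to_spawn_child, [0])[0]


if __name__ == "__main__":
    print(reproduce())
```

The worker returns the error text instead of raising, so the message can be inspected from the parent process.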

  				
Posted 4 years ago by AgitatedDove14

clearml launches a subprocess

Correct, this subprocess is used for resource monitoring and for sending logs in the background (i.e. metrics, console, etc.).
Where does the "training" part come from? I'm assuming the training is your main code?
Follow-up: is this happening when running manually or when executed via the agent?

  				
Posted 4 years ago by AgitatedDove14

Yes it does. I'm assuming each job is launched using a multiprocessing.Pool (which translates into a sub process). Let me see if I can reproduce this behavior.

  				
Posted 4 years ago by AgitatedDove14

Yes, I am using multiprocessing.Pool to launch each job

  				
Posted 4 years ago by SarcasticSparrow10

Haha.. ok, good to know

  				
Posted 4 years ago by SarcasticSparrow10

Wait but that will skip all the assertion checks that I have in my code?!

  				
Posted 4 years ago by SarcasticSparrow10

Haha.. that would be a problem then!

  				
Posted 4 years ago by SarcasticSparrow10

I'll check what we can do on running in a daemon subprocess

  				
Posted 4 years ago by AgitatedDove14

SarcasticSparrow10 LOL there is a hack around it 🙂
Run your code with python -O
Which basically skips over all assertion checks
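
To make the effect of the flag concrete, here is a small sketch that runs the same one-line snippet with and without -O in a fresh interpreter. The snippet and the helper name are only illustrative. Keep in mind that -O disables every assert statement in the program, including your own, not just the one inside multiprocessing.

```python
import subprocess
import sys

# Illustrative snippet: the assert fails unless assertions are disabled.
SNIPPET = "assert False, 'assertion ran'\nprint('assertion skipped')"


def run_snippet(optimize: bool) -> subprocess.CompletedProcess:
    """Run SNIPPET in a new interpreter, optionally with the -O flag."""
    cmd = [sys.executable]
    if optimize:
        cmd.append("-O")
    cmd += ["-c", SNIPPET]
    return subprocess.run(cmd, capture_output=True, text=True)


if __name__ == "__main__":
    for optimize in (False, True):
        result = run_snippet(optimize)
        print(f"optimize={optimize} returncode={result.returncode} "
              f"stdout={result.stdout.strip()!r}")
```

Without the flag the interpreter exits with an AssertionError; with it, the assert is compiled out and the print statement runs.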

  				
Posted 4 years ago by AgitatedDove14

Hi SarcasticSparrow10,
1. Yes, it does. This is more efficient when using pytorch loaders, and in some other situations.
To disable it, add to your clearml.conf:
sdk.development.report_use_subprocess = false
2. Interesting error. Maybe we can revert to "thread mode" if running under a daemon. (I have to admit, I'm not sure why Python has this limitation, let me check it...)
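
The "thread mode" fallback idea can be sketched roughly as follows. This is not clearml's implementation, only an illustration under the assumption that checking the daemon flag of the current process is enough to decide; the helper names are hypothetical.

```python
import multiprocessing as mp
import threading


def can_spawn_children() -> bool:
    """A daemonic process is not allowed to start child processes."""
    return not mp.current_process().daemon


def choose_worker_class():
    """Pick a process when allowed, otherwise fall back to a thread."""
    return mp.Process if can_spawn_children() else threading.Thread


def start_background_worker(target, *args):
    """Hypothetical helper: run `target` in the background in whichever mode is allowed."""
    worker = choose_worker_class()(target=target, args=args, daemon=True)
    worker.start()
    return worker
```

Called from the main process, this would pick a subprocess; called from inside a Pool worker, it would pick a thread and avoid the assertion.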

  				
Posted 4 years ago by AgitatedDove14

The second subprocess is by design. It becomes the primary process when clearml does not use multiprocessing. I hope I'm not confusing you further

  				
Posted 4 years ago by SarcasticSparrow10

Yes 😞

  				
Posted 4 years ago by AgitatedDove14

  				

Yes, I am using Pool. Here is what I think is happening: clearml launches a subprocess, which I assume is a daemonic process. That process in turn launches a subprocess for training, which causes the error I mentioned.

  				
Posted 4 years ago by SarcasticSparrow10

SarcasticSparrow10, how do I reproduce it?
I tried launching from a subprocess that is a daemon and it worked. Are you using ProcessPool?

  				
Posted 4 years ago by AgitatedDove14

Yes, the 'training' is my main code. You can think of it as launching a job (training or inference). My main code launches multiple jobs using multiprocessing. Each job is a separate task for clearml that gets logged. Does that make sense?

  				
Posted 4 years ago by SarcasticSparrow10

Yep, but a funny hack nonetheless.
No idea why they have it there...

  				
Posted 4 years ago by AgitatedDove14
