Hi

Hi AgitatedDove14, I upgraded clearml from 0.17.4 to 0.17.5rc2, and the upgrade broke my code: it seems clearml has started using multiprocessing. I get the following error:

  File "/opt/conda/lib/python3.8/site-packages/clearml-0.17.5rc2-py3.8.egg/clearml/task.py", line 593, in init
    BackgroundMonitor.start_all(task=task)
  File "/opt/conda/lib/python3.8/site-packages/clearml-0.17.5rc2-py3.8.egg/clearml/utilities/process/mp.py", line 209, in start_all
    BackgroundMonitor._main_process.start()
  File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 118, in start
    assert not _current_process._config.get('daemon'), \
AssertionError: daemonic processes are not allowed to have children

Since I am using multiprocessing myself to distribute training jobs, I run into the above error when clearml tries to use multiprocessing too. Things worked fine with 0.17.4. Can you elaborate on where multiprocessing is being used in clearml? I cannot remove multiprocessing from my process, so I need to think about how to resolve this issue.

  
  
Posted 3 years ago

Answers 22


ok

  
  
Posted 3 years ago

Yep, but a funny hack nonetheless.
No idea why they have it there...

  
  
Posted 3 years ago

I'll check what we can do on running in a daemon subprocess

  
  
Posted 3 years ago

The second subprocess is by design. It becomes the primary process when clearml does not use multiprocessing. I hope I'm not confusing you further.

  
  
Posted 3 years ago

Thanks for the tip with the config file. I have reverted back to 0.17.4 but will try this.

  
  
Posted 3 years ago

SarcasticSparrow10, how do I reproduce it?
I tried launching from a subprocess that is a daemon and it worked. Are you using ProcessPool?

  
  
Posted 3 years ago

Sure, it will revert to the old behavior and run in threads

  
  
Posted 3 years ago

Hi SarcasticSparrow10,
1. Yes, it does. This is more efficient when using PyTorch loaders, and in some other situations.
To disable it, add the following to your clearml.conf:
sdk.development.report_use_subprocess = false
2. Interesting error. Maybe we can revert to "thread mode" when running under a daemon. (I have to admit, I'm not sure why Python has this limitation; let me check it...)
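The "thread mode" fallback suggested in point 2 could look roughly like the sketch below. It is a hypothetical illustration, not ClearML's actual implementation: `pick_backend` and `start_reporter` are made-up names, and the only real API it relies on is `multiprocessing.current_process().daemon`, which is `True` inside `multiprocessing.Pool` workers.

```python
import multiprocessing as mp
import threading
from typing import Callable, Union


def pick_backend() -> str:
    """Hypothetical helper: daemonic processes may not start children,
    so fall back to a thread there and use a subprocess everywhere else."""
    return "thread" if mp.current_process().daemon else "process"


def start_reporter(target: Callable[[], None]) -> Union[threading.Thread, mp.Process]:
    """Hypothetical helper: start a background reporter with the allowed backend.

    With the "spawn" or "forkserver" start methods, `target` must be a
    picklable top-level function for the subprocess branch to work.
    """
    worker: Union[threading.Thread, mp.Process]
    if pick_backend() == "thread":
        # Inside a daemonic process (e.g. a Pool worker): use a thread.
        worker = threading.Thread(target=target, daemon=True)
    else:
        # Regular process: a background subprocess is allowed.
        worker = mp.Process(target=target, daemon=True)
    worker.start()
    return worker
```

The trade-off is the one described in point 1: the thread backend avoids the daemon restriction but gives up the efficiency benefit of reporting from a separate process.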

  
  
Posted 3 years ago

Yes, I am using Pool. Here is what I think is happening: clearml launches a subprocess, which I assume is a daemonic process. That process in turn launches a subprocess for training, which causes the error I mentioned.

  
  
Posted 3 years ago

2. interesting error, maybe we can revert to "thread mode" if running under a daemon. (I have to admit, I'm not sure why python has this limitation, let me check it...)

Yes, I'm not sure either. I have banged my head against the wall trying to have multiple levels of subprocesses, but it gets too complicated with Python. Let me know what you find out.

  
  
Posted 3 years ago

clearml launches a subprocess

Correct, this subprocess is used for resource monitoring and for sending logs in the background (i.e. metrics, console output, etc.).
Where is the "training" part coming from? I'm assuming the training is your main code?
Follow-up: is this happening when running manually or when executed via the agent?

  
  
Posted 3 years ago

Okay, I was able to reproduce it. This will only happen if you are running from a daemon process (as in the case of a process pool). Python is sometimes very picky when it comes to multi-threading/multiprocessing. I'll check what we can do 🙂
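A minimal reproduction of the condition described here, assuming a POSIX system where the "fork" start method is available. Pool workers are daemonic, so a single daemonic `multiprocessing.Process` is enough to trigger the same check that fails in the traceback in the question. `grandchild`, `daemon_worker` and `try_nested_start` are illustrative names, not part of ClearML.

```python
import multiprocessing as mp

# Assumption: POSIX with the "fork" start method; with fork, Process
# targets are not pickled, which keeps the sketch short.
ctx = mp.get_context("fork")


def grandchild() -> None:
    """Placeholder for the background process a library would try to start."""


def daemon_worker(results) -> None:
    """Runs inside a daemonic process (as Pool workers are) and tries to
    start a child of its own."""
    try:
        child = ctx.Process(target=grandchild)
        child.start()  # rejected by an assert in multiprocessing
        child.join()
        results.put("child started")
    except AssertionError as exc:
        results.put(str(exc))


def try_nested_start() -> str:
    """Start one daemonic worker and report what happened inside it."""
    results = ctx.Queue()
    worker = ctx.Process(target=daemon_worker, args=(results,), daemon=True)
    worker.start()
    message = results.get(timeout=30)  # read before join() to avoid blocking
    worker.join()
    return message


if __name__ == "__main__":
    print(try_nested_start())
```

Run normally, this should print `daemonic processes are not allowed to have children`. Under `python -O` the assertion is stripped and it prints `child started`.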

  
  
Posted 3 years ago

Yes, the 'training' is my main code. You can think of it as launching a job (training or inference). My main code launches multiple jobs using multiprocessing. Each job is a separate task for clearml that gets logged. Does that make sense?
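One way around the limitation, sketched here under assumptions, is to give each job its own non-daemonic `multiprocessing.Process` instead of a `Pool` worker, since only daemonic processes are barred from having children. `run_job`, `background_child` and `launch_jobs` are hypothetical names; a real job would call clearml's `Task.init(...)` where the comment indicates. The sketch assumes the "fork" start method and only shows that such a job process may start a child of its own.

```python
import multiprocessing as mp
from typing import List, Tuple

ctx = mp.get_context("fork")  # assumption: POSIX, fork start method available


def background_child() -> None:
    """Stands in for a background monitor process started by a library."""


def run_job(job_id: int, results) -> None:
    """Hypothetical job body. A real job would call Task.init() here,
    which may start its own background subprocess."""
    child = ctx.Process(target=background_child)
    child.start()  # allowed: this job process is not daemonic
    child.join()
    results.put((job_id, child.exitcode))


def launch_jobs(job_ids: List[int]) -> List[Tuple[int, int]]:
    """Run each job in its own non-daemonic process instead of a Pool worker."""
    results = ctx.Queue()
    jobs = [ctx.Process(target=run_job, args=(j, results)) for j in job_ids]
    for job in jobs:
        job.start()
    collected = sorted(results.get(timeout=60) for _ in jobs)
    for job in jobs:
        job.join()
    return collected


if __name__ == "__main__":
    print(launch_jobs([0, 1, 2]))  # each child exits with code 0
```

Unlike `Pool`, this gives up worker reuse and automatic result collection, so limiting how many jobs run at once would have to be handled by hand.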

  
  
Posted 3 years ago

This is happening manually. I am not using agent yet

  
  
Posted 3 years ago

Yes, it does. I'm assuming each job is launched using a multiprocessing.Pool (which translates into a subprocess). Let me see if I can reproduce this behavior.

  
  
Posted 3 years ago

SarcasticSparrow10 LOL, there is a hack around it 🙂
Run your code with python -O, which basically skips over all assertion checks.
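The catch raised in the next reply is real: `-O` strips every `assert` statement in the program, not only the one inside `multiprocessing`. The sketch below runs the same code with and without `-O` to show this; `SNIPPET` is a made-up stand-in for the user's own checks.

```python
import os
import subprocess
import sys

# Stands in for an ordinary assertion in the user's own code.
SNIPPET = "assert 1 + 1 == 3, 'my own sanity check'\nprint('assertion skipped')"


def run_snippet(optimize: bool) -> subprocess.CompletedProcess:
    """Run SNIPPET in a fresh interpreter, with or without the -O flag."""
    # Drop PYTHONOPTIMIZE so only the -O flag decides whether asserts run.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONOPTIMIZE"}
    cmd = [sys.executable] + (["-O"] if optimize else []) + ["-c", SNIPPET]
    return subprocess.run(cmd, capture_output=True, text=True, env=env)


if __name__ == "__main__":
    plain = run_snippet(optimize=False)
    optimized = run_snippet(optimize=True)
    print(plain.returncode, plain.stderr.strip().splitlines()[-1])
    print(optimized.returncode, optimized.stdout.strip())
```

Without `-O` the snippet fails with `AssertionError: my own sanity check`; with `-O` it exits cleanly and prints `assertion skipped`.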

  
  
Posted 3 years ago

Wait but that will skip all the assertion checks that I have in my code?!

  
  
Posted 3 years ago

Yes, I am using multiprocessing.Pool to launch each job

  
  
Posted 3 years ago

Haha.. that would be a problem then!

  
  
Posted 3 years ago

Yes 😞

  
  
Posted 3 years ago

👍

  
  
Posted 3 years ago

Haha.. ok, good to know

  
  
Posted 3 years ago