
We're using Ray and ClearML together, and suddenly we're seeing some hanging threads, and finally we got an error message:

```
2022-01-10 09:58:56,803 [ERROR] [CrossValidationJob] : Failed: ray::cv_iteration() (pid=6703, ip=192.168.1.58)
  File "python/ray/_raylet.pyx", line 480, in ray._raylet.execute_task

<trimmed irrelevant chunks of traceback and code>

  File "***/ccmlp/mlops/clearml_ops.py", line 70, in _close
    self.task.flush(wait_for_uploads=True)
  File "***/.venv/lib/python3.8/site-packages/clearml/task.py", line 1492, in flush
    self.__reporter.wait_for_events()
  File "***/.venv/lib/python3.8/site-packages/clearml/backend_interface/metrics/reporter.py", line 261, in wait_for_events
    return self._report_service.wait_for_events(timeout=timeout)
  File "***/.venv/lib/python3.8/site-packages/clearml/backend_interface/metrics/reporter.py", line 83, in wait_for_events
    while self._thread and self._thread.is_alive() and (not timeout or time()-tic < timeout):
AttributeError: 'bool' object has no attribute 'is_alive'
```
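A minimal stand-in to show why that last frame raises (the `_Reporter` class below is not the real ClearML reporter, just an illustration of the state it ends up in):

```python
# Stand-in sketch (not the real ClearML reporter): wait_for_events expects
# self._thread to hold a threading.Thread, but here it holds a bool, so the
# is_alive() call in the loop condition raises exactly the error above.
class _Reporter:
    def __init__(self):
        self._thread = True  # should be a threading.Thread

r = _Reporter()
r._thread and r._thread.is_alive()  # AttributeError: 'bool' object has no attribute 'is_alive'
```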

  
  
Posted 2 years ago

Answers 23


Some commits related to subprocesses and thread handling 🙂

  
  
Posted 2 years ago

Another side effect btw is that some of our log files (we add a file handler to the logger) end up at 0 bytes. This specifically happens with Ray and ClearML and does not reproduce locally

  
  
Posted 2 years ago

I'll try with 1.1.5 first, then 1.1.6rc0

  
  
Posted 2 years ago

Well, the thing is ClearML also uses dictConfig, and I think you might be overriding its settings...

  
  
Posted 2 years ago

I thought so too - so I added flush calls just in case, but nothing's changed.
This is somewhat weird since it always happens in the above scenario (Ray + ClearML), and always in the last task/job from Ray
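For context, a minimal sketch of what one of these Ray jobs looks like on our side (the function name loosely follows the `ray::cv_iteration()` frame from the traceback; project/metric names and the body are illustrative, the real code lives in `ccmlp/mlops/clearml_ops.py`):

```python
import ray
from clearml import Task


@ray.remote
def cv_iteration(fold: int) -> int:
    # Illustrative only: each Ray task gets its own ClearML task
    task = Task.init(project_name="cv-example", task_name=f"fold-{fold}")
    task.get_logger().report_scalar("metrics", "loss", value=0.0, iteration=fold)

    # The "just in case" flush -- this is the call that ends up raising
    # AttributeError: 'bool' object has no attribute 'is_alive'
    task.flush(wait_for_uploads=True)
    task.close()
    return fold


if __name__ == "__main__":
    ray.init()
    print(ray.get([cv_iteration.remote(i) for i in range(5)]))
```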

  
  
Posted 2 years ago

I believe so...

  
  
Posted 2 years ago

Perhaps a flush issue?

  
  
Posted 2 years ago

What's new in 1.1.6rc0?

  
  
Posted 2 years ago

Or you probably mean the contents of the configuration :face_palm: ... one moment

  
  
Posted 2 years ago

Hi UnevenDolphin73, which clearml version are you using?

  
  
Posted 2 years ago

Well, how do you set up dictConfig?

  
  
Posted 2 years ago

I'm guessing that's not on PyPI yet?

  
  
Posted 2 years ago

What do you mean? 😄 Using logging.config.dictConfig(...)

  
  
Posted 2 years ago

We just inherit from logging.Handler and use that in our logging.config.dictConfig; the weird thing is that it still logs most of the tasks, just not the last one?
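Roughly along these lines (a minimal sketch; the real handler is the `ccmlp.utils.TqdmStreamHandler` referenced in the config below, and its body here is only illustrative):

```python
import logging
import tqdm


class TqdmStreamHandler(logging.Handler):
    """Illustrative sketch of a handler that writes through tqdm so log lines
    don't break progress bars; the actual implementation may differ."""

    def emit(self, record: logging.LogRecord) -> None:
        try:
            tqdm.tqdm.write(self.format(record))
            self.flush()
        except Exception:
            self.handleError(record)
```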

  
  
Posted 2 years ago

Example configuration -
```yaml
version: 1
disable_existing_loggers: true
formatters:
  simple:
    format: '%(asctime)s %(levelname)-9s %(name)-24s: %(message)s'
filters:
  brackets:
    (): ccutils.logger.BracketFilter
handlers:
  console:
    class: ccmlp.utils.TqdmStreamHandler
    level: INFO
    formatter: simple
    filters: [brackets]
loggers:
  # Set logging levels for specific packages
  urllib3:
    level: WARNING
  matplotlib:
    level: WARNING
  botocore:
    level: WARNING
  fsspec:
    level: WARNING
  s3fs:
    level: WARNING
  boto3:
    level: WARNING
  s3transfer:
    level: WARNING
  git:
    level: WARNING
  ray:
    level: WARNING
  PIL:
    level: WARNING
root:
  level: DEBUG
  handlers: [console]
```
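And it's loaded with something like this (the file name here is hypothetical):

```python
import logging.config

import yaml  # PyYAML

# Hypothetical loader: parse the YAML config above and hand it to dictConfig
with open("logging.yaml") as f:
    logging.config.dictConfig(yaml.safe_load(f))
```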

  
  
Posted 2 years ago

Can you try the latest RC? 1.1.6rc0?

  
  
Posted 2 years ago

Might very well be - do you touch other handlers?

  
  
Posted 2 years ago

1.1.4

  
  
Posted 2 years ago

SuccessfulKoala55 could this be related to the monkey patching for the logging platform? We have our own logging handlers that we use in this case

  
  
Posted 2 years ago

I'll try upgrading to 1.1.5, one moment

  
  
Posted 2 years ago

I believe it may be a race condition that's tangential to ClearML now...

  
  
Posted 2 years ago

Ah it is.

  
  
Posted 2 years ago

If that's the case, wouldn't it apply across the board? This happens in a single task within Ray - the other tasks (I have many in a single run) are fine

  
  
Posted 2 years ago