I'll have some reports tomorrow I hope TimelyPenguin76 SuccessfulKoala55 !
Haha, I've opened so many issues these past few days... Sure, np!
I couldn't find it directly in the SDK at least (in the APIClient)... 🤔
-ish, still debugging some weird stuff. Sometimes ClearML picks ip and sometimes ip2, and I can't tell why 🤔
Example configuration -
version: 1
disable_existing_loggers: true
formatters:
  simple:
    format: '%(asctime)s %(levelname)-9s %(name)-24s: %(message)s'
filters:
  brackets:
    (): ccutils.logger.BracketFilter
handlers:
  console:
    class: ccmlp.utils.TqdmStreamHandler
    level: INFO
    formatter: simple
    filters: [brackets]
loggers:  # Set logging levels for specific packages
  urllib3:
    level: WARNING
  matplotlib:
    level: WARNING
  ...
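For anyone who wants to try a config of this shape without the project-specific classes, here is a minimal sketch using only the standard library. The BracketFilter below is a hypothetical pass-through stand-in (the real ccutils.logger.BracketFilter isn't shown here), logging.StreamHandler stands in for ccmlp.utils.TqdmStreamHandler, and the root section is an assumption since the original config is elided.

```python
import logging
import logging.config


class BracketFilter(logging.Filter):
    """Hypothetical stand-in for ccutils.logger.BracketFilter (real behavior unknown)."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Pass every record through unchanged; the real filter presumably does more.
        return True


CONFIG = {
    "version": 1,
    "disable_existing_loggers": True,
    "formatters": {
        "simple": {"format": "%(asctime)s %(levelname)-9s %(name)-24s: %(message)s"},
    },
    "filters": {
        # "()" tells dictConfig to call this factory to build the filter.
        "brackets": {"()": BracketFilter},
    },
    "handlers": {
        "console": {
            # Stand-in for ccmlp.utils.TqdmStreamHandler.
            "class": "logging.StreamHandler",
            "level": "INFO",
            "formatter": "simple",
            "filters": ["brackets"],
        },
    },
    "loggers": {
        # Set logging levels for specific packages.
        "urllib3": {"level": "WARNING"},
        "matplotlib": {"level": "WARNING"},
    },
    # Assumption: the elided part of the config attaches the console handler to root.
    "root": {"level": "INFO", "handlers": ["console"]},
}

logging.config.dictConfig(CONFIG)
logging.getLogger(__name__).info("logging configured")
```

Since ClearML may hook into logging on its own, it's probably worth checking the handlers on the root logger after Task.init to see what actually ended up attached.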
I thought so too - so I added flush calls just in case, but nothing's changed.
This is somewhat weird since it always happens in the above scenario (Ray + ClearML), and always in the last task/job from Ray
SuccessfulKoala55 could this be related to the monkey patching for logging platform? We have our own logging handlers that we use in this case
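A minimal sketch of what such flush calls could look like, assuming plain standard-library handlers (this is an illustration, not the code actually used here). The helper name flush_handlers is made up. It walks from the given logger up through its parents, the same way records propagate, and flushes every handler it finds.

```python
import logging


def flush_handlers(logger: logging.Logger) -> int:
    """Flush every handler that a record logged on `logger` would reach.

    Returns the number of handlers flushed.
    """
    flushed = 0
    current = logger
    while current is not None:
        for handler in current.handlers:
            handler.flush()
            flushed += 1
        if not current.propagate:
            # Records stop here, so handlers further up are not involved.
            break
        current = current.parent
    return flushed
```

Calling logging.shutdown() at the end of the job is the heavier alternative; it flushes and closes all handlers.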
Hi AgitatedDove14 !
Ah, thanks! I'll use the artifacts for linking.
We've forgone the "use current task" already because it indeed made things even more difficult (the task that was used is then automatically hidden by this automatic renaming of dataset tasks).
The current implementation (since 1.6.3 I think) creates the issues in the linked comment (with images to visualize).
It's a small snippet that ensures identically named projects are still unique'd with a running number.
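Not the actual snippet, but a rough sketch of the idea under stated assumptions: the function name uniquify and the "_N" suffix format are made up for illustration.

```python
from typing import Iterable


def uniquify(name: str, existing: Iterable[str]) -> str:
    """Return `name`, or `name_N` with the smallest free running number N."""
    taken = set(existing)
    if name not in taken:
        return name
    counter = 1
    while f"{name}_{counter}" in taken:
        counter += 1
    return f"{name}_{counter}"
```

For example, uniquify("proj", {"proj", "proj_1"}) gives "proj_2", while a name that isn't taken yet comes back unchanged.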
Right so it uses whatever version is available on the agent.
Yeah it would be nice to have either a poetry_version (a-la https://github.com/allegroai/clearml-agent/blob/5afb604e3d53d3f09dd6de81fe0a494dacb2e94d/docs/clearml.conf#L62 ), rename the latter to manager_version, or just install from the captured environment, etc? 🤔
But there's nothing of that sort happening. The process where it's failing is on getting tasks for a project.
Sure! It looks like this
I'm running tests with pytest, it consumes/owns the stream
It is. Let me see what else I have set up for MinIO in configs, one moment
Yes, as I wrote above 😄
I also tried switching to dockerized mode now, getting the same issue 🤔
I think the environment variables path might work for you then?
You'd set your config with use_credentials_chain: ${CREDENTIALS_CHAIN}
Then in Python you could set os.environ['CREDENTIALS_CHAIN'] = 'true' (or 'false'; environment values have to be strings) before you make any calls to ClearML?
So a normal config file with environment variables.
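A minimal sketch of the Python side, assuming the config file references ${CREDENTIALS_CHAIN} as described. The helper name set_credentials_chain is made up. os.environ only accepts strings, so the boolean is converted first.

```python
import os


def set_credentials_chain(enabled: bool) -> None:
    """Export the flag as a string, since os.environ rejects non-string values."""
    os.environ["CREDENTIALS_CHAIN"] = "true" if enabled else "false"


# Set the variable first, and only then import or call ClearML,
# so the value is already present when the config gets read.
set_credentials_chain(True)
```

Whether the config loader accepts "true"/"false" for this particular key is something I'd verify on your setup.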
I think now there's the following:
- Resource type
- Queue (name) defines resource + max instances
And I'm looking for:
- Resource type
- "pool" of resources (type + max instances)
- A pool can be shared among queues
Another side effect btw is that some of our log files (we add a file handler to the logger) end up at 0 bytes. This specifically happens with Ray and ClearML and does not reproduce locally
Right, but that's as defined in the services agent, which is not immediately transparent
can I assume these files are reused
A definite maybe, they may or may not be used, but we'd like to keep that option 🙃
Maybe the "old" way Dataset were shown is better suited ?
It was, but then it's gone now 😞
I see your point, this actually might be a "bug"?!
I would say so myself, but could be also by design..?
Awesome, I'll ask Product to reach out
LMK, happy to help out!
I know our use case is maybe a very different one, but...
This could be relevant SuccessfulKoala55 ; might entail some serious bug in ClearML multiprocessing too - https://stackoverflow.com/questions/45665991/multiprocessing-returns-too-many-open-files-but-using-with-as-fixes-it-wh