This could be relevant SuccessfulKoala55; it might entail some serious bug in ClearML multiprocessing too - https://stackoverflow.com/questions/45665991/multiprocessing-returns-too-many-open-files-but-using-with-as-fixes-it-wh
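For reference, the gist of that Stack Overflow thread as a plain-stdlib sketch (nothing ClearML-specific): pools created without being closed leak their pipe file descriptors, while the `with` form releases them on exit:

```python
import multiprocessing

def square(x):
    return x * x

if __name__ == "__main__":
    # Without a context manager, each Pool leaves its pipes open until
    # garbage collection, eventually hitting "Too many open files".
    # With the context manager, the pool is terminated (and its file
    # descriptors closed) as soon as the block exits.
    for _ in range(100):
        with multiprocessing.Pool(4) as pool:
            pool.map(square, range(10))
```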
Is there a way to specify that flag within the config file, SuccessfulKoala55?
We're not using the docker setup though. The CLI run by the autoscaler is `python -m clearml_agent --config-file /root/clearml.conf daemon --queue aws_small`, so no docker.
Sure, SuccessfulKoala55, and thanks for looking into it.
As an alternative (for now, or in general), we could consider reverting back to pip. The issue we encounter is that we have a monorepo, so frozen requirements should specify relative paths, but `pip freeze` does not seem to do that, so ClearML also fails in `pip` mode.
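To illustrate with a hypothetical in-repo package: a working monorepo install needs the relative path preserved, but `pip freeze` only records a pinned version:

```
# What pip freeze emits for a locally-installed package:
mylib==0.1.0

# What the monorepo actually needs in requirements (relative/editable path):
-e ./packages/mylib
```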
Should this be under the `clearml` or the `clearml-agent` repo?
I'm not too worried about the dataset appearing (or not) in the Datasets tab. I would like it (the original task) to not disappear from the original project I assigned it to.
Yes and no, SmugDolphin23.
The project is listed, but there is no content, and it hides the main task it is attached to.
Not necessarily on the same branch, no
Another side effect, btw, is that some of our log files (we add a file handler to the logger) end up at 0 bytes. This happens specifically with Ray and ClearML together and does not reproduce locally.
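The handler setup is nothing exotic; roughly this (paths and names are illustrative):

```python
import logging

logger = logging.getLogger("trainer")
handler = logging.FileHandler("run.log")  # this is the file that ends up at 0 bytes
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("training started")

# If the worker process is forked or torn down abruptly, buffered records
# may never reach disk; flushing/closing explicitly is a workaround to try.
handler.flush()
handler.close()
```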
Great to hear @<1523701087100473344:profile|SuccessfulKoala55>! Is there an estimated timeline for these releases?
Follow-up question/feature request (out of interest) - could the WebUI show the matching commit message?
From the traceback (`backend_interface/task/task.py`, line 178, in `__init__`), notice it's not `Task.init`.
I believe that a Pipeline should have the system tags (`pipeline`, maybe `hidden`), even if it is created in a running `Task`.
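For example (assuming `set_system_tags`/`get_system_tags` behave as their names suggest), I'd expect something equivalent to this to happen automatically:

```python
from clearml import Task

task = Task.init(project_name="demo", task_name="pipeline-controller")
# Mark the task the way pipeline controllers are normally marked in the UI.
task.set_system_tags(["pipeline", "hidden"])
print(task.get_system_tags())
```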
I'm not entirely sure I understand the flow, but I'll give it a go. I have two final questions:
This seems to only work for a single file (`weights_path` implies a single file, not multiple ones). Is that the case? Why do you see this as preferable to the dataset method we have now? 🤔
I'm trying to build an easy SDK that would fit DS work and fit the concept of ClearML pipelines.
In doing so, I'm planning to define various `Step` classes that the user can then experiment with, providing steps as input to other steps, etc.
Then I'd like the user to be able to run any such step, either locally or remotely. Locally is trivial. Remotely is the issue. I understand I'll need to upload additional data to the remote instance, and pull a specific artifact back to the notebo...
I guess in theory I could write a `run_step.py`, similarly to how the pipeline in ClearML works… 🤔 And then use `Task.create()` etc.?
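Roughly this shape (all names are hypothetical, and the `Task.create`/`Task.enqueue` arguments are indicative only):

```python
import subprocess
from clearml import Task

class Step:
    """A unit of work the user can run locally or remotely."""

    def __init__(self, name: str, script: str):
        self.name = name
        self.script = script  # e.g. "steps/preprocess.py"

    def run_local(self) -> None:
        # Trivial case: execute the step's script in a subprocess.
        subprocess.run(["python", self.script], check=True)

    def run_remote(self, queue: str = "default") -> Task:
        # Create a draft task from the script and enqueue it for an agent.
        task = Task.create(
            project_name="steps",
            task_name=self.name,
            script=self.script,
        )
        Task.enqueue(task, queue_name=queue)
        return task
```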
I think I may have brought this up multiple times in different ways :D
When dealing with long and complicated configurations (whether config objects, YAML, or otherwise), it's often useful to break them down into relevant chunks (think Hydra, maybe).
In our case, we have a custom YAML instruction `!include`, i.e.:
```yaml
# foo.yaml
bar: baz

# bar.yaml
obj: !include foo.yaml
maybe_another_obj: !include foo.yaml
```
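For concreteness, the generic PyYAML pattern for such a constructor looks like this (a sketch, not our exact implementation):

```python
import os
import yaml

class IncludeLoader(yaml.SafeLoader):
    """SafeLoader that understands the custom !include instruction."""

def _include(loader: IncludeLoader, node: yaml.Node):
    # Resolve the included file relative to the including file's directory.
    base = os.path.dirname(getattr(loader, "name", "."))
    path = os.path.join(base, loader.construct_scalar(node))
    with open(path) as f:
        return yaml.load(f, IncludeLoader)

IncludeLoader.add_constructor("!include", _include)

with open("bar.yaml") as f:
    config = yaml.load(f, IncludeLoader)
print(config)  # {'obj': {'bar': 'baz'}, 'maybe_another_obj': {'bar': 'baz'}}
```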
In which repo? :)
I'm not sure; I'm not getting anything (this is the only thing I could find that's weird about this project).
It has a space in the name, has no subprojects, and it just doesn't show up anywhere 🤔
Any updates, @<1523701087100473344:profile|SuccessfulKoala55>? 🙂
We just call `task.close()` and then start a new `Task.init()` manually, so our "pipelines" are self-controlled.
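i.e., roughly this pattern (project/task names are placeholders):

```python
from clearml import Task

# First "pipeline" stage
task = Task.init(project_name="pipelines", task_name="stage-1")
# ... run stage 1 ...
task.close()

# Next stage, started manually in the same process
task = Task.init(project_name="pipelines", task_name="stage-2")
# ... run stage 2 ...
task.close()
```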
TimelyPenguin76 here's the full log (took a moment to anonymize completely):
```
Using environment access key CLEARML_API_ACCESS_KEY=xxx
Using environment secret key CLEARML_API_SECRET_KEY=********
Current configuration (clearml_agent v1.3.0, location: /tmp/.clearml_agent.zs4e7egs.cfg):
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.m...
```
PricklyRaven28 That would be my fallback; it would make development much slower (having to build containers with every small change).
Since this is a single process, most of these are only needed once when our "initializer" task starts and loads.
Yeah, that works too. So one can override the queue ID but not the worker 🤔
Not that I recall
I think this might be about the `credential.helper` used