AgitatedDove14 The keys are there, and there is no specifically defined user in `.gitmodules`:
```
[submodule "xxx"]
    path = xxx
    url =
```
I believe this has to do with how ClearML sets up the git credentials?
Indeed. I'll open an issue, sure!
Yes, exactly. I have not yet had a chance to try this out -- should it work?
We have a read-only user with a personal access token for these things; it works seamlessly throughout, including on our current on-premise servers... So perhaps something is missing in the autoscaler definitions?
Sounds like a nice idea 😁
Follow-up: any idea how to avoid PEP 517 builds with the autoscaler? 🤔 It takes a long time to build the wheels
I guess I'll have to rerun the experiment without tags for this?
CostlyOstrich36 I'm not sure what was keeping it from spinning down. Unfortunately I was not around when this happened. Maybe it was AWS taking a while to terminate, or maybe it was just taking a while to register in the autoscaler.
The logs looked like this:
1. Recognizing an idle worker and spinning down:
```
2022-09-19 12:27:33,197 - clearml.auto_scaler - INFO - Spin down instance cloud id 'i-058730639c72f91e1'
```
2. Recognizing a new task is available, but the worker is still idle:
```
2022-09...
```
I cannot, the instance is long gone... But it's no different from any other scaled instance; it seems it just took a while to register in ClearML.
Would be nice if the second one were a toggleable feature (either per use or in the server settings), maybe?
Nope, no `.netrc` defined anywhere, really (and I've abandoned the use of Docker for the autoscaler as it complicates things, at least for now).
Something like `task.upload_artifact(..., is_requirement=True)` or `task.connect_configuration(..., is_requirement=True)` would just imply these artifacts/configurations must be downloaded prior to running the code itself; then you also don't have to worry about zipping? 🤔
The new task is not running inside a new subprocess. Our platform trains several models, and we'd like each of them to be tracked in its own `Task`. When running locally, this works out of the box, as we can init and close before and after each model.
When running remotely, one cannot close the main task (since it is what orchestrates everything), and so this workaround was needed.
It's given as the second form you suggested in the mini config (`http://${...}:8080`). The quotation marks are added later by pyhocon.
I believe that happens natively thanks to pyhocon? No idea why it fails on mac
Debugging. It's very useful for us to be able to see the contents of the configuration and understand what is going on and what is meant to be going on. Without a preview (which in our case is the entire content of the configuration file), one has to take the annoying route of downloading the files etc.
The configurations are uploaded to a single task and then linked across all tasks to conserve storage space (so the S3 storage point is identical across tasks).
Sure, sounds good. I think it's a ...
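For context, this is roughly how a task picks the configuration up today (file name, project/task names, and the YAML format are just for illustration):
```python
from pathlib import Path
import yaml  # assuming a YAML config
from clearml import Task

task = Task.init(project_name="our-project", task_name="initializer")

# Attach the config file; when executed by an agent, ClearML hands back a local copy
config_path = task.connect_configuration(Path("run_config.yaml"), name="run-config")
with open(config_path) as f:
    run_config = yaml.safe_load(f)
```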
Honestly, this is all related to issue #340. The only reason we have this to begin with is because we need one separate "initializer" task that downloads the remote cache and prepares the agent environment for execution (downloading the configuration files, etc).
Otherwise it fits perfectly with pipelines, but we're not there yet.
In the local execution we don't have this initializer task, so we use `Task.init()` before starting to work on a model, and `task.close()` when we're done.
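Roughly, the local flow looks like this (model names and the train function are placeholders):
```python
from clearml import Task

def train(name):
    ...  # placeholder for our per-model training code

for model_name in ["model_a", "model_b"]:  # placeholder model list
    task = Task.init(project_name="our-project", task_name=model_name)
    train(model_name)
    task.close()  # close it so the next model gets its own Task
```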
Most of these are configurations (specific to an execution, but one such configuration defines multiple tasks). Some models might be uploaded if the user does not use our built-in link to ClearML model fetching 😄
It's okay 🙂 I was originally hoping to delete my "initializer" task, but I'll just archive it if someone is interested in the worker data etc. Setting the queue is quite nice.
I think this should get my team excited enough 😄
Yeah that works too. So one can override the queue ID but not the worker 🤔
We just inherit from `logging.Handler` and use that in our `logging.config.dictConfig`; weird thing is that it still logs most of the tasks, just not the last one?
What do you mean? 😄 Using `logging.config.dictConfig(...)`
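For reference, here's roughly the shape of it (the handler body here is a made-up sketch, not our actual one):
```python
import logging
import logging.config
from clearml import Task

class ClearMLLogHandler(logging.Handler):
    """Sketch: forward log records to the current ClearML task, if any."""
    def emit(self, record):
        task = Task.current_task()
        if task is not None:
            task.get_logger().report_text(self.format(record))

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {"clearml": {"()": ClearMLLogHandler}},
    "root": {"handlers": ["clearml"], "level": "INFO"},
})
```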
For now we've monkey-patched it to our use case:
```python
from clearml import Dataset

Dataset._Dataset__hidden_tag = "active"  # use a regular tag instead of the hidden marker

def foo(cls, dataset_project, dataset_name):
    dataset_project = dataset_project or "Datasets"
    return dataset_project, dataset_project.rpartition("/")[0]

Dataset._build_hidden_project_name = classmethod(foo)  # keep it a classmethod like the original
```
I’ll give the `create_function_task` one more try 🤔
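If it works out, it would look roughly like this (names and the forwarded kwargs are placeholders, and I'm assuming the extra kwargs are passed to the function when its Task runs):
```python
from clearml import Task

def train_model(config_path):
    ...  # placeholder for the per-model training code

main_task = Task.init(project_name="our-project", task_name="orchestrator")

# Wrap the function into its own Task instead of closing/reopening the main one
model_task = main_task.create_function_task(
    train_model, func_name="train_model_a", task_name="train model A", config_path="a.yaml"
)
```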
Thanks for the reply @<1523701827080556544:profile|JuicyFox94> ! I'll debug more and let you know
Yes, that one shows up. I forgot to mention we also set the version explicitly, but that just creates a duplicate dataset under `Datasets`, and anyway our main `Task` is now hidden from the original project.
So the project `project` exists, but it is empty.
After the task was initialized? 🤔
I mean, it makes sense to have it in a time-series plot when one is logging iterations and such. But that's not always the case... Anyway I opened an issue about that too! 🙂