
Yes exactly that AgitatedDove14
Testing that our logic maps correctly, etc., for everything related to ClearML
i.e. It does not process tasks on its own?
There used to be a good example but it's now missing. I'm not sure what "Use only for automation (externally), otherwise use Task.connect_configuration" means when e.g. looking at Task.set_configuration_object, etc.
Could you clarify a bit, CostlyOstrich36 or AgitatedDove14 ?
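For context, a minimal sketch of the two calls I'm comparing; the argument names are the ones I see in the SDK, so treat the exact signatures (especially config_dict) as my assumption:

    from clearml import Task

    task = Task.init(project_name='examples', task_name='config demo')

    # From inside the running script: connect a configuration so it can be
    # overridden when the task is executed remotely.
    config = task.connect_configuration({'lr': 0.001}, name='training')

    # "For automation (externally)": set a configuration object on a task
    # directly, without connecting it to the running code.
    task.set_configuration_object(name='training', config_dict={'lr': 0.001})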
That's up and running and is perfectly fine.
-ish, still debugging some weird stuff. Sometimes ClearML picks ip and sometimes ip2, and I can't tell why 🤔
StorageManager.download_folder(remote_url='s3://some_ip:9000/clearml/my_folder_of_interest', local_folder='./') yields a new folder structure, ./clearml/my_folder_of_interest, rather than just ./my_folder_of_interest
Would be nice if the second one was a toggle-able feature (either per use or in the server settings) maybe?
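For anyone hitting the same thing, a rough workaround sketch (the bucket path is just the placeholder from above, and flattening by hand is my own workaround, not an official option):

    import shutil
    from pathlib import Path
    from clearml import StorageManager

    # download_folder() reproduces the remote path under local_folder...
    StorageManager.download_folder(
        remote_url='s3://some_ip:9000/clearml/my_folder_of_interest',
        local_folder='./',
    )

    # ...so move the nested folder up one level to get ./my_folder_of_interest
    nested = Path('./clearml/my_folder_of_interest')
    if nested.exists():
        shutil.move(str(nested), './my_folder_of_interest')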
CostlyOstrich36 I'm not sure what you mean by "through the apps", but any script AFAICS would expose the values of these environment variables; or what am I missing?
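To make the concern concrete, this is all it takes from any script the agent runs (plain os.environ, nothing ClearML-specific; the variable name is only an example):

    import os

    # any code executed by the agent can read the injected environment
    print(os.environ.get('AWS_SECRET_ACCESS_KEY'))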
For now this is okay - no data lost, really - but I'd like to make sure we're not missing any steps in the next upgrade
I think you're interested in the Monitor class :)
Okay, so the only missing piece of the puzzle, I think, is that it would be nice if this propagates to the autoscaler as well; that then also allows hiding some of the credentials etc 😮
I think now there's the following:
- Resource type
- Queue (name) defines resource + max instances

And I'm looking for:
- Resource type
- "Pool" of resources (type + max instances)
- A pool can be shared among queues
Thanks CostlyOstrich36 !
And can I make sure the same budget applies to two different queues?
So that for example, an autoscaler would have a resource budget of 6 instances, and it would listen to aws and default as needed?
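To make the shape of what I'm asking for explicit, a plain-Python sketch (not real ClearML/autoscaler syntax, just illustrating the two models with the queue names from above):

    # Today: each queue carries its own resource type + max_instances cap.
    current = {
        'aws': {'resource': 'm5.xlarge', 'max_instances': 6},
        'default': {'resource': 'm5.xlarge', 'max_instances': 6},
    }

    # What I'm after: one shared pool with a single budget of 6 instances,
    # referenced by both queues.
    desired = {
        'pools': {'cpu_pool': {'resource': 'm5.xlarge', 'max_instances': 6}},
        'queues': {'aws': 'cpu_pool', 'default': 'cpu_pool'},
    }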
Is there a way to accomplish this right now FrothyDog40 ? 🤔
Sure CostlyOstrich36 , sorry it took me so long to reply. I minimized the window a bit here so everything will fill in nicely. Worth mentioning this happens on all pages of course, but I went to the profile page so you can also see the clearml server version.
Yeah that works too. So one can override the queue ID but not the worker 🤔
It's okay 🙂 I was originally hoping to delete my "initializer" task, but I'll just archive it if someone is interested in the worker data etc. Setting the queue is quite nice.
I think this should get my team excited enough 😄
Honestly, this is all related to issue #340. The only reason we have this to begin with is because we need one separate "initializer" task that downloads the remote cache and prepares the agent environment for execution (downloading the configuration files, etc).
Otherwise it fits perfectly with pipelines, but we're not there yet.
In the local execution we don't have this initializer task, so we use Task.init() before starting to work on a model, and task.close() when we're done...
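Roughly, the local flow looks like this (a sketch, not our actual code):

    from clearml import Task

    task = Task.init(project_name='my_project', task_name='train model')
    # ... download configs, prepare the environment, work on the model ...
    task.close()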
I didn't mention code in #340 nor did I mention data here 😄 The idea was to package non git-specific files for remote execution
That could work, given that:
- Could we add a preview section? One reason I don't like using the configuration section is that it makes debugging much much harder.
- Will the clearml-agent download and unzip the files, placing them into the same local folder as needed for execution?
- What if we want to include non-configuration objects? (i.e. the model case I listed)
- If everything is managed with a git repo, does this also mean PRs will have a messy metadata file attached to them?
QuaintPelican38 did you have a workaround for this then? Some cleanup service or similar?
Can I query where the worker is running (IP)?
Where do I import this APIClient from, AgitatedDove14? I meanwhile edited it directly in mongo, but editing a db directly on a Friday is a big no-no
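For the record (answering my own question), the import I was after; the per-worker field I print is my assumption of what's exposed, so check the actual response:

    from clearml.backend_api.session.client import APIClient

    client = APIClient()
    for worker in client.workers.get_all():
        # worker entries carry identity/host details; exact field names may differ
        print(worker.id, getattr(worker, 'ip', None))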
This happened again 🤔
How many files does ClearML touch? 🤯
TimelyPenguin76 CostlyOstrich36 It seems a lot of manual configuration is required to get the EC2 instances up and running.
Would it not make sense to update the autoscaler (and example script) so that the config.yaml that's used for the autoscaler service is implicitly copied to the EC2 instances, and then any extra_clearml_conf is used/overwritten?
This seems to be fine for now, btw, if any future lookup finds this thread: with mock.patch('clearml.datasets.dataset.Dataset.create'): ...
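A slightly fuller sketch of that workaround, in case it helps anyone (the test body is up to you; the point is just stubbing out Dataset.create so no dataset is actually registered):

    from unittest import mock

    with mock.patch('clearml.datasets.dataset.Dataset.create') as mocked_create:
        # run the code under test here; Dataset.create is replaced by a mock
        ...
        # and you can assert on how it was called, e.g.:
        # mocked_create.assert_called_once()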
As the meme goes, well yes but actually no, since the input path is provided via argparse? I'm also not sure how this would help debug from the WebUI - you can't really see the contents of a zipped file, and the configuration tab is too messy for such a nested configuration as the one we have. It's best suited as an artifact.
EDIT: Or am I missing something? Point being, when the remote execution begins, the entry point tries to run e.g. python train.py --config_file path/to/local/file.yaml ...
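For clarity, the kind of entry point I mean (names are illustrative):

    # train.py -- invoked remotely as: python train.py --config_file path/to/local/file.yaml
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('--config_file', type=str, required=True)
    args = parser.parse_args()

    with open(args.config_file) as f:
        config_text = f.read()  # in reality this is parsed as nested YAML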