You don't even need to set the CLEARML_WORKER_ID, it will automatically assign one based on the machine's name
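For reference, a minimal sketch of pinning the ID explicitly instead of relying on the hostname-based default (the ID value here is made up):
```python
import os

# Hypothetical example: set the worker ID before the agent process starts.
# If CLEARML_WORKER_ID is unset, the agent derives one from the machine's name.
os.environ["CLEARML_WORKER_ID"] = "my-machine:gpu0"
```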
I have seen this quite frequently as well tbh!
I think now there's the following:
- Resource type: Queue (name) defines resource + max instances

And I'm looking for:
- Resource type: "pool" of resources (type + max instances)
- A pool can be shared among queues
Yeah I will probably end up archiving them for the time being (or deleting if possible?).
Otherwise (regarding the code question), I think it's better if we continue the original thread, as it has a sample code snippet to illustrate what I'm trying to do.
Same result. This is frustrating, wtf happened 🤯
This is also specifically the services queue worker I'm trying to debug
Internally yes, but in Task.init the default argument is a boolean, not an int.
We don't want to close the task, but we have a remote task that spawns more tasks. With this change, subsequent calls to `Task.init` fail because it goes into the deferred init clause and fails on `validate_defaults`.
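Roughly what we're doing (an illustrative sketch, not our actual code; project/queue names are made up):
```python
from clearml import Task

# The already-running remote task acts as a controller...
parent = Task.init(project_name="demo", task_name="controller")

# ...and spawns more tasks from within the same process; it's these
# subsequent Task.init/Task.create calls that hit the deferred-init path.
for i in range(3):
    child = Task.create(project_name="demo", task_name=f"child-{i}")
    Task.enqueue(child, queue_name="default")
```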
We have a mini default config (if you remember from a previous discussion we had) that actually uses the second form you suggested.
I wrote a small "fixup" script that combines this default with the one generated by clearml-init , and it simply does:def_config = ConfigFactory.parse_file(DEF_CLEARML_CONF, resolve=False) new_config = ConfigFactory.parse_file(new_config_file, resolve=False) updated_new_config = ConfigTree.merge_configs(new_config, def_config)
It's of course not an MLOps issue so I understand it's not high on the priority list, but would be kinda cool to just have a simple view presenting the content of `users.get_all`
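Something like this already gets the data from the SDK side (a minimal sketch using the APIClient; the web UI view is what's missing):
```python
from clearml.backend_api.session.client import APIClient

# Query all users from the server and print a simple "view" of them.
client = APIClient()
for user in client.users.get_all():
    print(user.id, user.name)
```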
Thanks CostlyOstrich36 !
Okay, I'll test it out by trying to downgrade to 4.0.0 and then upgrade to 4.1.2
Just to make sure, the chart_ref is allegroai/clearml right? (for some reason we had clearml/clearml and it seems like it previously worked?)
Nothing I can spot --
```
ClearML results page:
ClearML pipeline page:
Launching the next 2 steps
Launching step [...]
Launching step [...]
Launching step: ...
Parameters:
{...}
Configurations:
{}
Overrides:
{}
Launching step: ...
Parameters:
{...}
Configurations:
{}
Overrides:
{}
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
2023-02-21 13:53:48
ClearML Monitor: Could not detect iteration reporting, falling back to itera...
```
TimelyPenguin76 I added `pip install --upgrade clearml-agent` to the `extra_vm_bash_script` for the autoscaler, that should at least guarantee the latest clearml-agent is used on the instance, right?
Ah I see, if the pipeline controller begins in a Task it does not add the tags to it…
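If it helps anyone, a possible workaround sketch (assuming the controller's underlying task is reachable via `Task.current_task()`; the tag name is made up):
```python
from clearml import Task

# Manually add the tags the controller didn't propagate to its own task.
controller_task = Task.current_task()
controller_task.add_tags(["my-pipeline-tag"])
```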
I think this is maybe about the `credential.helper` used
Hm, just a small update - I just verified and it does indeed work on linux:
```python
import clearml
import dotenv

if __name__ == "__main__":
    dotenv.load_dotenv()
    config = clearml.backend_api.Config.load()  # Success, parsed with environment variables
```
Maybe this is part of the paid version, but would be cool if each user (in the web UI) could define their own secrets, and a task could then be assigned to some user and use those secrets during boot?
Thanks AgitatedDove14 , I'll first have to prove viability with the free version :)
That's what I found as well, but it did not like it after all (boto is fine with it, but underlying urllib and requests were not?)
It's fine -- I see the added benefit in making sure the users set up their clearml.conf and I've made a script to edit it to our needs as part of the installation process. Thanks Martin!
`StorageManager.download_folder(remote_url='s3://some_ip:9000/clearml/my_folder_of_interest', local_folder='./')` yields a new folder structure, `./clearml/my_folder_of_interest`, rather than just `./my_folder_of_interest`
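In other words, a small sketch of the observed behavior (assuming, as I read the docs, that the call returns the path of the downloaded copy):
```python
from clearml import StorageManager

# download_folder reproduces the remote bucket path under local_folder,
# so the files land in ./clearml/my_folder_of_interest.
local_copy = StorageManager.download_folder(
    remote_url="s3://some_ip:9000/clearml/my_folder_of_interest",
    local_folder="./",
)
print(local_copy)  # ./clearml/my_folder_of_interest, not ./my_folder_of_interest
```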
The screenshot is small since the data is private anyway, but it's enough to see:
- "Metric: untitled 00"
- "plot image" as the image title
- The attached histogram has a title ("histogram of ...")
The logs are on the bucket, yes.
The default file server is also set to s3://ip:9000/clearml
Yes that's what I thought, thanks for confirming.
I'm not sure about the intended use of connect_configuration now.
I was under the assumption that in `connect_configuration(configuration, name=None, description=None)`, the configuration is only used in local execution.
But when I run `config = task.connect_configuration({}, name='General')` (in remote execution), the configuration is set to the empty dictionary.
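To illustrate what I expected vs. what happens (a minimal sketch; project/task names and the config values are made up):
```python
from clearml import Task

task = Task.init(project_name="demo", task_name="config-demo")

# Locally this stores {"lr": 0.001} on the task; when executed remotely
# I expected the UI-stored values back, not the literal dict passed here.
config = task.connect_configuration({"lr": 0.001}, name="General")
print(config)
```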
Here's an example where poetry.lock is removed, and still the console reads:
```
url: ....
branch: HEAD
commit: 22fffaf8d5f377b7f10140e642a7f6f26b72ffaa
root: /.../.clearml/venvs-builds/3.10/task_repository/...
Applying uncommitted changes
Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
Creating virtualenv ds-platform in /.../.clearml/venvs-builds/3.10/task_repository/.../.venv
Updating dependencies
Resolving dependencies...
```
I'll give it a shot. Honestly, the SDK documentation for both InputModel and OutputModel is (sorry) horrible ...
Can't wait for the documentation revamping.
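For what it's worth, this is the minimal OutputModel usage I pieced together from the source (treat the names and arguments as my assumptions, not documented behavior):
```python
from clearml import Task, OutputModel

task = Task.init(project_name="demo", task_name="model-demo")

# Register a locally saved weights file as an output model on this task.
model = OutputModel(task=task, framework="PyTorch")
model.update_weights(weights_filename="model.pt")
```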
Yes, I've found that too (as mentioned, I'm familiar with the repository). My issue is still that there is no documentation as to what this actually offers.
Is this simply a helm chart to run an agent on a single pod? Does it scale in any way? Basically - is it a simple agent (similar to on-premise agents, running in the background, but here on K8s), or is it a more advanced one that offers scaling features? What is it intended for, and how does it work?
The official documentation is very spa...
MinIO was a tiny bit of a headache to configure, but I'd be happy to help if you want, CrookedWalrus33, I just went through this process yesterday and today (see a few threads up...)