It's given as the second form you suggested in the mini config ( http://${...}:8080 ). The quotation marks are added later by pyhocon.
This could be relevant SuccessfulKoala55 ; might entail some serious bug in ClearML multiprocessing too - https://stackoverflow.com/questions/45665991/multiprocessing-returns-too-many-open-files-but-using-with-as-fixes-it-wh
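For context, the pattern discussed in that Stack Overflow question can be sketched roughly as below: scoping the pool to a `with` block so its resources are released as soon as each batch finishes, rather than relying on garbage collection. This is a minimal sketch, not ClearML code. `square` and `run_batch` are hypothetical names, and `ThreadPool` is used only so the snippet runs anywhere; it shares the `multiprocessing.Pool` API and context-manager behaviour.

```python
from multiprocessing.pool import ThreadPool  # same interface as multiprocessing.Pool


def square(x):
    # Hypothetical stand-in for the real per-item work.
    return x * x


def run_batch(values, workers=2):
    # Leaving the `with` block calls pool.terminate(), so the pool's
    # workers and handles are released here instead of piling up across calls.
    with ThreadPool(processes=workers) as pool:
        return pool.map(square, values)


# Repeated calls each create and tear down their own pool.
results = [run_batch([1, 2, 3]) for _ in range(3)]
```

With a real `multiprocessing.Pool`, the same structure applies, but the call site should sit under an `if __name__ == "__main__":` guard.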
Heh, well, John wrote that in the first reply in this thread
And in Task.init main documentation page (nowhere near the code), it says the following -
Any follow up thoughts SuccessfulKoala55 or CostlyOstrich36 ?
FWIW It's also listed in other places @<1523704157695905792:profile|VivaciousBadger56> , e.g. None says:
In order to make sure we also automatically upload the model snapshot (instead of saving its local path), we need to pass a storage location for the model files to be uploaded to.
For example, upload all snapshots to an S3 bucket...
Thanks Alon. In the full/official documentation the clearml-data CLI is not mentioned anywhere, so perhaps it should be refreshed
I think we're referring to different things here.
I won't be using the UI (and neither will my team).
But as mentioned, we've used DVC before and it adds a lot of junk metadata files to each GitHub PR (many dvc.yaml , dvc.lock and .gitignore files). We're trying to avoid that as much as possible, hence my question about GitHub pull...
Thanks! To clarify, all the agent does is then spawn new nodes to cover the tasks?
Is Task.create the way to go here?
I can also do this via Mongo directly, but I was hoping to skip the K8S interaction there.
I wouldn't mind going the requests route if I could find the API end point from the SDK?
Hmmm, what
Hey @<1537605940121964544:profile|EnthusiasticShrimp49> ! You're mostly correct. The Step classes will be predefined (of course developers are encouraged to add/modify as needed), but as in the DataTransformationStep , there may be user-defined functions specified. That's not a problem though, I can provide these functions with the helper_functions argument.
- The .add_function_step is indeed a failing point. I can't really create a task from the notebook because calling `Ta...
The tl;dr is that some of our users like poetry and others prefer pip . Since pip install git+.... stores the git data, it seems trivial to first try to install based on pip , and only later on poetry , since pip would crash with poetry as it stores git data elsewhere (in poetry.lock )
I guess it's mixed. If #340 is resolved, then this initializer task will be a no-op: detach, and init-close new tasks as needed.
It's pulled from the remote repository, my best guess is that the uncommitted changes apply only after the environment is set up?
The network is configured correctly. But the newly spun up instances need to be set to the same VPC/Subnet somehow
You don't even need to set the CLEARML_WORKER_ID, it will automatically assign one based on the machine's name
I have seen this quite frequently as well tbh!
I think now there's the following:
Resource type Queue (name) defines resource + max instances
And I'm looking for:
Resource type "pool" of resources (type + max instances). A pool can be shared among queues
Yeah I will probably end up archiving them for the time being (or deleting if possible?).
Otherwise (regarding the code question), I think it's better if we continue the original thread, as it has a sample code snippet to illustrate what I'm trying to do.
Same result. This is frustrating, wtf happened :shocked_face_with_exploding_head:
This is also specifically the services queue worker I'm trying to debug
Internally yes, but in Task.init the default argument is a boolean, not an int.
We don't want to close the task, but we have a remote task that spawns more tasks. With this change, subsequent calls to Task.init fail because it goes in the deferred init clause and fails on validate_defaults .
We have a mini default config (if you remember from a previous discussion we had) that actually uses the second form you suggested.
I wrote a small "fixup" script that combines this default with the one generated by clearml-init , and it simply does:
def_config = ConfigFactory.parse_file(DEF_CLEARML_CONF, resolve=False)
new_config = ConfigFactory.parse_file(new_config_file, resolve=False)
updated_new_config = ConfigTree.merge_configs(new_config, def_config)
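To illustrate what that merge step is doing, here is a rough stdlib-only sketch with plain dicts standing in for pyhocon's ConfigTree. It assumes the second argument takes precedence on conflicting keys, which is my understanding of ConfigTree.merge_configs and not something confirmed in this thread. The function name `merge_into`, the key names, and the values are hypothetical placeholders.

```python
def merge_into(base, override):
    """Recursively merge `override` into `base` in place; `override` wins on conflicts."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            # Both sides are nested sections: descend instead of replacing wholesale.
            merge_into(base[key], value)
        else:
            base[key] = value
    return base


# Hypothetical stand-in for the file generated by clearml-init.
new_config = {
    "api": {
        "web_server": "http://localhost:8080",
        "credentials": {"access_key": "generated-key"},
    }
}

# Hypothetical stand-in for the mini default config; the substitution is kept
# as an unresolved string, mirroring resolve=False in the snippet above.
def_config = {"api": {"web_server": "http://${...}:8080"}}

updated_new_config = merge_into(new_config, def_config)
```

Under that assumption, keys present only in the generated config (the credentials here) survive, while keys defined in the default config override the generated ones.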
It's of course not an MLOps issue so I understand it's not high on the priority list, but would be kinda cool to just have a simple view presenting the content of users.get_all
Thanks CostlyOstrich36 !
Okay, I'll test it out by trying to downgrade to 4.0.0 and then upgrade to 4.1.2
Just to make sure, the chart_ref is allegroai/clearml right? (for some reason we had clearml/clearml and it seems like it previously worked?)
Nothing I can spot --
ClearML results page:
ClearML pipeline page:
Launching the next 2 steps
Launching step [...]
Launching step [...]
Launching step: ...
Parameters:
{...}
Configurations:
{}
Overrides:
{}
Launching step: ...
Parameters:
{...}
Configurations:
{}
Overrides:
{}
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
2023-02-21 13:53:48
ClearML Monitor: Could not detect iteration reporting, falling back to itera...
TimelyPenguin76 I added pip install --upgrade clearml-agent to the extra_vm_bash_script for the autoscaler, that should at least guarantee the latest clearml agent is used on the instance, right?