I guess it's mixed. If #340 is resolved, then this initializer task will be a no-op: detach, and init-close new tasks as needed.
It's a bit hard to read when they're all clustered together:
Let me verify a hypothesis...
So maybe summarizing (sorry for the spam):
- Pipelines:
  - Pros: Automatic upload and serialization of input arguments
  - Cons: Clutter, does not support classes, cannot inject code, does not recognize the environment when run from e.g. IPython
- Tasks:
  - Pros: Tidier and matches the original idea, recognizes the environment even when run from IPython
  - Cons: Does not support classes, cannot inject code, does not automatically upload input arguments (see the sketch below)
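For the last point, a minimal sketch (project and parameter names are made up) of attaching arguments to a plain Task by hand:

```python
from clearml import Task

# Hypothetical names; with plain Tasks the input arguments are not uploaded
# automatically, so they are connected explicitly.
task = Task.init(project_name="demo", task_name="manual-args")
params = {"batch_size": 32, "learning_rate": 1e-3}
task.connect(params)  # arguments now appear in the task's configuration
task.close()
```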
Unfortunately I can't take a photo of not being able to compare tasks by navigating around the WebUI...
Does that clarify the issue, CostlyOstrich36?
~ is a bit weird since it's not part of the package (might as well let the user go through clearml-init), but using ${PWD} works! 👍 👍
(Though I still had to add the CLEARML_API_HOST and CLEARML_WEB_HOST ofc, or define them in the clearml.conf)
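For reference, a minimal sketch of setting those variables before the SDK parses its configuration (the server URLs are placeholders):

```python
import os

# Placeholder server URLs; these must be set before clearml loads its config.
os.environ["CLEARML_API_HOST"] = "http://localhost:8008"
os.environ["CLEARML_WEB_HOST"] = "http://localhost:8080"

from clearml import Task  # imported only after the environment is prepared

task = Task.init(project_name="demo", task_name="env-config-check")
task.close()
```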
Yes 😅 I want ClearML to load and parse the config before that. But now I'm not even sure those config settings are exposed as environment variables at all?
Exactly, it should have auto-detected the package.
Or well, because it's not geared for tests, I'm just encountering weird shit. Just calling task.close() takes a long time
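A quick sketch (hypothetical names) to put a number on that:

```python
import time
from clearml import Task

t0 = time.perf_counter()
task = Task.init(project_name="demo", task_name="timing-check")  # hypothetical names
t1 = time.perf_counter()
task.close()
t2 = time.perf_counter()
print(f"init: {t1 - t0:.1f}s, close: {t2 - t1:.1f}s")
```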
Thanks SuccessfulKoala55 ! Is this listed anywhere in the documentation?
Could I set an environment variable there and then refer to it internally in the config with the ${...} notation?
I see https://github.com/allegroai/clearml-agent/blob/d2f3614ab06be763ca145bd6e4ba50d4799a1bb2/clearml_agent/backend_config/utils.py#L23 but not where it's called 🤔
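As an aside, clearml.conf is HOCON, so the substitution behaviour can be checked locally with pyhocon; a sketch assuming the standard HOCON fallback of unresolved ${...} substitutions to environment variables (the variable name is made up):

```python
import os
from pyhocon import ConfigFactory

os.environ["MY_OUTPUT_URI"] = "s3://my-bucket/models"  # hypothetical variable
conf = ConfigFactory.parse_string("default_output_uri = ${MY_OUTPUT_URI}")
print(conf["default_output_uri"])  # -> s3://my-bucket/models
```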
I think now there's the following:
- Resource type: Queue (name), which defines the resource + max instances

And I'm looking for:

- Resource type: a "pool" of resources (type + max instances)
- A pool can be shared among queues
Maybe @<1523701827080556544:profile|JuicyFox94> can answer some questions then…
For example, what’s the difference between agentk8sglue.nodeSelector and agentk8sglue.basePodTemplate.nodeSelector?
Am I correct in understanding that the former decides the node type that runs the “scaler” (listening to the given agentk8sglue.queue), and the latter applies to any newly booted instance/pod that will actually run the agent and the task?
Read: the former can be kept lightweight, as it does no...
There's a specific fig[1].set_title(title) call.
Eek. Is there a way to merge a backup from Elasticsearch into the currently running server?
The screenshot is small since the data is private anyway, but it's enough to see:
"Metric: untitled 00" "plot image" as the image title The attached histogram has a title ("histogram of ...")
That gives us the benefit of creating "local datasets" (confined to the scope of the project, do not appear in Datasets tabs, but appear as normal tasks within the project)
That's probably in the newer ClearML server pages then, I'll have to wait still 😅
I just ran into this too recently. Are you passing these also in the extra_clearml_conf for the autoscaler?
FWIW It’s also listed in other places @<1523704157695905792:profile|VivaciousBadger56> , e.g. None says:
In order to make sure we also automatically upload the model snapshot (instead of saving its local path), we need to pass a storage location for the model files to be uploaded to.
For example, upload all snapshots to an S3 bucket…
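In code that boils down to something like this (a sketch; project, task, and bucket names are placeholders):

```python
from clearml import Task

# output_uri tells ClearML where to upload model snapshots,
# instead of only recording their local path.
task = Task.init(
    project_name="demo",
    task_name="train",
    output_uri="s3://my-bucket/models",  # placeholder bucket
)
```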
Not sure I understand your comment - why not let the user start with an empty comparison page and add them from "Add Experiment" button as well?
Yes exactly, but I guess I could've googled for that 😅
Copy the uncommitted changes captured by ClearML using the UI, write them to changes.patch, then run git apply changes.patch 👍
I'm trying, let's see; our infra person is away on holidays :X Thanks! Uh, which configuration exactly would you like to see? We're running via the helm charts on K8s, so I don't think I have direct access to the agent configuration or a way to update it separately?
From the log you shared, the task is picked up by the worker_d1bd92a3b039400cbafc60a7a5b1e52b_4e831c4cbaf64e02925b918e9a3a1cf6_<hostname>:gpu0,1 worker.
I can try and target the default one if it helps..?
Ah I see, if the pipeline controller begins in a Task it does not add the tags to it…
I’ll give the create_function_task one more try 🤔
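For reference, what I mean by that, as a minimal sketch (the function and all names are hypothetical):

```python
from clearml import Task

def process(x=1):
    # hypothetical step logic
    return x * 2

# The controller task spawns a new task that will run `process` with the given argument.
task = Task.init(project_name="demo", task_name="controller")
child = task.create_function_task(process, func_name="process", task_name="process step", x=3)
```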
Oh nono, more like:
1. Create a pipeline
2. Add N steps to it
3. Run the pipeline
4. It fails/succeeds, the user does something with the output
5. The user would like to add/modify some steps based on the results now (after closer inspection).

I wonder, at (5), do I have to recreate the pipeline every time? 🤔
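In other words, something like the sketch below (PipelineController with pre-existing template tasks; all names are placeholders), where the question is whether steps can be added or modified after (5) without rebuilding the whole controller:

```python
from clearml.automation import PipelineController

# Placeholder project/task names; the steps are assumed to exist as template tasks.
pipe = PipelineController(name="my-pipeline", project="demo", version="0.0.1")
pipe.add_step(name="step_1", base_task_project="demo", base_task_name="prepare data")
pipe.add_step(name="step_2", parents=["step_1"],
              base_task_project="demo", base_task_name="train model")
pipe.start_locally(run_pipeline_steps_locally=True)  # or pipe.start(queue="default")
```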