Oh! Nice! I'll have a go at it and report back at the PR if it's in a functional state 🙂 Thanks AgitatedDove14!
I believe that a Pipeline should have the system tags (`pipeline`, maybe `hidden`), even if it was created in a running Task.
Is there some default Docker image you ship with ClearML that you'd recommend, or can/should we use our own? 🙂
I guess the big question is how can I transfer local environment variables to a new Task
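A minimal sketch of what I mean, assuming we just pick a known subset of variables (the names here are made up) and attach them as task parameters:
```python
import os
from clearml import Task

task = Task.init(project_name="examples", task_name="env transfer")
# Hypothetical: collect the variables we care about and connect them
# to the task, so the remote run can read them back as parameters
env_subset = {k: os.environ[k] for k in ("MY_VAR_A", "MY_VAR_B") if k in os.environ}
task.connect(env_subset, name="environment")
```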
For example, we have a complicated YAML file with built-in `!include` instructions, so we upload all the included files too. This then clogs up the artifacts sidebar, and it would be nice to be able to say "these are all artifacts from this one file, you can collapse it by clicking here"
The SDK is fine as it is - I'm more looking at the WebUI at this point
The overall flow I currently have is e.g.
1. Start an internal task (not a ClearML Task; MLOps not initialized yet)
2. Call some `pre_init` function with `args` so I can upload the environment file via StorageManager to S3
3. Call some `start_run` function with the configuration dictionary loaded, so I can upload the relevant CSV files and configuration file
4. Finally initialize the MLOps (ClearML), start a task, execute remotely
I can play around with 3/4 (so e.g. upload CSVs and configuratio...
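For concreteness, a rough sketch of steps 2-4 (the bucket, file paths, and the `pre_init`/`start_run` signatures are all placeholders):
```python
from clearml import Task, StorageManager

def pre_init(args):
    # Step 2: stash the environment file on S3 before any ClearML Task exists
    return StorageManager.upload_file(
        local_file=args.env_file,
        remote_url="s3://my-bucket/runs/{}/env".format(args.run_name),
    )

def start_run(config):
    # Steps 3-4: initialize ClearML, attach the config + CSVs, then go remote
    task = Task.init(project_name="my-project", task_name=config["run_name"])
    task.upload_artifact(name="config", artifact_object=config["config_path"])
    for csv_path in config["csv_files"]:
        task.upload_artifact(name=csv_path, artifact_object=csv_path)
    task.execute_remotely(queue_name="default")
```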
Sure, for example when reporting HTML files:
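A minimal sketch of that kind of report (the title/series/file names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="html report")
# report_media attaches the HTML file so it renders in the WebUI
task.get_logger().report_media(
    title="report", series="summary", iteration=0, local_path="report.html"
)
```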
What's new in 1.1.6rc0?
That doesn't make sense? 🤔
Maybe I was not clear, but it's a simple part of the config file.
I guess it's mixed. If #340 is resolved, then this initializer task will be a no-op: detach, and init-close new tasks as needed.
Yes, thanks AgitatedDove14! It's just that the `configuration` object passed onwards was a bit confusing.
Is there a planned documentation overhaul? 🤔
I mean, if I search for "model", will it automatically search for tasks containing "model" in their name?
Great, thanks! Any idea about environment variables and/or other files (CSV)? I suppose I could use `task.upload_artifact` for the CSVs, but I'm still unsure about the environment variables.
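i.e. roughly this, with made-up file names:
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="csv artifacts")
# Attach each CSV as a named artifact on the task
task.upload_artifact(name="train_data", artifact_object="data/train.csv")
task.upload_artifact(name="eval_data", artifact_object="data/eval.csv")
```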
Maybe. When the container spins up, are there any identifiers regarding the task etc. available? I create a folder on the bucket per `python train.py` run, so that the environment variable files don't get overwritten if two users execute almost simultaneously.
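If there is such an identifier, I'd expect something like the sketch below to work; the `CLEARML_TASK_ID` environment variable and the bucket layout are assumptions on my side:
```python
import os
import time

# Assumption: inside an agent-spawned container the task id is exposed as
# CLEARML_TASK_ID; fall back to a timestamp for purely local runs
run_id = os.environ.get("CLEARML_TASK_ID") or str(int(time.time()))
remote_prefix = "s3://my-bucket/envs/{}/".format(run_id)  # per-run folder, no clashes
```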
If I set the following:
`"extra_clearml_conf": "sdk.aws.s3.credentials = [\n{\nhost: 'ip:9000'\nkey: 'xxx'\nsecret: 'xxx'\nmultipart: false\nsecure: false\n},\n{\nhost: 'ip2:9000'\nkey: 'xxx'\nsecret: 'xxx'\nmultipart: false\nsecure: false\n}\n]"`
I run into a weird `furl` error: `ValueError: Invalid port '9000''.`
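For readability, here is the same credentials section written as a triple-quoted string instead of the escaped-newline form (hosts/keys are placeholders; this is just a cleaner way to write it, not a confirmed fix for the furl error):
```python
extra_clearml_conf = """
sdk.aws.s3.credentials = [
    {host: "ip:9000", key: "xxx", secret: "xxx", multipart: false, secure: false},
    {host: "ip2:9000", key: "xxx", secret: "xxx", multipart: false, secure: false},
]
"""
```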
Coming back to this; ClearML prints a lot of error messages in local tests, supposedly because the output streams are not directly available:
```
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.10/logging/__init__.py", line 1103, in emit
    stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
  File "/home/idan/CC/git/ds-platform/.venv/lib/python3.10/site-packages/clearml/task.py", line 3504, in _at_exit
    self.__shutdown...
```
These are per-user. Essentially we log user DB access as well (for various backtracking afterwards), so it's beneficial for us to pass the user DB secrets to the task rather than have them configured once on the agent.
Opened this: https://github.com/allegroai/clearml/issues/530 . Let me know if it's not clear enough, FrothyDog40!
JitteryCoyote63 please do not get used to it :D There's an open ticket/feature request to either revert this or let the user/server choose the most comfortable way.
Does it make sense to you to run several such glue instances, to manage multiple resource requirements?
Yes, I've found that too (as mentioned, I'm familiar with the repository). My issue is still that there is no documentation as to what this actually offers.
Is this simply a helm chart to run an agent on a single pod? Does it scale in any way? Basically, is it a simple agent (similar to on-premise agents, running in the background, but here on K8s), or is it a more advanced one that offers scaling features? What is it intended for, and how does it work?
The official documentation is very spa...
Maybe @<1523701827080556544:profile|JuicyFox94> can answer some questions then…
For example, what's the difference between `agentk8sglue.nodeSelector` and `agentk8sglue.basePodTemplate.nodeSelector`?
Am I correct in understanding that the former decides the node type that runs the "scaler" (listening to the given `agentk8sglue.queue`), and the latter applies to any newly booted instance/pod that will actually run the agent and the task?
Read: The former can be kept lightweight, as it does no...
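To make the question concrete, this is how I'd read a hypothetical values.yaml (the pool labels are made up):
```yaml
agentk8sglue:
  queue: default
  # my reading: schedules the long-lived glue ("scaler") pod itself
  nodeSelector:
    pool: cpu-small
  basePodTemplate:
    # my reading: applied to every task pod the glue boots
    nodeSelector:
      pool: gpu-large
```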
Perfect, thanks for the answers Valeriano. These small details are missing from the documentation, but I now feel much more confident in setting this up.
AgitatedDove14 yeah I see this now; this was an issue because I later had to "disconnect" the remote task, so it can, itself, create new tasks (using `clearml.config.remote.override_current_task_id(None)`). I guess you might remember that discussion? 🙂
EDIT: It's the discussion we had here, for reference. https://clearml.slack.com/archives/CTK20V944/p1640955599257500?thread_ts=1640867211.238900&cid=CTK20V944
So probably not needed in JitteryCoyote63 's case, we still have some...
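For reference, the detach-then-spawn pattern sketched out (project/task names are placeholders):
```python
from clearml import Task
from clearml.config import remote

# Detach from the controlling (remote) task so fresh tasks can be created
remote.override_current_task_id(None)

child = Task.init(project_name="my-project", task_name="child run")
# ... child workload ...
child.close()
```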
An internal project I've accidentally made with a hidden tag while playing around with the ClearML internal code.
I created a new task with the project name `internal tests`, and no task name (so it's derived by ClearML).
The task was a simple print out.
The project does not appear in the project space and does not turn up in searches (the task does).
JitteryCoyote63 yes exactly, sorry, I forgot to add the `Task.get_task` in my response. That's exactly what we do 🙂
We just do `task.close()` and then start a new `Task.init()` manually, so our "pipelines" are self-controlled.
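Sketched out, under the same assumptions (names/ids are placeholders):
```python
from clearml import Task

task = Task.init(project_name="my-project", task_name="stage 1")
# ... stage 1 work ...
task.close()  # close the current task so a new one can start in-process

task = Task.init(project_name="my-project", task_name="stage 2")
# an earlier task can still be fetched for reading, e.g. by id
previous = Task.get_task(task_id="<task-id>")
```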
Yes exactly 🙂 Good news.