Answering myself for future interested users (at least GrumpySeaurchin29 I think you were interested):
You can "hide" (explained below) secrets directly in the agent 😁 :
When you start the agent listening to a specific queue (i.e. the services worker), you can specify additional environment variables by prefixing them to the command, e.g. FOO='bar' clearml-agent daemon ....
Modify the example AWS autoscaler script - after the driver = AWSDriver.from_config(conf), inject ...
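For illustration, a minimal sketch of how a task could read such an injected variable (the variable name FOO and the project/task names are just placeholders, and this assumes the task inherits the daemon's environment):

```python
import os

from clearml import Task

# Placeholders - any task executed by an agent started with
# `FOO='bar' clearml-agent daemon ...` should see FOO in its environment.
task = Task.init(project_name="examples", task_name="read-injected-secret")

secret = os.environ.get("FOO")
if secret is None:
    raise RuntimeError("FOO was not injected into the agent's environment")

# Use the secret at runtime without ever writing it into the task's
# recorded configuration or logs.
```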
CostlyOstrich36 I'm not sure what you mean by "through the apps", but any script AFAICS would expose the values of these environment variables; or what am I missing?
True, and we plan to migrate to pipelines once we have some time for it :) but anyway that condition is flawed I believe
So now we need to pass Task.init(deferred_init=0), because the default Task.init(deferred_init=False) is wrong
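For reference, a minimal sketch of that workaround (project/task names are placeholders):

```python
from clearml import Task

# Pass the integer 0 explicitly instead of relying on the default
# deferred_init=False, which currently trips the flawed check.
task = Task.init(
    project_name="examples",        # placeholder
    task_name="non-deferred-init",  # placeholder
    deferred_init=0,
)
```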
That's a nice workaround of course - I'm sure it works and I'll give it a shot momentarily. I'm just wondering if ClearML could automatically recognize image files in upload_artifact (and other well-known suffixes) and do that for me.
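For context, a minimal sketch of the kind of call in question (file and artifact names are placeholders; whether the UI renders the image as the preview depends on the server/SDK version):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact-preview")  # placeholders

# Upload a local PNG as an artifact; ideally the image suffix would be
# recognized and the preview would show the image itself.
task.upload_artifact(name="confusion_matrix", artifact_object="confusion_matrix.png")
```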
Actually TimelyPenguin76 I get only the following as a "preview" -- I thought the preview for an image would be... the image itself..?
Thanks David! I appreciate that, it would be very nice to have a consistent pattern in this!
Note that it would succeed if e.g. run with pytest -s
SmugDolphin23 I think you can simply change not (type(deferred_init) == int and deferred_init == 0) to deferred_init is True?
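To illustrate why the current check misbehaves, here is the quoted expression evaluated for the values discussed above (just a demonstration snippet):

```python
def is_deferred(deferred_init):
    # The condition as quoted above.
    return not (type(deferred_init) == int and deferred_init == 0)

print(is_deferred(False))  # True  - the default False takes the deferred path (bool is not int)
print(is_deferred(0))      # False - only the explicit integer 0 disables it
print(is_deferred(True))   # True
```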
I'll see if we can do that still (as the queue name suggests, this was a POC, so I'm trying to fix things before they give up 😛 ).
Any other thoughts? The original thread https://clearml.slack.com/archives/CTK20V944/p1641490355015400 suggests this PR solved the issue
Something like this, SuccessfulKoala55 ?
1. Open a bash session on the docker: docker exec -it <docker id> /bin/bash
2. Open a mongo shell: mongo
3. Switch to the backend db: use backend
4. Get the relevant project IDs: db.project.find({"name": "ClearML Examples"}) and db.project.find({"name": "ClearML - Nvidia Framework Examples/Clara"})
5. Remove the relevant tasks: db.task.remove({"project": "<project_id>"})
6. Remove the project IDs: db.project.remove({"name": ...})
(a rough pymongo equivalent is sketched below)
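If it's easier to script this from the host instead of typing into the mongo shell, here is a rough pymongo sketch of the same steps (an assumption on my side: it requires the mongo port to be reachable from wherever you run it, and the collection/field names to match the shell commands above):

```python
from pymongo import MongoClient

# Assumes the MongoDB port is reachable (e.g. published by the docker container).
client = MongoClient("mongodb://localhost:27017/")
db = client["backend"]

for name in ("ClearML Examples", "ClearML - Nvidia Framework Examples/Clara"):
    project = db.project.find_one({"name": name})
    if project is None:
        continue
    # Remove the tasks belonging to the project, then the project entry itself.
    db.task.delete_many({"project": project["_id"]})
    db.project.delete_one({"_id": project["_id"]})
```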
Holy crap this was a light-bulb moment, is this listed somewhere in the docs?
It solves so many of my issues xD
and I don't think it's in the docs - we'll add that
Very welcome update; please use some highlighting for it too - it's so important for a complete understanding of how the remote execution works.
Exactly; the cloud instances (that are run with clearml-agent) should have that clearml.conf + any changes specified in extra_clearml_configuration for the scaler
I guess it does not do so for all settings, but only those that come from Session()
Right, but that's as defined in the services agent, which is not immediately transparent
Let me know if you do; would be nice to have control over that 😁
The idea is that the features would be copied/accessed by the server, so we can transition slowly and not use the available storage manager for data monitoring
Or some users that update their poetry.lock, and some that update it manually because they prefer to resolve dependencies on their own.
Well, you can install the binary in the additional startup commands.
Matter of fact, you can just include the ECR login in the "startup steps" offered by the scaler, so there's no need for this repository. I was thinking these were local instances.
Kinda, yes, and this has changed with 1.8.1.
The thing is that AFAIK ClearML does not currently officially support spawning more tasks from a remotely executed task, so we also have a small hack that marks the remote "master process" as a local task before anything else runs.
Coming back to this; ClearML prints a lot of error messages in local tests, presumably because the output streams have already been closed:
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.10/logging/__init__.py", line 1103, in emit
    stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
  File "/home/idan/CC/git/ds-platform/.venv/lib/python3.10/site-packages/clearml/task.py", line 3504, in _at_exit
    self.__shutdown...
i.e. ERROR Fetching experiments failed. Reason: Backend timeout (600s)
ERROR Fetching experiments failed. Reason: Invalid project ID
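One thing that may help in the tests (just an assumption on my side, not a confirmed fix): close the task explicitly inside the test, while pytest's captured streams are still open, e.g. with a fixture like this (names are placeholders):

```python
import pytest
from clearml import Task


@pytest.fixture
def clearml_task():
    # Placeholders for whatever the test actually exercises.
    task = Task.init(project_name="tests", task_name="unit-test-task")
    yield task
    # Close the task before pytest tears down its captured stdout/stderr,
    # so the at-exit shutdown doesn't try to write to already-closed streams.
    task.close()
```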
Hey FrothyDog40 ! Thanks for clarifying - guess we'll have to wait for that as a feature 😁
Should I create a new issue or just add to this one? https://github.com/allegroai/clearml/issues/529
I will TIAS, but maybe it's worthwhile to also mention whether it has to be an absolute path or if a relative path is fine too!
Sure CostlyOstrich36 , sorry it took me so long to reply. I minimized the window a bit here so everything will fill in nicely. Worth mentioning this happens on all pages of course, but I went to the profile page so you can also see the clearml server version.
We have a more complicated case but I'll work around it 😄
Follow up though - can configuration objects refer to one another internally in ClearML?
Oh and clearml-agent==1.1.2