Eureka! That could be a solution for the regex search; my comment on the pop-up (in the previous reply) was a bit more generic - just that it should potentially include some information on what failed while fetching experiments
Is there a preferred way to stop the agent?
That will come at a later stage
I've also followed https://clearml.slack.com/archives/CTK20V944/p1628333126247800 but it did not help
Anyway, sounds good!
I will TIAS, but it's maybe worthwhile to also mention whether it has to be an absolute path or if a relative path is fine too!
That still seems to crash, SuccessfulKoala55
EDIT: No, wait, the environment still needs updating. One moment still...
Honestly I wouldn't mind building the image myself, but the glue-k8s setup is missing some documentation so I'm not sure how to proceed
I think so; it was just missing from the official documentation. Thanks!
Any updates, @<1523701087100473344:profile|SuccessfulKoala55>?
AFAICS it's quite a trivial implementation at the moment, and would otherwise require parsing the text file to find some references, right?
https://github.com/allegroai/clearml/blob/18c7dc70cefdd4ad739be3799bb3d284883f28b2/clearml/task.py#L1592
Why not give ClearML read-only access credentials to the repository?
Okay, so the only missing piece of the puzzle, I think, is that it would be nice if this propagated to the autoscaler as well; that would then also allow hiding some of the credentials, etc.
Right, so this is checksum-based? Are there plans to only store delta changes for files (i.e. store only the changed bytes instead of the entire file)?
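For context, a minimal sketch of what checksum-based change detection typically looks like; this is our own illustration, not ClearML's actual implementation:
```python
import hashlib
from pathlib import Path

def file_checksum(path: Path) -> str:
    # SHA-256 over the file contents; ClearML's real hashing scheme may differ.
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def changed_files(parent_checksums: dict[str, str], new_root: Path) -> list[Path]:
    # Only files whose checksum differs from the parent's version need to be
    # re-uploaded; identical files can be referenced instead of copied.
    # Per-file byte deltas (the question above) would require a diff on top.
    return [
        p for p in new_root.rglob("*")
        if p.is_file()
        and parent_checksums.get(str(p.relative_to(new_root))) != file_checksum(p)
    ]
```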
Just because it's handy to compare differences and see how the data changed between iterations, but I guess we'll work with that
We'll probably do something like:
- When creating a new dataset with a parent (or parents), look at the immediate parents for identically-named files
- If those exist, load them with the matching framework (pyarrow, pandas, etc.) and log the differences to the new dataset (see the sketch below)
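A rough sketch of that plan, assuming pandas/parquet files; Dataset.get(), list_files(), and get_local_copy() are real ClearML SDK calls, while the helper name and the parent-ID plumbing are placeholders:
```python
from pathlib import Path
import pandas as pd
from clearml import Dataset

def log_parent_diffs(new_dataset_id: str, parent_ids: list[str]) -> None:
    new_ds = Dataset.get(dataset_id=new_dataset_id)
    new_root = Path(new_ds.get_local_copy())
    new_files = set(new_ds.list_files())
    for parent_id in parent_ids:
        parent_ds = Dataset.get(dataset_id=parent_id)
        parent_root = Path(parent_ds.get_local_copy())
        # Only compare identically-named files present in both datasets
        for rel in new_files & set(parent_ds.list_files()):
            if not rel.endswith(".parquet"):
                continue
            old_df = pd.read_parquet(parent_root / rel)
            new_df = pd.read_parquet(new_root / rel)
            # Crude "difference" metric; a real implementation would log this
            # to the new dataset, e.g. via its get_logger()
            print(f"{rel}: {len(new_df) - len(old_df):+d} rows vs parent {parent_id}")
```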
I also tried switching to dockerized mode now, and I'm getting the same issue
It failed on some missing files in my remote_execution, but otherwise seems fine now
Answering myself for future interested users (at least GrumpySeaurchin29 I think you were interested):
You can "hide" (explained below) secrets directly in the agent ๐ :
- When you start the agent listening to a specific queue (i.e. the services worker), you can specify additional environment variables by prefixing them to the execution, i.e. FOO='bar' clearml-agent daemon ...
- Modify the example AWS autoscaler script: after the driver = AWSDriver.from_config(conf) line, inject ...
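The original message elides the injected code, so purely as a hypothetical illustration of the second step (the config value and env var name are guesses, not ClearML API):
```python
import os
from clearml.automation.aws_driver import AWSDriver  # import path as in the example script; may vary by version

conf = {}  # placeholder: the parsed autoscaler config from the example script
driver = AWSDriver.from_config(conf)
# Hypothetical injection: make a secret available to the autoscaler process
# via an env var, so it never appears in the task's logged configuration.
os.environ["MY_SECRET_TOKEN"] = "..."  # placeholder name and value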
I opened a GH issue shortly after posting here. @<1523701312477204480:profile|FrothyDog40> replied (hoping I tagged the right person).
We need to close the task. This is part of our unittests for a framework built on top of ClearML, so every test creates and closes a task.
Yes, exactly that, AgitatedDove14
Testing that our logic maps correctly, etc., for everything related to ClearML
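For anyone curious, the pattern is roughly this (a minimal pytest-style sketch; project and test names are made up):
```python
import pytest
from clearml import Task

@pytest.fixture
def clearml_task():
    # Each test gets its own task; it must be closed afterwards, otherwise the
    # next Task.init() in the same process would reuse or collide with it.
    task = Task.init(project_name="unittests", task_name="framework-test")
    yield task
    task.close()

def test_framework_logic(clearml_task):
    assert clearml_task.id is not None
```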
- in the second scenario, I might not have changed the results of the step, but my refactoring changed the speed considerably, and this is something I measure.
- in the third scenario, I might not have changed the results of the step and my refactoring just cleaned the code; besides that, nothing substantial was changed. Thus I do not want a rerun.

Well, I would say then that in the second scenario it's just rerunning the pipeline, and in the third it's not running it at all
(I ...
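To make the distinction concrete, a hedged sketch of how these scenarios could map onto ClearML's step caching (add_function_step and cache_executed_step are real SDK parameters; the step body and names are placeholders):
```python
from clearml.automation.controller import PipelineController

def process(x):
    return x  # placeholder step body

pipe = PipelineController(name="example-pipeline", project="examples", version="1.0.0")
pipe.add_function_step(
    name="step",
    function=process,
    function_kwargs=dict(x=1),
    function_return=["result"],
    # Third scenario: nothing relevant changed, so the cached result is reused.
    # Second scenario (re-measuring speed) would need this set to False.
    cache_executed_step=True,
)
```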
I wouldn't mind going the requests route if I could find the API endpoint from the SDK?
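In case it's useful: the SDK also bundles a REST client, so raw requests may be avoidable. A small sketch (the exact service and endpoint needed here is the open question):
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()  # uses the credentials from clearml.conf
# e.g. the tasks service mirrors the /tasks.* REST endpoints:
tasks = client.tasks.get_all(status=["completed"], page_size=10, page=0)
for t in tasks:
    print(t.id, t.name)
```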
Sorry AgitatedDove14 , forgot to get back to this.
I've been trying to convince my team to drop poetry
It does not
We started discussing it here - https://clearml.slack.com/archives/CTK20V944/p1640955599257500?thread_ts=1640867211.238900&cid=CTK20V944
You suggested this solution - https://clearml.slack.com/archives/CTK20V944/p1640973263261400?thread_ts=1640867211.238900&cid=CTK20V944
And I eventually found this solution to work - https://clearml.slack.com/archives/CTK20V944/p1641034236266500?thread_ts=1640867211.238900&cid=CTK20V944
Ultimately we're trying to avoid docker in the AWS autoscaler (virtualization on top of virtualization seems redundant); instead we maintain an AMI for a faster boot sequence.
We had no issues when we used pip, but now when trying to work with poetry all these issues came up.
The way I understand poetry to work is that it is expected there is one system-wide installation that is used for virtual environment creation and manipulation. So at least it may be desired that the ...
SuccessfulKoala55, help me out here
It seems all the changes I make in the AWS autoscaler apply directly to the virtual environment set up for the autoscaler, but nothing from that propagates down to the launched instances.
So e.g. the autoscaler environment has poetry installed, but then the instance fails because it does not have it available?
Or, to be clear: the environment installed by the autoscaler under /clearml_agent_venv has poetry installed, and it uses that to set up the environment for the executed task, e.g. in root/.clearml/venvs-builds/3.10/task_repository/.../.venv, but the latter does not have poetry installed, and so it crashes?
Sure SuccessfulKoala55 , and thanks for looking into it.
As an alternative (for now, or in general), we could consider reverting back to pip. The issue we encounter is that we have a monorepo, so frozen requirements should specify relative paths, but pip freeze does not seem to do that, so ClearML also fails in pip mode.
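A hypothetical post-processing sketch of what we mean; this is not something ClearML or pip does for you, and the repo root is a placeholder:
```python
from pathlib import Path

repo_root = Path("/workspace/monorepo")  # placeholder
lines = Path("requirements.txt").read_text().splitlines()

fixed = []
for line in lines:
    # pip freeze emits locally-installed packages as 'pkg @ file:///abs/path';
    # rewrite those into repo-relative editable installs.
    if " @ file://" in line:
        _, _, url = line.partition(" @ file://")
        rel = Path(url).resolve().relative_to(repo_root.resolve())
        fixed.append(f"-e ./{rel}")
    else:
        fixed.append(line)

Path("requirements.txt").write_text("\n".join(fixed) + "\n")
```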
packages an entire folder as zip
What if I have multiple files that are not in the same folder? (That is the current use-case)
It otherwise makes sense, I think.
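If this is about Task.upload_artifact (an assumption on our part), one workaround is one artifact per file; the paths and artifact names below are placeholders:
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="multi-file-artifacts")
# One artifact per file sidesteps the single-folder zip behaviour, and the
# files can live anywhere in the working tree.
for path in ("data/train.csv", "configs/model.yaml"):
    task.upload_artifact(name=path.replace("/", "_"), artifact_object=path)
```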
Our workaround now for using a Dataset as we do is to store the dataset ID as a configuration parameter, so it's always included too
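Concretely, the workaround looks something like this (task.connect() and Dataset.get() are real SDK calls; the dataset ID is a placeholder):
```python
from clearml import Dataset, Task

task = Task.init(project_name="examples", task_name="train")
# Storing the dataset ID as a task parameter means it is logged with every run
# and can be overridden when the task is cloned.
params = task.connect({"dataset_id": "abc123"})  # placeholder ID
dataset = Dataset.get(dataset_id=params["dataset_id"])
data_path = dataset.get_local_copy()
```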