Hm, I did not specify any specific versions previously. What was the previous default?
AgitatedDove14
hmmm... they are important, but only when starting the process. any specific suggestion?
(and they are deleted after the Task is done, so they are temp)
Ah, then no, sounds temporary. If they're only relevant when starting the process though, I would suggest deleting them immediately when they're no longer needed, and not wait for the end of the task (if possible, of course)
Maybe they shouldn't be placed under /tmp if they're mission critical, but rather the clearml cache folder? 🤔
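To make the suggestion concrete, here is a minimal sketch (assuming the start-up files can live in a scoped temporary directory; launch_process and the file name are hypothetical stand-ins), where the files are removed as soon as the process has started rather than when the Task ends:
import tempfile
from pathlib import Path

def launch_process(config_path: Path) -> None:
    # hypothetical stand-in for the real process launcher discussed above
    ...

def start_with_short_lived_files(payload: bytes) -> None:
    # keep the start-up files in a scoped temp dir instead of bare /tmp,
    # and delete them right after the process has started
    with tempfile.TemporaryDirectory(prefix="task_startup_") as tmp_dir:
        startup_file = Path(tmp_dir) / "startup.cfg"  # hypothetical file name
        startup_file.write_bytes(payload)
        launch_process(config_path=startup_file)
    # the directory and its contents are gone here, long before the Task finishes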
- in the second scenario, I might have not changed the results of the step, but my refactoring changed the speed considerably and this is something I measure.
- in the third scenario, I might have not changed the results of the step and my refactoring just cleaned the code, but besides that, nothing substantial was changed. Thus I do not want a rerun.
Well, I would say then that in the second scenario it's just rerunning the pipeline, and in the third it's not running it at all 🙂
(I ...
Is it currently broken? 🤔
I think now there's the following:
Resource type: Queue (name) defines resource + max instances
And I'm looking for:
Resource type: "pool" of resources (type + max instances). A pool can be shared among queues
Heh, my bad, the term "user" is very much ingrained in our internal way of working. You can think of it as basically any technically-inclined person in your team or company.
Indeed the options in the WebUI are too limited for our use case, so we've developed "apps" that take a yaml configuration file and build a matching pipeline.
With that, our users do not need to code directly, and we can offer much more fine control over the pipeline.
As for the imports, what I meant is that I encounter...
Of course now it's not there anymore 🙂 If/when it happens again I'll ping you here 🙂
Maybe it's better to approach this the other way: if one uses Task.force_requirements_env_freeze(), then the locally updated packages aren't reflected in poetry 🤔
That's fine for the current use-case I believe.
Once the team is happy with the logging functionality, we'll move on to remote execution and things will update.
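For reference, a minimal sketch of how the freeze is triggered (the call has to happen before Task.init; the project and task names are placeholders):
from clearml import Task

# records a pip-style freeze of the currently installed packages for remote execution;
# note this snapshots pip's view of the environment, so locally updated packages
# are not written back into poetry's lock file
Task.force_requirements_env_freeze(force=True)
task = Task.init(project_name="examples", task_name="frozen-requirements-demo")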
Another example - trying to validate dataset interactions ends with
else:
    self._created_task = True
    dataset_project, parent_project = self._build_hidden_project_name(dataset_project, dataset_name)
    task = Task.create(
        project_name=dataset_project, task_name=dataset_name, task_type=Task.TaskTypes.data_processing)
    if bool(Session.check_min_api_server_version(Dataset.__min_api_version)):
        get_or_create_proje...
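For context, a hedged guess at the kind of call that exercises this code path (assuming the dataset interactions are validated in ClearML's offline mode, which comes up later in this thread; names and paths are placeholders):
from clearml import Dataset, Task

Task.set_offline(offline_mode=True)  # no backend available during the validation run
ds = Dataset.create(dataset_name="validation-ds", dataset_project="tests")
ds.add_files("data/sample")  # placeholder local folder
ds.upload()
ds.finalize()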
I'm guessing that's not on pypi yet?
Hurrah! Added git config --system credential.helper 'store --file /root/.git-credentials' to the extra_vm_bash_script and now it works
(logs the given git credentials in the store file, which can then be used immediately for the recursive calls)
Nothing I can spot --
ClearML results page:
ClearML pipeline page:
Launching the next 2 steps
Launching step [...]
Launching step [...]
Launching step: ...
Parameters:
{...}
Configurations:
{}
Overrides:
{}
Launching step: ...
Parameters:
{...}
Configurations:
{}
Overrides:
{}
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
2023-02-21 13:53:48
ClearML Monitor: Could not detect iteration reporting, falling back to itera...
Those are for specific packages, I'm wondering about the package managers as a whole
Now I tried setting pip version to <22.3 (both in the config and in the scaler's "extra config parameters"), but it still uses the latest?
added seed packages: pip==22.3.1, setuptools==65.5.1, wheel==0.38.4
I think ClearML boots up only afterwards, so those environment variables may not be available yet.
You should set them manually in the bootstrap code, unfortunately.
Thanks AgitatedDove14 , I'll give it a try. Perhaps additional documentation is needed for that extra_layout
Uhhh, not really unfortunately :white_frowning_face: . I have ~20 tasks happening in a single file, and it's quite random if/when this happens. I just noticed this tends to happen with the shorter tasks
I can also do this via Mongo directly, but I was hoping to skip the K8S interaction there.
Yup, latest version of ClearML SDK, and we're deployed on AWS using K8s helm
i.e.
ERROR Fetching experiments failed. Reason: Backend timeout (600s)
ERROR Fetching experiments failed. Reason: Invalid project ID
Ultimately we're trying to avoid docker in AWS autoscaler (virtualization on top of virtualization seems redundant), and instead we maintain an AMI for a faster boot sequence.
We had no issues when we used pip, but now when trying to work with poetry all these issues came up.
The way I understand poetry to work is that it expects one system-wide installation that is used for virtual environment creation and manipulation. So at least it may be desired that the ...
Okay, I'll test it out by trying to downgrade to 4.0.0 and then upgrade to 4.1.2
Just to make sure, the chart_ref is allegroai/clearml right? (for some reason we had clearml/clearml and it seems like it previously worked?)
I just used this to create the dual_gpu queue:
clearml-agent daemon --queue dual_gpu --create-queue --gpus 0,1 --detached
Last but not least - can I cancel the offline zip creation if I'm not interested in it? 🤔
EDIT: I see not, guess one has to patch ZipFile ...
Consider e.g:
# steps.py
class DataFetchingStep:
def __init__(self, source, query, locations, timestamps):
# ...
def run(self, queue=None, **kwargs):
# ...
class DataTransformationStep:
def __init__(self, inputs, transformations):
# inputs can include instances of DataFetchingStep, or local files, for example
# ...
def run(self, queue=None, **kwargs):
# ...
And then the following SDK usage in a notebook:
from steps imp...
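Not the author's actual notebook code, but a sketch of one way such class-based steps could be driven through ClearML's PipelineController, by wrapping each step's run() in a function step (all argument values and return names below are illustrative):
from clearml import PipelineController

def fetch_step(source, query, locations, timestamps):
    from steps import DataFetchingStep
    return DataFetchingStep(source, query, locations, timestamps).run()

def transform_step(fetched, transformations):
    from steps import DataTransformationStep
    return DataTransformationStep(inputs=[fetched], transformations=transformations).run()

pipe = PipelineController(name="example-pipeline", project="examples")
pipe.add_function_step(
    name="fetch", function=fetch_step,
    function_kwargs=dict(source="s3://bucket", query="...", locations=[], timestamps=[]),
    function_return=["fetched"])
pipe.add_function_step(
    name="transform", function=transform_step,
    function_kwargs=dict(fetched="${fetch.fetched}", transformations=[]),
    function_return=["result"],
    cache_executed_step=True)  # reuse a previous execution when code + inputs are unchanged
pipe.start_locally(run_pipeline_steps_locally=True)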
Since the additional credentials are available to the autoscaler when it boots up (via the config file), I thought it could use those natively?