Reputation
Badges 1
25 × Eureka!maybe we should add some ENV setting it? (I'm not sure we should disable SSL for all S3 connections... so somehow specify the mino it should use http with)
and I've made a script to edit it to our needs as part of the installation processΒ
Β Thanks Martin!
My pleasure, btw: there is no actual need to configure all the clearml.conf values. It will actually take the defaults from the clearml package itself. This means you only need something like:
` api {
server config here
}
sdk.aws.s3{
minio config here
} `
named asΒ
venv_update
Β (I believe it's still in beta). Do you think enabling this parameter significantly helps to build environments faster?
This is deprecated... it was a test to use the a package that can update pip venvs, but it was never stable, we will remove it in the next version
Yes, I guess. Since pipelines are designed to be executed remotely it may be pointless to enable anΒ
output_uri
Β parameter in theΒ
PipelineDecorator.componen...
Guys, any chance you can verify the RC solves the issue?pip install clearml==1.0.2rc0
My clearml-server server crashed for some reason
π No worries
ValueError('Task object can only be updated if created or in_progress')
It seems the task
is not "running" hence the error, could that be
I will TIAS, but maybe worthwhile to also mention if it has to be the absolute path or if relative path is fine too!
Good point! (absolute but you can use ~, and I "think" also $ENV )
${PWD} works!
This will be resolved every call to Task.init (so I would recommend against it), how about "$HOME/" ?
UnevenDolphin73 it seems this is a UI browser limit, this means we will need to move it into the server ...
See here: https://clearml.slack.com/archives/CTK20V944/p1640247879153700?thread_ts=1640135359.125200&cid=CTK20V944
hmm that is odd, let me check
There seems to be a problem with multiprocessing: Although I stopped the task,
You mean you "aborted the task" from the UI?
- There is a memory leak somewhere, please see the screenshot of datadog memory consumptionI'm assuming from the leftover processes ?
Python 3.8/Pytorch 1.11/clearml-sdk 1.9.0/clearml-agent 1.4.1
From the log I see the agent is running in venv mode
Hmm please try with the latest clearml-agent (the others should not have any effect)
What exactly do you mean by docker run permissions?
Hi @<1684010629741940736:profile|NonsensicalSparrow35>
however for the remote file it always creates the name with the following pattern:
{filename_prefix}checkpoint{n}.pt
..
Is this the main issue?
Notice that the model name (i.e. the entry on the Task itself) is not directly connected with the stored file name on the target file server (or S3)
You mean parameters of the pipeline? Is this a pipeline from Tasks or from function decorator?
It is recommended to create a git TOKEN with read only permissions and use it (more secure) π
Could you run your code not from the git repository.
I have a theory, you never actually added the entry point file to the git repo, so the agent never actually installed it, and it just did nothing (it should have reported an error, I'll look into it)
WDYT?
Great!
I'll make sure the agent outputs the proper error π
So I'm gusseting the cli will be in the folder of python:import sys from pathlib2 import Path (Path(sys.executable).parent / 'cli-util-here').as_posix()
Hi JitteryCoyote63
Just making sure, the package itself it installed as part of the "Installed packages", and it also installs a command line utility ?
To store all the debug samples, also it can store all the models (if you configure the output_uri=' http://file_server_here:8081 ') Yes: instead of the file server have 's3://<ip_of_minio>:9000/bucket' make sure you add the credentials for the minio in the trains.conf Yes, basically once you have the creendtials in the trains.conf, you could do StorageManager.get_local_copy('s3://<minio>:9000/bucket/file') (also upload of course π )
This is the prerequisites of the docker service installed on the host machine (where the agent is running)
Basically follow: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
https://docs.docker.com/compose/gpu-support/
SubstantialElk6 on the client side?
Hi RipeGoose2
Yes the slide feature is definelty on the do do list (a lot of users asked for it).
Unfortunately other than actually PR-ing to the UI repo, there is no easy way to add customization (If you have an idea on how we could have an easy interface, that would be great.)
I'll check what's the status with the slider, maybe we will be lucky enough to see it in he next update π
So was the issue solved?
seems like pip 20.1.1 has the issue, but >= 22.2.2 do not.
Notice we changed the value there, it now has two versions, pne for python 3.10 < and one for python 3.10>=
The main reason is that pip changed their resolving algorithm, and the new one can break its own dependencies (i.e. pip freeze > requirements.txt -> pip install might not actually work)
None
JitteryCoyote63 so now everything works as expected ?
I'm kind of at a point where I don't know a lot of what to even search for.
we feel you π , yes there still isn't a very good source of information on where to get started...
This is because the entire field is constantly changing and evolving, and one solution will usually only apply to specific use case...
I would start with the mlops community slack channel, and youtube talks (specifically those by companies describe how they built their own internal infrastructure, i...