It will only do that if the OOM killer is enabled
True, but you will still get OOM (I believe). I think the main issue is that even from inside the container, when you query the memory, you see the entire machine's memory... I'm not sure what we can do about that
Hmm, yes this fits the message. Which basically says that it gave up on analyzing the code because it ran out of time. Is the execution very short? Or the repo very large?
ShortElephant92 yep, this is definitely an enterprise feature 🙂
But you can configure user/pass on the open source version, and even store the passwords hashed if you need.
I just think that the create function should expect
dataset_name
to be None in the case of
use_current_task=True
(or allow the dataset name to differ from the task name)
I think you are correct, at least we should output a warning that it is ignored ... I'll make sure we do 🙂
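For reference, a minimal sketch of the call being discussed (project and folder names are hypothetical, and the exact behavior of dataset_name=None together with use_current_task=True may vary between versions):
from clearml import Task, Dataset

task = Task.init(project_name="examples", task_name="dataset from task")

# Attach the dataset to the currently running Task;
# this is the case where dataset_name could arguably be left as None
dataset = Dataset.create(
    dataset_name=None,
    dataset_project="examples",
    use_current_task=True,
)
dataset.add_files("./data")  # hypothetical local folder
dataset.upload()
dataset.finalize()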
Basically if I pass an arg with a default value of False, which is a bool, it'll run fine originally, since it just accepted the default value.
I think this is the nargs="?"
, is that right ?
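To illustrate the argparse behavior in question, a minimal sketch (the flag name is hypothetical): with nargs="?" the value is optional, so omitting the flag uses the default, while passing it bare uses const:
import argparse

parser = argparse.ArgumentParser()
# Optional value: "--debug" alone -> const (True), flag omitted -> default (False)
parser.add_argument("--debug", nargs="?", const=True, default=False, type=bool)

print(parser.parse_args([]).debug)           # False (default)
print(parser.parse_args(["--debug"]).debug)  # True (const)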
FriendlySquid61 could you help?
They don't give an in-app notification.
Oh I see, I assume this is because the GitHub account is not connected to any email, so no invite is sent.
Basically they should just be able to re-login and then they could switch to your workspace (with the link you generated)
The full docker-compose logs?
IdealPanda97 Hmm I see...
Well, unfortunately, Trains is all about free access to all 🙂
That said, the Enterprise edition does add permissions and data management on top of Trains. You can get in touch through the https://allegro.ai/enterprise/#contact , I'm sure someone will get back to you soon.
PompousBeetle71 kudos on the solution!
What were the loggers you ended up setting?
I'd like to make sure we fix this issue
Hi ScantChimpanzee51
In order to get it to work:
conf_file = "options.yml"
conf_file = task.connect_configuration(conf_file, "Yaml options")
with open(conf_file, "r") as f:
    ...
The reason is that it will not overwrite the local file, but will return a temp file for you to read instead.
And come to think of it, maybe we should add an argument saying it is allowed to overwrite the local file, wdyt?
SlipperyDove40 Yes, there is: TRAINS_CONFIG_FILE
https://allegro.ai/docs/faq/faq/#trains-configuration
I will TIAS, but maybe worthwhile to also mention if it has to be the absolute path or if relative path is fine too!
Good point! (absolute but you can use ~, and I "think" also $ENV )
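For example, a minimal sketch of pointing the SDK at a custom config file via the environment variable (the path is hypothetical; set it before the Task is initialized):
import os

os.environ["TRAINS_CONFIG_FILE"] = os.path.expanduser("~/my_trains.conf")

from trains import Task

task = Task.init(project_name="examples", task_name="custom config file")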
Hi @<1523701168822292480:profile|ExuberantBat52>
I am trying to execute a pipeline remotely,
How are you creating your pipeline? and are you referring to an issue with the pipeline logic or is it a component that needs that repo installed ?
python k8s_glue_example.py --help
To get all the configuration options
You should probably pass a few :)
Wait, is "SSH_AUTH_SOCK" defined on the host? it should auto mount the SSH folder as well?!
I am logging debug images via Tensorboard (via
add_image
function), however apparently these debug images are not collected within fileserver,
ZanyPig66 what do you mean not collected to the file server? are you saying the TB add_image is not automatically uploading images? or that you cannot access the files on your files server?
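For context, a minimal sketch of the flow that is expected to be auto-captured (assuming default Task.init auto-logging; project/task names and the image are hypothetical):
import numpy as np
from torch.utils.tensorboard import SummaryWriter
from clearml import Task

# Task.init hooks the TensorBoard writer, so add_image calls
# should show up as debug samples reported through the file server
task = Task.init(project_name="examples", task_name="tb debug images")

writer = SummaryWriter(log_dir="./tb_logs")
img = np.random.randint(0, 255, size=(3, 64, 64), dtype=np.uint8)  # CHW image
writer.add_image("random_sample", img, global_step=0)
writer.close()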
I don't think so. It is solved by installing openssh-client in the docker image, or by adding a deploy token to the cloning URL in the web UI
You can also have the token (token==password) configured as the default user/pass in your agent's clearml.conf
https://github.com/allegroai/clearml-agent/blob/73625bf00fc7b4506554c1df9abd393b49b2a8ed/docs/clearml.conf#L19
Hi @<1657918706052763648:profile|SillyRobin38>
Hi everyone, I wanted to inquire if it's possible to have some type of model unloading.
What do you mean by "unloading" ? you mean remove it from the clearml-serving endpoint ?
If this is from clearml-serving, then yes, you can do it online.
Is there any way to debug these sessions through clearml? Thanks!
Yes, this is a real problem, AWS does not make it very easy to get the data...
Can you check the AWS console and see what you have there?
In theory this should have worked.
Maybe you are missing some escaping for the "extra_vm_bash_script"?
I'm hoping the console output will tell us
Exactly!
Regarding adding a feature store: probably not in the near future, a scalable feature store is quite the project. It is probably more realistic to have a recipe for deploying with Feast
SubstantialElk6 (2) yes definitely will be fixed
Regarding (1), what do you mean by "via the code" ? Do you mean like as a Task docker cmd ?
(I'll make sure it is added to the docstring, because apparently it was not there)
Yes
Are you trying to upload_artifact to a Task that is already completed ?
Hi BroadSeaturtle49
torchvision!=0.13.0,>=0.8.1
is this what you have in the requirements ?
The clearml-agent parses the requested version and tries to match it to the version found/supported by the installed CUDA
There is the possibility that the combination either does not exist, or for some reason the parsing (i.e. clearml-agent's parsing) fails
can you maybe provide the Task's full log?
Hi PerplexedCow66
I would like to know how to serve a model, even if I do not use any serving engine
What do you mean no serving engine, i.e. custom code?
Besides that, how can I manage authorization between multiple endpoints?
Are you referring to limiting access to all the endpoints?
How can I manage API keys to control who can access my endpoints?
Just to be clear, accessing the endpoints has nothing to do with the clearml-server credentials, so are you asking how to...
Hi SolidSealion72
"/tmp" contained alot of artifacts from ClearML past runs (1.6T in our case).
How did you end up with 1.6TB of artifacts there? what are the workflows on that machine? at least in theory, there should not be any leftover in the tmp folder, after the process is completed.
VexedCat68
But what's happening is that I only publish a dataset once, but every time it polls,
this seems wrong (i.e. a bug?!), how do you set up the trigger? Is the Trigger Task constantly running, or are you re-launching it?
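For comparison, a minimal sketch of how a dataset trigger is usually set up (assuming the TriggerScheduler from clearml.automation; the IDs, names and queue are hypothetical). The scheduler is meant to keep running and poll for new datasets, rather than being re-launched:
from clearml.automation import TriggerScheduler

trigger = TriggerScheduler(pooling_frequency_minutes=3)

trigger.add_dataset_trigger(
    name="retrain-on-new-data",
    schedule_task_id="aabbccdd11223344",    # Task to clone and enqueue when the trigger fires
    schedule_queue="default",
    trigger_project="datasets/my_project",
    trigger_on_publish=True,                # fire only when a dataset is published
)

trigger.start()  # blocks and keeps polling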