
I can pass any crazy value I want.. it doesn't matter. However, I can use --output-uri=s3://blabla and then at least I get an error that it cannot use that bucket.
This is now in my Python script:
Hello, I'm still not able to save ClearML models. They are generated and registered okay, but they are not on the fileserver. I now have Task.init(output_uri=True) and I also pass --skip-task-init on the clearml-task command line so that it doesn't overwrite the Task.init call.
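For reference, this is roughly what I'd expect the minimal version of my script to look like (just a sketch, not my actual training code; the task name and model are made up). With output_uri=True the saved model should go to the default files_server from clearml.conf instead of staying at a local path:

from clearml import Task
from tensorflow import keras

task = Task.init(
    project_name='examples',
    task_name='output-uri-test',      # made-up name, only for this sketch
    output_uri=True,                  # True = upload models to the default fileserver
    reuse_last_task_id=False,
)

# tiny throwaway model, just enough to produce a saved file
model = keras.Sequential([keras.Input(shape=(4,)), keras.layers.Dense(1)])
model.compile(optimizer='adam', loss='mse')

# ClearML's Keras binding should pick up this save and, with output_uri set,
# upload the file instead of registering a local /tmp URI
model.save('my_model.h5')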
It's as if the line is not there.
For comparison: this is when I use --output-uri.
The model has this information ... the /tmp paths look like local URIs, suggesting that it doesn't even try to upload them.
AgitatedDove14 your trick seems to work (I had to change the URL to reflect the fact that I run on k8s).
It was to test whether reuse_last_task_id had any effect (I have the impression it doesn't).
Well, it made a difference (the code for the init() is not added anymore), but it still didn't take my output_uri.
This is the output of the training. It doesn't try to upload (note that this is my second try, so it already found a model with that name, but on my first try it didn't work either).
But I still think the same should be possible using Task.init.
I'm still trying to understand why it was needed in our case. I have the NVIDIA GPU Operator installed with mostly the default values on our on-prem cluster. I found there is an option CONTAINERD_SET_AS_DEFAULT in the operator which, when enabled, sets the NVIDIA runtime as the default for all pods. We didn't enable that option; maybe if we had enabled it, it would have worked.
Don't know.. but I see, for instance, that when using clearml-task I can put any (even nonsensical) values in Task.init.
I set reuse_last_task_id to False to force the creation of a new task in all cases.
And ... clearml-task takes a --project and a --name argument that are mandatory, so these are never taken from Task.init.
No, I don't think so; I rather think Task.init is only used when running outside of an agent.
from clearml import Task
task = Task.init(project_name='examples', task_name='moemwap', output_uri=True, reuse_last_task_id=False)
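A quick sanity check I could add right after that line (just a sketch) is to print what the running task actually ended up with:

# if output_uri was taken, I'd expect the fileserver URL (or True) here, not None
print('task id:', task.id)
print('output_uri:', task.output_uri)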
I sniffed the traffic.
It's a relatively fresh deployment.
I think I found it. We had to replace Elasticsearch after installing ClearML, and then I guess the ClearML migrations didn't rerun.
This seems to be confirmed by the documentation: "If you have not changed the default runtime on your GPU nodes, you must explicitly request the NVIDIA runtime by setting runtimeClassName: nvidia in the Pod spec."
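For illustration only, here is a sketch of explicitly requesting the NVIDIA runtime class, written with the Kubernetes Python client instead of raw YAML (pod name, namespace and image are made up; the only relevant part is runtime_class_name):

from kubernetes import client, config

config.load_kube_config()

# hypothetical test pod that just runs nvidia-smi on one GPU
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-runtime-test"),
    spec=client.V1PodSpec(
        runtime_class_name="nvidia",  # equivalent of runtimeClassName: nvidia in the Pod spec
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda",
                image="nvidia/cuda:12.2.0-base-ubuntu22.04",
                command=["nvidia-smi"],
                resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
            )
        ],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)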
This is the script shown by the ClearML UI, so the Task.init call looks right.
(Same for the environment variable.)
I did this as a workaround:
curl -XPUT "None" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "metric":  { "type": "text", "fielddata": true },
    "variant": { "type": "text", "fielddata": true }
  }
}'
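The same workaround as a Python sketch, mainly to make explicit what goes where (the Elasticsearch host and index name below are placeholders, since the real URL is omitted above):

import requests

ES_HOST = "http://localhost:9200"   # placeholder, not my real host
INDEX = "my-events-index"           # placeholder, not the real index name

# same mapping body as the curl above: enable fielddata on the text fields
mapping = {
    "properties": {
        "metric":  {"type": "text", "fielddata": True},
        "variant": {"type": "text", "fielddata": True},
    }
}

resp = requests.put(f"{ES_HOST}/{INDEX}/_mapping", json=mapping)
print(resp.status_code, resp.text)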
But this workaround should not be needed, right? Is this a compatibility issue, or was my Elasticsearch not properly initialized?
This is my command line: clearml-task --name hla --requirements requirements.txt --project examples --output-uri http://clearml-fileserver:8081 --queue aws-instances --script keras_tensorboard.py
And when I try to use --output-uri, I can't pass True because obviously I can't pass a boolean, only strings.