If now I abort the experiment (which is in a pending state and not running), and re-enqueue it again -- no parameter modifications this time...
and I re-enqueue it to the CPU queue, I see that it is sent to the right queue, and after a few seconds the job enters a running state and it completes correctly
Right now I see the default agent that comes with the helm chart...
Hi Alon, thanks, I actually watched those videos. But they don't help with setting things up 🙂
From your explanation, I understand that Agents are indeed needed for ClearML to work.
the queues already exist, I created them through the UI.
the experiment is supposed to run in this queue, but then it hangs in a pending state
Hi Jake thanks for your answer!
So I just have a very simple file "project.py" with this content:
` from clearml import Task
task = Task.init(project_name='project-no-git', task_name='experiment-1')
import pandas as pd
print("OK") `
If I run ` python project.py ` from a folder that is not in a git repository, I can clone the task and enqueue it from the UI, and it runs in the agent with no problems.
If I copy the same file into a folder that is in a git repository, when I enqueue the ex...
I have tried this several times and the behaviour is always the same. It looks like when I modify some hyperparameter and then enqueue the experiment to a queue, things don't work unless I have previously set the value of k8s-queue to the name of the queue that I want to use. If I don't modify the configuration (e.g. I abort or reset the job and enqueue it again, or clone and enqueue it without modifying the hyperparameters), then everything works as expected.
And yes these appear in the dropdown menu when I want to enqueue an experiment
Also, if I clone an experiment on which I had to set the k8s-queue user property manually to run it on a queue, say cpu, and then enqueue the clone to a different queue, say gpu, the property is not updated and the experiment is enqueued in a queue with a random hash-like name. I either have to delete the property, or set it to the right queue name, before enqueuing it, to have it run in the right queue
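As a workaround on my side, I'm thinking of something like this sketch (the project/experiment names are only placeholders, and it assumes the k8s-queue user property discussed above is what the glue reads): clone the experiment, overwrite the property, then enqueue it programmatically.
` from clearml import Task

# placeholder names -- point this at the experiment you actually want to clone
template = Task.get_task(project_name='project-no-git', task_name='experiment-1')
cloned = Task.clone(source_task=template, name='experiment-1 (gpu)')

# overwrite the user property so the glue routes the clone to the intended queue
cloned.set_user_properties({"name": "k8s-queue", "value": "gpu"})

# enqueue the clone to the matching queue
Task.enqueue(cloned, queue_name="gpu") `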
Oh I see... for some reason I thought that all the dependencies of the environment would be tracked by ClearML, but it's only the ones that actually get imported...
If, locally, it is detected that pandas is installed and can be used to read the CSV, wouldn't it be possible to store this information in the ClearML server so that it can be implicitly added to the requirements?
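Related to this, a minimal sketch of what I could do on my side for now (assuming the standard Task.add_requirements helper, which has to be called before Task.init):
` from clearml import Task

# explicitly record pandas in the task requirements, even if the actual
# import only happens later or inside a conditional branch
Task.add_requirements("pandas")

task = Task.init(project_name='project-no-git', task_name='experiment-1')
import pandas as pd
print("OK") `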
I guess to achieve what I want, I could disable the agent using the helm chart values.yaml
and then define pods for each of the agents on their respective nodes
but I was thrown a bit off track by seeing errors in the logs
Hi Jack, yes we had to customize the default one for some tools we use internally
actually there are some network issues right now, I'll share the output as soon as I manage to run it
I actually found out it was an indentation error 😅 and the credentials weren't picked up
Exactly that :) if I go to the Queues tab, I see a new queue (that I didn't create)
with a name like "4gh637aqetc"
Thanks, in DM I sent you the conf we use to deploy the agents.
sure, give me a couple of minutes to make the changes
Thanks, I'll try to understand how the default agent that comes with the helm chart is configured, and then copy that setup for a different one
Hi Martin, thanks for the explanation! I work with Maggie and help with the ClearML setup.
Just to be sure: currently, the PodTemplate contains:
resources:
  limits:
    nvidia.com/gpu: 1
and you are suggesting to also add something like:
requests:
  memory: "100Mi"
limits:
  memory: "200Mi"
is that correct?
On a related note, I am a bit puzzled by the fact that all 4 GPUs are visible.
In the https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/ , i...
and in the logs of the K8s Glue I see that an exception occurred:
` No tasks in queue 54d3edb05a89462faaf51e1c878cf2c7
No tasks in Queues, sleeping for 5.0 seconds
No tasks in queue 54d3edb05a89462faaf51e1c878cf2c7
No tasks in Queues, sleeping for 5.0 seconds
FATAL ERROR:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 710, in urlopen
chunked=chunked,
File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", l...
the same as the one available in the agent: clearml==1.6.4
Before enqueueing any experiment, these are the queues I have available
Hi Martin, thanks. My question is:
if I configure the pods for the different nodes manually, how do I make the ClearML server aware that those agents exist? This step is really not clear to me from the documentation (it talks about users and uses interactive commands, which would mean entering the agents manually). I will also try the k8s glue, but first I would like to understand how to configure a fixed number of agents manually
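In the meantime, to check what the server already knows about, I'm assuming something like this would list the registered workers (a hedged sketch using clearml's APIClient; the exact module path and returned fields may differ between versions):
` from clearml.backend_api.session.client import APIClient

# assumes credentials are already configured in clearml.conf
client = APIClient()

# list the workers (agents) currently registered with the server
for worker in client.workers.get_all():
    print(worker.id) `
My understanding is that an agent registers itself as a worker once its process starts with valid credentials, rather than being declared on the server first.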
At this point, I see a new queue in the UI: