SubstantialElk6

117 Questions, 310 Answers

Active since 10 January 2023

Last activity one year ago

Reputation

Badges 1

282 × Eureka!

Questions 117
Answers 310

0 Votes

1 Answers

1K Views

0 Votes 1 Answers 1K Views

[Distributed Training] Hi, I Have A Clearml Setup With K8Sglue That Spins Up Pods Of 4 Gpus When Picking Tasks Off The Clearml Queue. We Would Now Want To Proceed With Multi-Node Training, And Some Of The Examples We Are Trying Are Here.

[Distributed Training] Hi, i have a ClearML setup with K8SGlue that spins up pods of 4 GPUs when picking tasks off the clearml queue. We would now want to pr...

clearml

one year ago

0 Votes

1 Answers

1K Views

0 Votes 1 Answers 1K Views

Hi, Would Like To Check. So An Agent Pulled A Docker Image And Install The Pip Dependencies On It. What If I Have Os Library Dependencies As Well? (Apt Install, Rpm Install...Etc).

Hi, would like to check. So an agent pulled a docker image and install the pip dependencies on it. What if I have OS library dependencies as well? (Apt insta...

mlops

3 years ago

0 Votes

3 Answers

1K Views

0 Votes 3 Answers 1K Views

I Got An Interesting Question From My Devs. If They Wish To Do Distributed Training, Is Clearml K8S Glue Suitable For It? Local Multiple Gpu: Just A Matter Of Assigning More Than One Gpu In The Yaml File Sent To The K8S Glue. Question Is How To Make This

I got an interesting question from my Devs. If they wish to do distributed training, is clearml k8s glue suitable for it? Local multiple GPU: just a matter o...

clearml

3 years ago

0 Votes

5 Answers

1K Views

0 Votes 5 Answers 1K Views

Hi, I Would Like To Ask Around If Anyone Has Following Languages Working With Clearml? It Can Be Direct From Clearml Sdk Or Via Any Indirect Method.

Hi, i would like to ask around if anyone has following languages working with ClearML? It can be direct from ClearML SDK or via any indirect method. Julia R ...

clearml

3 years ago

0 Votes

6 Answers

1K Views

0 Votes 6 Answers 1K Views

Hi, We Are Planning To Move On To Openshift. Can I Ask If K8S-Glue Supports Openshift?

Hi, we are planning to move on to openshift. Can I ask if k8s-glue supports openshift?

clearml

3 years ago

0 Votes

30 Answers

1K Views

0 Votes 30 Answers 1K Views

Hi, Several Changes Occurred Recently And I Would Like To Know If There'S A Way To Verbose Catch All The Printout That Happening Within A K8S Glue Spawned Pod. We Have An Issue Where All Of Our New Remote_Execution Tasks Are Stuck In The 'Pending' Stage.

Hi, several changes occurred recently and i would like to know if there's a way to verbose catch all the printout that happening within a k8s glue spawned po...

mlops

3 years ago

0 Votes

0 Answers

611 Views

0 Votes 0 Answers 611 Views

Hi, Is There A Way To Export Clearml Experiments Into A File Package And Import Them On Another Clearml Instance?

Hi, is there a way to export ClearML experiments into a file package and import them on another ClearML instance?

clearml

10 months ago

0 Votes

11 Answers

1K Views

0 Votes 11 Answers 1K Views

Hi, I Shifted My Clearml Setup To An On-Premise Disconnected Env, Which Has A Pip Repo Setup. I Noted This Warning,

Hi, i shifted my clearml setup to an on-premise disconnected env, which has a pip repo setup. I noted this warning, Trying pip install: /root/.clearml/venvs-...

pytorch

3 years ago

0 Votes

8 Answers

1K Views

0 Votes 8 Answers 1K Views

Hi, I'Ve A Few Questions On Clearml-Session.

Hi, I've a few questions on clearml-session. We will be running some GUI applications so is it possible to forward the GUI to the clearml-session? We have a ...

mlops

3 years ago

0 Votes

0 Answers

892 Views

0 Votes 0 Answers 892 Views

Hi, We Are Encountering An Increasing Number Of Cases Where It Takes Quite A While Before Actual Training (Gpu Utilisation) Can Be Done. After Observing, This Is What We Discovered. The Following Are The Steps And Bottlenecks.

Hi, we are encountering an increasing number of cases where it takes quite a while before actual training (GPU utilisation) can be done. After observing, thi...

clearml

one year ago

0 Votes

1 Answers

914 Views

0 Votes 1 Answers 914 Views

Hi. For The Experiment Scalar Tab, There'S A Gpu Resource Graph. The Gpu Mem Used Is In Percentage, Is It Possible To Display As Absolute Gb Instead? Reason Is Because The User Doesn'T Really Know How Much Vram Is Allocated.

Hi. For the experiment scalar tab, there's a gpu resource graph. The gpu mem used is in percentage, is it possible to display as absolute GB instead? Reason ...

clearml

one year ago

0 Votes

6 Answers

1K Views

0 Votes 6 Answers 1K Views

Hi, I Notice A New Behavuour With Clearml-Agent=1.1.0. When It Is Installing The Packages I Nrequirements.Txt, It Failed With.

Hi, i notice a new behavuour with clearml-agent=1.1.0. When it is installing the packages i nrequirements.txt, it failed with. clearml_agent: ERROR: HTTPSCOn...

clearml

3 years ago

0 Votes

2 Answers

1K Views

0 Votes 2 Answers 1K Views

Hi, I Am Working On Creating Retraining Pipelines In Production. The Way I'M Doing This Is To Install Clearml-Server On My Production. Then I Recreate The Ingestion, Preprocessing And Training/Opt Tasks Into A Clearml-Pipeline. Thereafter, I Would Call

Hi, i am working on creating retraining pipelines in production. The way i'm doing this is to install clearml-server on my production. Then i recreate the in...

clearml

2 years ago

0 Votes

9 Answers

1K Views

0 Votes 9 Answers 1K Views

Hi, We Are Having Issues With Clearml-Session For Vscode. Apparently It'S Hardcoded To Download From

Hi, we are having issues with clearml-session for vscode. Apparently it's hardcoded to download from https://github.com/microsoft/vscode-python/releases but ...

remote-ssh

3 years ago

0 Votes

8 Answers

1K Views

0 Votes 8 Answers 1K Views

Hi We Have Had Some Crashes On Clearml Server And It Was Caused By Clearml Uploading The Models Into Clearml Server (By Default). Is It Possible To Have An Overriding Config So Clients Can Never Upload To Clearml Server Itself As Default?

Hi we have had some crashes on ClearML server and it was caused by ClearML uploading the models into ClearML server (by default). Is it possible to have an o...

clearml

2 years ago

0 Votes

29 Answers

1K Views

0 Votes 29 Answers 1K Views

Hi, I Started My Agent Using. Clearml-Agent Daemon --Gpus 0 --Queue Gpu --Docker --Foreground, With The Following Parameters In Clearml.Conf.

Hi, I started my agent using. clearml-agent daemon --gpus 0 --queue gpu --docker --foreground, with the following parameters in clearml.conf. default_docker:...

mlops

3 years ago

0 Votes

14 Answers

1K Views

0 Votes 14 Answers 1K Views

So I Bumped Onto This Comparison Shared By Dagshub. It Kinda Placed Clearml Is A Rather Bad Position Compared To Everything Else In The Industry.

So i bumped onto this comparison shared by dagshub. It kinda placed ClearML is a rather bad position compared to everything else in the industry. https://dag...

clearml

3 years ago

0 Votes

22 Answers

1K Views

0 Votes 22 Answers 1K Views

Hi, I Would Like To Pass In Some Pip Arguments That Clearml-Agent Would Include When Setting Up The Venv On The Containers. How Should I Specify This? The Argument In Question Are --Trusted-Host And --Find-Links . I Need Them As I'Ve Installed A Pypi Repo

Hi, I would like to pass in some pip arguments that clearml-agent would include when setting up the venv on the containers. How should I specify this? The ar...

clearml

3 years ago

0 Votes

9 Answers

1K Views

0 Votes 9 Answers 1K Views

Hi, Just To Check. Does The K8S Glue Install Torch By Default? I'M Getting

Hi, just to check. Does the k8s glue install torch by default? I'm getting Warning: could not resolve python wheel replacement for torch==1.8.0 even though i...

tensorflow

3 years ago

0 Votes

8 Answers

1K Views

0 Votes 8 Answers 1K Views

Hi, If I'Ve Clearml Agents Installed On Several Servers, Each With A Single Gpu. How Can I Train A Gpt2 Model That Would Require Multiple Gpus?

Hi, if i've ClearML agents installed on several servers, each with a single GPU. How can I train a gpt2 model that would require multiple GPUs?

clearml

one year ago

0 Votes

4 Answers

1K Views

0 Votes 4 Answers 1K Views

Hi Recently Upgraded All The Clearml, Clearml-Server, Clearml-Agent. Now Running K8S Glue With Clearml-Agent=1.0.1Rc1.

Hi recently upgraded all the clearml, clearml-server, clearml-agent. Now running k8s glue with clearml-agent=1.0.1rc1. python3 k8s_glue_example.py --queue 1b...

clearml

3 years ago

0 Votes

8 Answers

983 Views

0 Votes 8 Answers 983 Views

I Just Getting This In My Agent Run Task. Would Appreciate If Someone Can Advise Where I Externalrequirement Is Pointing At.

I just getting this in my agent run task. Would appreciate if someone can advise where i externalrequirement is pointing at. RequirementsManager handler rais...

mlops

3 years ago

0 Votes

4 Answers

1K Views

0 Votes 4 Answers 1K Views

Hi, I'M Running Clearml Agents Via K8S Glue. I Noticed That The Agent Is Not Pulling Latest Images Even Though

Hi, I'm running clearml agents via K8s glue. I noticed that the agent is not pulling latest images even though docker_force_pull is set to true. A kubectl de...

mlops

3 years ago

0 Votes

1 Answers

1K Views

0 Votes 1 Answers 1K Views

Thought I Would Share This. Something To Think About Over The New Year.

Thought i would share this. Something to think about over the new year. 🙂 https://www.thoughtworks.com/content/dam/thoughtworks/documents/whitepaper/tw_whit...

clearml

2 years ago

0 Votes

6 Answers

1K Views

0 Votes 6 Answers 1K Views

Hi, Trying To Understand Clearml-Session. I Have An Agent Running On A Machine Monitoring A Queue Then I Ran Clearml-Session --Queue Myqueu --Docker Torch-Image. The Clearml Session Ended Up Tunneling Into The Physical Machine That My Agent Is Running

Hi, trying to understand clearml-session. I have an agent running on a machine monitoring a queue Then I ran clearml-session --queue myqueu --docker torch-im...

mlops remote-ssh

3 years ago

0 Votes

1 Answers

1K Views

0 Votes 1 Answers 1K Views

Hi, In Your Latest Changelog. There'S A New Function.

Hi, in your latest changelog. There's a new function. Task.launch_multi_node() for distributed experiment execution In the context of using with K8S glue, wi...

clearml

one year ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

Hi, We Are Working On A Mini Project To 'Integrate' Clearml Datasets With Ckan. Wondering If The Community Could Share Some Ideas.

Hi, we are working on a mini project to 'integrate' ClearML Datasets with CKAN. Wondering if the community could share some ideas.

clearml

2 years ago

Show more results

0 Hi We Have Had Some Crashes On Clearml Server And It Was Caused By Clearml Uploading The Models Into Clearml Server (By Default). Is It Possible To Have An Overriding Config So Clients Can Never Upload To Clearml Server Itself As Default?

Hi SuccessfulKoala55 , would they need the fileserver to route to minio then? E.g.

This will ensure that any actions by clearml-data and models are saved into the S3 object store.
api {
files_server: s3://ecs.ai:80/clearml-data/default
}

aws {
s3 {
credentials {
host: http://ecs.ai:80
## Insert the iam credentials provided by your SAs here.
}
}
}

But if user forgot to do above, they will be saved on ClearML server. If I switch off f...

2 years ago

0 Hi, Several Changes Occurred Recently And I Would Like To Know If There'S A Way To Verbose Catch All The Printout That Happening Within A K8S Glue Spawned Pod. We Have An Issue Where All Of Our New Remote_Execution Tasks Are Stuck In The 'Pending' Stage.

Hi, i dont't think clearml agent actually ran at that point in time. All i can see in the pod is
apt install of libpthread-stubs, libx11, libxau and libxcb1 packages. pip install of clearml-agentAfter the above are successful, the pod just hang there.

3 years ago

I want to rule out the glue being the problem. Is the Glue significant in initialising clearml-agent after the pod is spawned?

3 years ago

I have since ruled out the apt and pypi repos. Both of them are installing properly on the pods.

3 years ago

0 Can I Ask How Often Does The Hosted Clearml Reset? I'M In A Hackathon And Thought Of Using It.

Hi, is this currently not working? http://app.community.clear.ml ? I noticed that cleaml UI will cache on the browser and if the backend is not running, its not clear to user that something is wrong (except for broken pages).

3 years ago

0 Can I Ask How Often Does The Hosted Clearml Reset? I'M In A Hackathon And Thought Of Using It.

Clearing the cache entirely works. Thanks.

3 years ago

0 Hi, Can I Ask How I Can Make Clearml-Datasets In Comparison With Pytorch Datasets/Dataloader? In Particular, Pytorch Dataloaders Would Be Able To Batch Pull And Then Preprocess Data Using Multi-Cpus, Feed It Into The Training Loop And Achieve As High Util

Thanks CostlyOstrich36 , how do i know how is the parts indexed in the first place? Or rather, how is chunk and parts defined? Say in the context of images, videos, text documents...etc.

2 years ago

Although I think you can also pull specific chunks of dataset

How do you do that with clearml-data?

2 years ago

0 Hi, I Notice A New Behavuour With Clearml-Agent=1.1.0. When It Is Installing The Packages I Nrequirements.Txt, It Failed With.

thanks.

3 years ago

0 Hi, We Have Recurring Disk Space Issues On Our Clearml Server (Drop Of Many Gb In A Few Days). After Some Analysis, We Noted

Thanks SuccessfulKoala55 , how might I do this clean up? Does this increase with more use of ClearML? And to add, we save all artifacts onto a remote S3 server.

2 years ago

0 Hi, I Shifted My Clearml Setup To An On-Premise Disconnected Env, Which Has A Pip Repo Setup. I Noted This Warning,

Hi AgitatedDove14 , what version i should change it to? I'm currently on v0.17.2rc3.

3 years ago

0 Hi, I Shifted My Clearml Setup To An On-Premise Disconnected Env, Which Has A Pip Repo Setup. I Noted This Warning,

AgitatedDove14 , would you elaborate on this resolution process?

3 years ago

0 Hi, I Have A Scenario Where When The Code Is Run Remotely Via Clearml-Agent, The Code Appears To Get Stuck At

Is there anyway to see an error log from that?

one year ago

0 Hi, Trying To Understand Clearml-Session. I Have An Agent Running On A Machine Monitoring A Queue Then I Ran Clearml-Session --Queue Myqueu --Docker Torch-Image. The Clearml Session Ended Up Tunneling Into The Physical Machine That My Agent Is Running

Hi, I was expecting to see the container rather then the actual physical machine. For example, in the file panel on the left of the jupyter panel, I see the file contents of the physical machine. I was expecting this to be the container.

3 years ago

Hi it is missing --docker on the agent. Thanks! Dynamic GPU option only available with Enterprise version right?

3 years ago

0 Hi, V1 Of Agent Seems To Have Removed Agent.Package_Manager.Force_Repo_Requirements_Txt. Is This Still Available In Other Forms?

I managed to find out why. The docker image I'm using is not set as root user thus the error. But I'm wondering why this is the case as docker best practices does indicate we should use a non root on production images.

3 years ago

0 Hi, How Might I Use The Sdk To Pull Parameters Of The Agent'S Clearml.Conf Into My Code During Runtime? For Example, If I Wish To Pull The Configuration For Aws.S3.Credentials.Key And Aws.S3.Credentials.Secret?

Yup that works. Thanks.

3 years ago

0 Hi, V1 Of Agent Seems To Have Removed Agent.Package_Manager.Force_Repo_Requirements_Txt. Is This Still Available In Other Forms?

Its actually in your documentation. Its removed since 0.17 apparently.
https://allegro.ai/clearml/docs/docs/release_notes/ver_0_17.html#clearml-agent-0-17-2

And this is my logs, it tried to install something and encountered permission denied. It wouldn't if it obeyed the force_repo_requirements_txt.

1620664917916 Kahs-MacBook-Pro.local info ClearML Task: created new task id=024a421c0e174650a1c7ff64af756c26 ClearML results page: `
1620664920359 Kahs-MacBook-Pro.local info ClearML Mon...

3 years ago

0 Hi, We Are Having An Interesting Issue Here. We Serve Many Users And Each User Has Their Own Credentials In Accessing The Private Git Repo. We Can'T Seem To Find A Way For The End User To Pass In Their Git Credentials When They Run Their Codes In Both Age

Hi, just wondering if this 'feature: Passing env via the code' is in the works?
https://clearml.slack.com/archives/CTK20V944/p1616677400127900?thread_ts=1616585832.098200&cid=CTK20V944

3 years ago

The apply.yaml template is not working (E.g. the arguments env is not passed to the container), this is why i tried the code approaach instead.

3 years ago

0 Hi, V1 Of Agent Seems To Have Removed Agent.Package_Manager.Force_Repo_Requirements_Txt. Is This Still Available In Other Forms?

so the clearml-agent daemon needs higher privilege?

3 years ago

alright. 🙂

3 years ago

Hi FriendlySquid61 , AgitatedDove14 , the issue and possible fix is in this issue raise. https://github.com/allegroai/clearml-agent/issues/51

3 years ago

No i didn't indicate this particular issue on the git issue. Only the apply template.yml is on the issue.

3 years ago

AgitatedDove14 , will these be fixed?
Passing env via the code Passing env via template yaml

3 years ago

0 Hi, V1 Of Agent Seems To Have Removed Agent.Package_Manager.Force_Repo_Requirements_Txt. Is This Still Available In Other Forms?

yup. in this case it wasn't root. Removing that USER and -u in pip solves the problem. However, in our production images, we are required to remove root access.
` FROM nvidia/cuda:10.1-cudnn7-devel

ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y
python3-opencv ca-certificates python3-dev git wget sudo ninja-build
RUN ln -sv /usr/bin/python3 /usr/bin/python

create a non-root user

ARG USER_ID=1000
RUN useradd -m --no-log-init --system --uid ${USER_ID} a...

3 years ago

Hi AgitatedDove14 . I'm trying out passing env via the code instead.
task.set_base_docker("nvcr.io/nvidia/tensorflow:19.11-tf2-py3 --env TRAINS_AGENT_GIT_USER=git_username_here --env TRAINS_AGENT_GIT_PASS=git_password_here")So the strange thing is when my k8sglue pulls a task, this happens.
` Pulling task xxxxxxxxxx launching on kubernetes cluster
Pushing task xxxxxxxxxx into temporary pending queue
Kubernetes scheduling task id=xxxxxxxxxxxx
skipping docker argument TRAINS_AGENT_GIT_USE...

3 years ago

0 Hi, We Are Using Gitlab And It Is A Security Requirement To Use Ssh Keys To Access The Repos For Each Individual. We Are Also Using K8S Glue. Is There Any Provisions To Do This Seamlessly?

what feature on this paid roadmap are you referring to? I am indeed communicating with Noem on paid features.

3 years ago

0 Hi, Can I Choose Not Print The Clearml-Agent Config Logs In The Console? Reason Is We Are Passing Credentials Via Env Var To The K8S Glue And Its Being Displayed In The Console As ...

Hi, any advice on this? thanks.

3 years ago

0 Hi, We Are Using Gitlab And It Is A Security Requirement To Use Ssh Keys To Access The Repos For Each Individual. We Are Also Using K8S Glue. Is There Any Provisions To Do This Seamlessly?

Hi, scenario as follows.

client.py runs task.execute_remotely(queue='myqueue', exit_process=True) The API section of clearml.conf at client side is read in. client side calls clearml server and insert task into queue. K8S glue retrieves task from queue. Spawn a K8S pod. K8S pod performs git clone Error. ssh keys not found.
Each individual has their own key in the gitlab profile and gitlab is configured to only work via ssh.
We can't place the key in the image as this is as good as ...

3 years ago

Show more results