SubstantialElk6

117 Questions, 310 Answers

Active since 10 January 2023

Last activity one year ago

Reputation

Badges 1

282 × Eureka!

0 Votes

8 Answers

1K Views

Hi, I Have ClearML Agents Installed On Several Servers, Each With A Single GPU. How Can I Train A GPT-2 Model That Would Require Multiple GPUs?

Hi, I have ClearML agents installed on several servers, each with a single GPU. How can I train a GPT-2 model that requires multiple GPUs?

clearml

one year ago

0 Votes

11 Answers

981 Views

Hi, We Have Recurring Disk Space Issues On Our ClearML Server (Drop Of Many GB In A Few Days). After Some Analysis, We Noted

Hi, we have recurring disk space issues on our ClearML server (a drop of many GB in a few days). After some analysis, we noted /opt/clearml/data/elastic_7 to b...

clearml

2 years ago

0 Votes

7 Answers

1K Views

Hi, I Was Using The K8S Glue And It Worked Fine On One Project But Didn't Work On Another. At The Point Just Before A Git Clone Was Executed, I Get The Error

Hi, I was using the K8S Glue and it worked fine on one project but didn't work on another. At the point just before a git clone was executed, I get the error...

clearml

3 years ago

0 Votes

4 Answers

1K Views

Hi, I Noticed That All Other Users Can See My Experiments. Does ClearML Have A Way To Allow Only Certain Groups Of People To See Each Other's Work?

Hi, I noticed that all other users can see my experiments. Does ClearML have a way to allow only certain groups of people to see each other's work?

clearml

3 years ago

0 Votes

11 Answers

1K Views

Hi, Can I Do A Quick Check If All The Documentation I Find On TRAINS Is Still Valid For ClearML? Specifically, I Am Looking At Integration Of ClearML And Kubernetes.

Hi, can I quickly check whether all the documentation I find on TRAINS is still valid for ClearML? Specifically, I am looking at integration of ClearML and Ku...

clearml

3 years ago

0 Votes

23 Answers

1K Views

Hi, I Saw This On The ClearML-Agent Docs But Other Than The Docker Image, I'm Not Sure How To Integrate This With ClearML Py And ClearML-Server. Please Advise.

Hi, I saw this on the clearml-agent docs but other than the docker image, I'm not sure how to integrate this with clearml py and clearml-server. Please advise...

clearml

3 years ago

0 Votes

14 Answers

1K Views

So I Bumped Into This Comparison Shared By DagsHub. It Kinda Placed ClearML In A Rather Bad Position Compared To Everything Else In The Industry.

So I bumped into this comparison shared by DagsHub. It kinda placed ClearML in a rather bad position compared to everything else in the industry. https://dag...

clearml

3 years ago

0 Votes

1 Answer

1K Views

Thought I Would Share This. Something To Think About Over The New Year.

Thought I would share this. Something to think about over the new year. 🙂 https://www.thoughtworks.com/content/dam/thoughtworks/documents/whitepaper/tw_whit...

clearml

2 years ago

0 Votes

7 Answers

1K Views

Hi, I Am Trying To Use clearml-data To Upload My Data To S3, Which Is Password Protected. How Should I Indicate The Credentials After I Set --storage s3://.... ?

Hi, I am trying to use clearml-data to upload my data to S3, which is password protected. How should I indicate the credentials after I set --storage s3://.....

dataset

3 years ago

0 Votes

12 Answers

1K Views

Hi, I Am Running Several Python Scripts But All For The Same Project/Task. Is It Possible To Task.init To An Existing Running/Completed Task And Add On The Results?

Hi, I am running several Python scripts but all for the same project/task. Is it possible to Task.init to an existing running/completed task and add on the r...

clearml

3 years ago

0 Votes

15 Answers

1K Views

Hi, I Noted That ClearML-Serving Does Not Support spaCy Models Out Of The Box And That ClearML-Serving Only Supports The Following:

Hi, I noted that clearml-serving does not support spaCy models out of the box and that clearml-serving only supports the following: Support Machine Learning Mode...

clearml

2 years ago

0 Votes

0 Answers

59 Views

Hi, Can I Ask How ClearML Supports Distributed Training Via K8SGlue? The Kubeflow Operator Supports Distributed Training On A Kubernetes Cluster, Managing The Pods Seamlessly.

Hi, can I ask how ClearML supports distributed training via K8SGlue? The Kubeflow Operator supports distributed training on a Kubernetes cluster, managing the pods se...

clearml

16 days ago

0 Votes

4 Answers

1K Views

Hi, How Can I Pass An Env Variable To The Docker That's Running The Agent When I Run This? I'm Having Issues With The Agent's Git Clone Where It Requires SSL Verification To Be Disabled. clearml-agent daemon --gpus 0 --queue gpu --docker --foreground

Hi, how can I pass an env variable to the docker that's running the agent when I run this? I'm having issues with the agent's git clone where it requires ssl...

mlops

3 years ago

0 Votes

1 Answer

1K Views

[Distributed Training] Hi, I Have A ClearML Setup With K8SGlue That Spins Up Pods Of 4 GPUs When Picking Tasks Off The ClearML Queue. We Now Want To Proceed With Multi-Node Training, And Some Of The Examples We Are Trying Are Here.

[Distributed Training] Hi, I have a ClearML setup with K8SGlue that spins up pods of 4 GPUs when picking tasks off the ClearML queue. We now want to pr...

clearml

one year ago

0 Votes

5 Answers

1K Views

Hi, I Would Like To Ask Around If Anyone Has The Following Languages Working With ClearML? It Can Be Direct From The ClearML SDK Or Via Any Indirect Method.

Hi, I would like to ask around if anyone has the following languages working with ClearML? It can be direct from the ClearML SDK or via any indirect method. Julia R ...

clearml

3 years ago

0 Votes

6 Answers

1K Views

Hi, We Are Planning To Move On To OpenShift. Can I Ask If K8S-Glue Supports OpenShift?

Hi, we are planning to move to OpenShift. Can I ask if k8s-glue supports OpenShift?

clearml

3 years ago

0 Votes

30 Answers

1K Views

Hi, Several Changes Occurred Recently And I Would Like To Know If There's A Way To Verbosely Capture All The Printouts Happening Within A K8S Glue Spawned Pod. We Have An Issue Where All Of Our New Remote_Execution Tasks Are Stuck In The 'Pending' Stage.

Hi, several changes occurred recently and I would like to know if there's a way to verbosely capture all the printouts happening within a k8s glue spawned po...

mlops

3 years ago

0 Votes

0 Answers

611 Views

Hi, Is There A Way To Export ClearML Experiments Into A File Package And Import Them On Another ClearML Instance?

Hi, is there a way to export ClearML experiments into a file package and import them on another ClearML instance?

clearml

10 months ago

0 Votes

11 Answers

1K Views

Hi, I Shifted My ClearML Setup To An On-Premise Disconnected Env, Which Has A Pip Repo Set Up. I Noted This Warning,

Hi, I shifted my ClearML setup to an on-premise disconnected env, which has a pip repo set up. I noted this warning: Trying pip install: /root/.clearml/venvs-...

pytorch

3 years ago

0 Votes

8 Answers

1K Views

Hi, I've A Few Questions On ClearML-Session.

Hi, I've a few questions on clearml-session. We will be running some GUI applications so is it possible to forward the GUI to the clearml-session? We have a ...

mlops

3 years ago

0 Votes

22 Answers

1K Views

Hi, I Would Like To Pass In Some Pip Arguments That clearml-agent Would Include When Setting Up The Venv On The Containers. How Should I Specify This? The Arguments In Question Are --trusted-host And --find-links. I Need Them As I've Installed A PyPI Repo

Hi, I would like to pass in some pip arguments that clearml-agent would include when setting up the venv on the containers. How should I specify this? The ar...

clearml

3 years ago

0 Votes

1 Answer

1K Views

Hi, Would Like To Check. So An Agent Pulls A Docker Image And Installs The Pip Dependencies On It. What If I Have OS Library Dependencies As Well? (Apt Install, Rpm Install...Etc).

Hi, I would like to check. An agent pulls a docker image and installs the pip dependencies on it. What if I have OS library dependencies as well? (apt insta...

mlops

3 years ago

0 Votes

6 Answers

1K Views

Hi, Trying To Understand ClearML-Session. I Have An Agent Running On A Machine Monitoring A Queue. Then I Ran clearml-session --queue myqueu --docker torch-image. The ClearML Session Ended Up Tunneling Into The Physical Machine That My Agent Is Running

Hi, trying to understand clearml-session. I have an agent running on a machine monitoring a queue. Then I ran clearml-session --queue myqueu --docker torch-im...

mlops remote-ssh

3 years ago

0 Votes

8 Answers

1K Views

Hi, We Have Had Some Crashes On The ClearML Server, Caused By ClearML Uploading The Models Into The ClearML Server (By Default). Is It Possible To Have An Overriding Config So Clients Can Never Upload To The ClearML Server Itself As Default?

Hi, we have had some crashes on the ClearML server, caused by ClearML uploading the models into the ClearML server (by default). Is it possible to have an o...

clearml

2 years ago

0 Votes

1 Answer

914 Views

Hi. For The Experiment Scalar Tab, There's A GPU Resource Graph. The GPU Mem Used Is In Percentage, Is It Possible To Display As Absolute GB Instead? Reason Is Because The User Doesn't Really Know How Much VRAM Is Allocated.

Hi. For the experiment scalar tab, there's a GPU resource graph. The GPU memory used is shown as a percentage; is it possible to display it as absolute GB instead? Reason ...

clearml

one year ago

0 Votes

6 Answers

1K Views

Hi, I Notice A New Behaviour With clearml-agent=1.1.0. When It Is Installing The Packages In requirements.txt, It Failed With.

Hi, I notice a new behaviour with clearml-agent=1.1.0. When it is installing the packages in requirements.txt, it fails with: clearml_agent: ERROR: HTTPSCOn...

clearml

3 years ago

0 Votes

2 Answers

1K Views

Hi, I Am Working On Creating Retraining Pipelines In Production. The Way I'm Doing This Is To Install ClearML-Server On My Production. Then I Recreate The Ingestion, Preprocessing And Training/Opt Tasks Into A ClearML-Pipeline. Thereafter, I Would Call

Hi, I am working on creating retraining pipelines in production. The way I'm doing this is to install clearml-server on my production. Then I recreate the in...

clearml

2 years ago

0 Prev, I Worked With ClearML (1 Year Back) And Back Then, We Configured Seldon Core For The Deployment And ClearML For The Training. Now There Is ClearML-Serving, Does It And Can It Fulfill A Similar Objective?

Hi, I'm gonna hijack this thread a bit. My community uses ClearML and is looking at various model deployment strategies. We are looking at a seamless integration with Triton but noted that Triton does not support deployment strategies. ClearML-Serving seems to, but the strategies are rather limited. Is there a roadmap to expand ClearML-Serving?

2 years ago

0 Hi, I Have Been Getting The Following For A While. Is There A More Detailed Log I Can Look Into? This Happens On Both Https And Http.

Do you have more info on vault?
Actually, it only makes sense if the entire department or organisation is saving its models in a common repo. In our case this is not possible due to client security (e.g. training data from clients could potentially be 'reverse engineered' from trained models in future). So each department, and even each project, will need its own repo.

3 years ago

0 Hi, I Have Been Getting The Following For A While. Is There A More Detailed Log I Can Look Into? This Happens On Both Https And Http.

Hi, we are still not getting the model repo to work, mainly due to clearml.storage failing to save the models.
We tried vanilla boto3 code and it works, but we can't figure out why we get ConnectionResetError 104 when ClearML does it.

How do we configure ClearML to match the following boto3 code?

S3= boto3.resource('s3', endpoint_url=' https://ecs.ai ', aws_access_key_id='mykey', aws_secret_access_key='mysevret', config=Config(signature_version='s3v4'), region_name='us-east-1', ve...

3 years ago

0 Hi, I Have Been Getting The Following For A While. Is There A More Detailed Log I Can Look Into? This Happens On Both Https And Http.

OK, thanks. That explains a lot. We have been doing this wrongly the whole time, thinking that the clearml.conf on the client side would be honoured by the remote agent execution. In reality, only the API section is used.

3 years ago

0 Hi, I Have Been Getting The Following For A While. Is There A More Detailed Log I Can Look Into? This Happens On Both Https And Http.

I see. Can I take it that when the client uses
task.execute_remotely(queue_name="1gpu", exit_process=True)
then none of the content in its clearml.conf will be used, except for the API part, and ClearML simply uses whatever is on the Agent side?
api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server:
    web_server:
    files_server:
    # Credentials are generated using the webapp, `
    # Override with os environment: ...

3 years ago

0 Hi, I Have Been Getting The Following For A While. Is There A More Detailed Log I Can Look Into? This Happens On Both Https And Http.

Going back to the open source, I think that adding the credentials as part of the source code might allow to have "credentials" auto populate as part of the remote execution, wdyt?

Not sure how this will work when I can't supply the credentials to ClearML programmatically.

3 years ago

0 Hi, I Have Been Getting The Following For A While. Is There A More Detailed Log I Can Look Into? This Happens On Both Https And Http.

Yes, it's on purpose; each user would have their own AWS credentials for default_output_uri.

3 years ago

0 Hi, I Have Been Getting The Following For A While. Is There A More Detailed Log I Can Look Into? This Happens On Both Https And Http.

I thought of another potential way, but I'm not sure if the SDK supports it.
We would save and upload the model manually using vanilla boto3, with credentials passed in as env vars, then use the ClearML SDK to update the Model Repo with the location of the model, without ClearML uploading it explicitly. Would the above work?
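
A rough sketch of that idea, not a confirmed answer from the thread. It assumes a task was already created with Task.init, that the credentials live in the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY env vars, and that OutputModel.update_weights with register_uri records the location without uploading. The helper names s3_uri and upload_and_register, and every bucket, key and endpoint value, are hypothetical.

```python
import os


def s3_uri(bucket: str, key: str) -> str:
    """Build the s3:// URI to record in the model repo (hypothetical helper)."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def upload_and_register(local_path: str, bucket: str, key: str, endpoint_url: str) -> str:
    """Upload with plain boto3, then only register the location with ClearML."""
    # Imported lazily so s3_uri() can be used without these packages installed.
    import boto3
    from clearml import OutputModel, Task

    # Credentials come from env vars, never from clearml.conf.
    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    )
    s3.upload_file(local_path, bucket, key)

    # Assumption: Task.init() was called earlier in the same process.
    task = Task.current_task()
    uri = s3_uri(bucket, key)
    # register_uri stores the URI on the model entry; ClearML does not upload the file.
    OutputModel(task=task).update_weights(register_uri=uri)
    return uri
```

For an S3-compatible appliance, the recorded URI may need the host:port form (s3://host:port/bucket/key) before ClearML can resolve it later, so treat the URI builder as a starting point.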

3 years ago

0 Hi, We Recently Upgraded Clearml To 1.1.1-135 . 1.1.1 . 2.14. The Task Init Is

I didn't track the version for this change in behaviour. But the last time I tried, it was able to download the content after I provided the credentials.

3 years ago

0 Hi, We Recently Upgraded Clearml To 1.1.1-135 . 1.1.1 . 2.14. The Task Init Is

Hi,
I'm running on Dell ECS storage appliance, which offers S3 compatibility.
yes http://ECS.ai is the DNS name of the server.
ClearML-models is the bucket.
Let me try with ip:port.

3 years ago

0 Hi, My DevSecOps Team Has Raised Some Issues Of Us Deploying ClearML For Use. In Particular, They Are Not Happy With The docker.sock Configuration As It Would Potentially Expose The Entire Cluster To Unauthorised View. Can We Do Without It?

Thanks, it's attached.
I also noted that the status in ClearML is always 'Pending', unlike others which say 'Running'. Is this a side effect of using k8s glue?

3 years ago

Hi, thanks. How about the Agent: does its docker mode or k8s mode require docker.sock to be exposed?

3 years ago

Hi, please correct me if I am wrong: to use the glue, I need the following.
A k8s cluster
A kubectl that is connected to the k8s cluster
A pip install of clearml-agent 0.17.1
I did all of the above, so I'm not sure what is meant by running the entire thing on my own machine.

3 years ago

Unfortunately it's not. The problem previously encountered with the docker method surfaced again. In this case, the BASE DOCKER IMAGE
nvidia/cuda:10.1-runtime-ubuntu18.04 --env GIT_SSL_NO_VERIFY=true is not taking effect with the k8s glue.

3 years ago

Is this fix coming soon?

3 years ago

Thanks 👍 . Should I create an issue on GitHub?

3 years ago

Sorry, I forgot to paste the logs.

3 years ago

It has always been there.

3 years ago

0 Hi, I'm Using ClearML Datasets. How Do I Tell From The ClearML UI Which Dataset Version I Am Using?

I meant the dataset id.

3 years ago

0 Hi, I'm Using ClearML Datasets. How Do I Tell From The ClearML UI Which Dataset Version I Am Using?

Hi, it makes sense to automate this part just like you automate the rest of the MLOps flow. Since you already support Data Versioning/Lineage, Data Provenance (how it works with the experiment and as a model source) should be in too. Although I agree that technically it's probably not possible to tell if the users actually used the indicated datasets after they do a datasets.get_copy() .

3 years ago

0 Hi, I'm Using ClearML Datasets. How Do I Tell From The ClearML UI Which Dataset Version I Am Using?

AgitatedDove14 , i'm Jax, not Manoj! lol. 😅 😅

3 years ago

0 Hi, I'm Using ClearML Datasets. How Do I Tell From The ClearML UI Which Dataset Version I Am Using?

Sorry AgitatedDove14 can you bump me to that thread?

3 years ago

0 Hi, Several Changes Occurred Recently And I Would Like To Know If There's A Way To Verbosely Capture All The Printouts Happening Within A K8S Glue Spawned Pod. We Have An Issue Where All Of Our New Remote_Execution Tasks Are Stuck In The 'Pending' Stage.

Does the bash script need clearml-agent to be able to communicate with the https clearml-server first? If yes, there's a chicken-and-egg problem here.

3 years ago

Sorry, in case I misunderstood you: are you referring to the extra_docker_shell_script?

3 years ago

Some breakthrough. The problem arose because we switched the web, api and files servers to use https (SSL) endpoints instead. I switched back to http endpoints to test this theory.

Although it's not printing the error, I suspect it's not able to connect due to the lack of the self-signed cert. Previously this wasn't an issue; I'm not sure what changed in clearml_agent=1.1.0.

There's a secondary issue resulting from this; I will put it in a new thread.

3 years ago

It's running as a long-running pod on K8S. I'm using log -f to track its stdout.

3 years ago

Is there a way for k8s glue to pass on self signed cert information to the agent pods?

3 years ago

OK, I get the logic now. extra_docker_shell_script executes before clearml-agent talks to the ClearML server.

3 years ago

0 Can I Ask How Often Does The Hosted ClearML Reset? I'm In A Hackathon And Thought Of Using It.

ok thanks.

3 years ago

0 Hi, We Have Had Some Crashes On The ClearML Server, Caused By ClearML Uploading The Models Into The ClearML Server (By Default). Is It Possible To Have An Overriding Config So Clients Can Never Upload To The ClearML Server Itself As Default?

Hi SuccessfulKoala55 , can I confirm the following comments in the docker-compose.yml ?
And after that, can I run the docker-compose commands without loss of data?

docker-compose down
docker-compose up
docker-compose.yml
`
version: "3.6"
services:

  apiserver:
    command:
    - apiserver
    container_name: clearml-apiserver
    image: allegroai/clearml:latest
    restart: unless-stopped
    volumes:
    - /opt/clearml/logs:/var/log/clearml
    - /opt/clearml/config:/opt/clearml/config
    #...

2 years ago
