Ah, so in the future, we can add non-clearml code as a step in the pipeline controller.
So now you don't have any failures, just the GPU usage issue?
I didn't run hyper_parameter_optimizer.py, as I figured that if there is already a problem with the base task, there's no point running the whole series of experiments.
How about running the ClearML agent in docker mode?
Previously, we had our clearml-agent run on the bare-metal machine instead of in docker mode, and there wasn't any issue.. though I haven't tried that with the 0.17.2 version.
Let me run the clearml-agent outside the k8s cluster and get back to you.
```
Could not load dynamic library 'libcupti.so.11.0'; dlerror: libcupti.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-03-11 09:11:17.368793: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcupti.so'; dlerror: libcupti.so: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-03-11 09...
```
Hi TimelyPenguin76 ,
Instead of running hyper_parameter_optimizer.py, I tried running base_template_keras_simple.py instead.. It seems that it didn't use the GPU; however, when I ssh into the clearml-glueq-id-ffaf55c984ea4dbfb059387b983746ba:gpuall pod and run nvidia-smi, it gives an output.
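For reference, a quick sanity check one could run from inside the pod (plain TensorFlow 2.x, nothing ClearML-specific) to see whether TF itself can see the GPU:

```python
import tensorflow as tf

# Lists the GPUs TensorFlow can actually use; an empty list means TF was
# installed CPU-only or the CUDA libraries are not visible to it.
print(tf.config.list_physical_devices("GPU"))
```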
Hi, here are some workarounds I thought of.. Btw AnxiousSeal95 , I haven't tried your comments yet.
1) Attach a clearml-task id to each new dataset-id.
So in the future, when new data comes in, get the last data commit from the Dataset project and the clearml-task attached to it. Then clone that clearml-task and pass in the new data (roughly like the sketch after option 2 below). The only downside is the need to clone the clearml-task.
Or alternatively
2) Attach the git SHA of the processing code to each new dataset-id.
This can't give the exact code ...
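For option 1, something like this sketch is what I have in mind (names and the tag convention are made up, and I haven't tried it):

```python
from clearml import Dataset, Task

# Get the latest data commit from the dataset project
latest = Dataset.get(dataset_project="PROJ_1", dataset_name="my_dataset")

# Hypothetical convention: the processing task id was attached as a tag
base_task_id = latest.tags[0]

# Clone the processing task, point it at the new data, and enqueue it
cloned = Task.clone(source_task=base_task_id, name="process_new_commit")
cloned.set_parameter("General/dataset_id", latest.id)
Task.enqueue(cloned, queue_name="default")
```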
Okie.. I have two different projects on the ClearML web server.
The first project (PROJ_1) stores datasets only, using clearml-data. The second project (PROJ_2) is a clearml-pipeline project which pulls the latest committed dataset from PROJ_1 and does a few other steps ... Right now, I manually start PROJ_2 when I know the dataset has been updated in PROJ_1.
One use case now:
1. Load data from Label Studio (manager to manually approve)
2. Push data to clearml-data
3. Run training (manager to manually publish)
4. Push the model URI to the next step
5. Seldon deploys it
Later, if Seldon detects a data drift, it will automatically re-run steps 2-5..
At this point, we haven't drilled down into all of it yet, but steps 2-4 might look roughly like the sketch below.
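(Task names are made up, and the PipelineController API has changed between clearml versions, so treat this as a shape rather than a drop-in script.)

```python
from clearml.automation.controller import PipelineController

# Each step clones an existing template task and runs it once its parents finish
pipe = PipelineController(default_execution_queue="default",
                          add_pipeline_tags=True)
pipe.add_step(name="push_data",
              base_task_project="PROJ_2", base_task_name="push_to_clearml_data")
pipe.add_step(name="train", parents=["push_data"],
              base_task_project="PROJ_2", base_task_name="train_model")
pipe.add_step(name="publish", parents=["train"],
              base_task_project="PROJ_2", base_task_name="publish_model")
pipe.start()
pipe.wait()
pipe.stop()
```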
Btw, this is just the example code from the clearml repo..
Yup, I used the values file for the agent. However, I manually edited it for the agentservices (as there was no example for it on GitHub).. Also, I am not sure what CLEARML_HOST_IP should be (I left it empty).
Hmm, unfortunately it is still pending, as in nothing is running.
CostlyOstrich36 :
They mentioned that they already have a Nexus backend, so I was just wondering if we could use it for storage purposes.
In our local setup, we use MinIO, though.
AgitatedDove14 Not creating but more for orchestrating...
Currently, we manually push a dataset to clearml-data.
We have a pipeline controller Task which takes in data from the ClearML dataset, runs preprocessing, runs training, and publishes a model (if a certain threshold is met).
We have a ClearML monitor which watches all published models. It pushes the URI of the published model to a RabbitMQ queue.
We have a subscriber (Python code) listening to the RabbitMQ queue. This takes in the URI from t...
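The subscriber is basically this shape (a minimal sketch using the pika client; host and queue name are made up):

```python
import pika

def on_model_published(channel, method, properties, body):
    # The monitor pushed the published model's URI as the message body
    model_uri = body.decode()
    print(f"New published model: {model_uri}")
    # ... hand the URI over to the deployment step from here

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="published_models")
channel.basic_consume(queue="published_models",
                      on_message_callback=on_model_published,
                      auto_ack=True)
channel.start_consuming()
```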
MagnificentSeaurchin79 How do we do this? Can it be done via ClearML itself?
Sounds like you need to run a service that monitors for new commits in PROJ_1 and triggers the pipeline.
Is this some sort of polling?
At the end of the day, we are just worried about whether this will hog resources compared to a webhook.. Any ideas?
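For scale, the polling service itself would be tiny; something like this (the pipeline task id and names are placeholders):

```python
import time
from clearml import Dataset, Task

PIPELINE_TASK_ID = "<pipeline-controller-task-id>"  # placeholder
last_seen = None

while True:
    # Dataset.get with project/name returns the latest version of the dataset
    ds = Dataset.get(dataset_project="PROJ_1", dataset_name="my_dataset")
    if ds.id != last_seen:
        last_seen = ds.id
        # New data commit: clone the pipeline controller and enqueue it
        pipeline = Task.clone(source_task=PIPELINE_TASK_ID,
                              name="pipeline_on_new_data")
        Task.enqueue(pipeline, queue_name="services")
    time.sleep(300)  # one cheap API call every five minutes
```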
Hi AgitatedDove14 , this isn't the issue. With or without specifying the queue, I get this error when I run the "create" version, as compared to the "init" version.
I wonder whether this is some issue with using the "create" version together with execute_remotely()..
We can always use the latest ClearML.
We were thinking of a use case for a client who has Sonatype Nexus in their environment. Could we leverage it, or would we need MinIO instead?
Something is weird.. It is showing workers which are not running now...
Hi, for the values.yaml, is there some reference for it, especially for assigning more memory to the webserver service, etc.? I tried googling around, but so far no luck.
This is from my k8s cluster. Using the ClearML Helm chart, I was able to set this up.
I ran this on my local machine: `clearml-task --project playground --name tensorboard_toy --script tensorboard_toy.py --requirements requirements.txt --queue myqueue`
Just figured it out..
Seems like the docker image below didn't have the tensorflow package: `tensorflow/tensorflow:latest-devel-gpu`
I should have checked prior... My bad..
Thanks for the help
Hi Martin, I just untemplated the chart with `helm template clearml-server-chart-0.17.0+1.tgz` and found these lines inside: `- name: CLEARML_AGENT_DOCKER_HOST_MOUNT value: /opt/clearml/agent:/root/.clearml`. Upon ssh-ing into the folders on both the physical node (/opt/clearml/agent) and the pod (/root/.clearml), it seems there are some files there.. So the mounting worked, it seems.
I am not sure I get your answer. Should I change the values to something else?
Thanks
Hi AgitatedDove14 , attached is my "create" version compared to the "init" version..
When I enqueue both the "init" and "create" versions into my clearmlQueue, it seems the "create" version doesn't execute at all.
It just prints "2021-05-26 16:02:13,053 - clearml - WARNING - Terminating local execution process" and says it has completed successfully.
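For context, this is my understanding of the difference between the two versions (a sketch with assumed names; maybe I'm holding it wrong):

```python
from clearml import Task

# "init" version: Task.init binds to the running script, so
# execute_remotely() can stop the local run and re-launch it on an agent.
task = Task.init(project_name="playground", task_name="init_version")
task.execute_remotely(queue_name="clearmlQueue")

# "create" version: Task.create builds a standalone task definition that is
# not bound to the current process, so it must be enqueued explicitly.
task = Task.create(project_name="playground", task_name="create_version",
                   script="base_template_keras_simple.py")
Task.enqueue(task, queue_name="clearmlQueue")
```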
Hi AgitatedDove14 , Just updated that flag, but the problem continues..
```
agent.package_manager.system_site_packages = true
.....
Environment setup completed successfully
Starting Task Execution:

ClearML results page: files_server:
Traceback (most recent call last):
  File "base_template_keras_simple.py", line 15, in <module>
    import tensorflow as tf  # noqa: F401
  File "/root/.clearml/venvs-builds/3.6/lib/python3.6/site-packages/clearml/binding/import_bind.py", line 59, in __pat...
```
