AgitatedDove14

48 Questions, 8049 Answers

Active since 10 January 2023

Last activity 6 months ago

Reputation

Badges 1

25 × Eureka!

Answers 8049

0 Hi, I'M Using Clearml'S Hosted Free Saas Offering. I'M Running Model Training In Pytorch On A Server And Pushing Metrics To Cml. I'Ve Noticed That Anytime My Training Job Fails Due To Gpu Oom Issues, Cml Marks The Job As

Then we can figure out what can be changed so CML correctly registers process failures with Hydra

JumpyPig73 quick question, the state of the Task changes immediately when it crashes ? are you running it with an agent (that hydra triggers) ?

If this is vanilla clearml with Hydra runners, what I suspect happens is Hydra is overriding the signal callback hydra adds (like hydra clearml needs to figure out of the process crashed), then what happens is that clearml's callback is never cal...

2 years ago

0 Hello I'M New Here, I Found This Error When Testing My Tensorflow / Keras Model. I Already Create The Model Endpoint By Running Command 'Clearml-Serving --Id <Service_Id> Model Add --Engine Triton --Endpoint "Model_Name"... '. Also My Tensorflow / Keras M

MoodyCentipede68 can you post the full docker-compose log (from spinning it until you get the error?)
You can just pipe the output to a file with :
docker-compose ... up > log.txt

2 years ago

0 Hello, In The Following Context:

That said, you might have accessed the artifacts before any of them were registered

4 years ago

Well from the error it seems there is no layer called "dense" , hence triton failing to find the layer returning the reult. Does that make sense?

2 years ago

0 Good Evening! For Agent Work Please Tell Me If It Is Possible To Specify The Location Of Requirements.Txt In Tack.Init(), Because I Have Correctly Identified The Versions Of The Libraries Used In The Project

Hi CheerfulGorilla72
the "installed packages" section is used as "requirements.txt for the agent.
Are you saying the autodetection fails to detect all packages? You can specify in "manual execution" (i.e not when the agent is running the code), to just take the requirements.txt locally:` Task.force_requirements_env_freeze(requirements_file="./requirements.txt")

notice the above call should be executed Before Task.init

task = Task.init(...) `3. If you clear all the "installed packages" se...

2 years ago

0 Good Morning Folks, I Am Setting Up Clearml On A (Self-Hosted) K8S Cluster Using The

SarcasticSquirrel56

if I configure manually the pods for the different nodes, how do I make clearml server aware that those agents exist?

Basically the agent register themselves on your cleaml-server, and they register on which Queue(s) they listen to. In other words the interface to choose the different types of machines/gpus is by enqueue the Task to different queues.
For example: Queue(1): "CUDA11_GPUx1" , Queue(2): "CUDA10_GPUx1"
Make sense ?

EDIT:

I guess to achieve what I w...

2 years ago

0 Good Morning Folks, I Am Setting Up Clearml On A (Self-Hosted) K8S Cluster Using The

So that agent on different nodes will probably require different cuda-version images.

That makes sense SarcasticSquirrel56
I would edit the helm chart (or deploy manually) based on a selector that will select the different nodes/gpus and assign the correct containers (i.e. matching CUDA versions to the diff GPUs / drivers)
BTW: you can also playaround with k8s glue, which would dynamically spin pods based on clearml Tasks.
wdyt?

2 years ago

0 Good Morning Folks, I Am Setting Up Clearml On A (Self-Hosted) K8S Cluster Using The

Correct, (if this is running on k8s it is most likely be passed via env variables , CLEARML_WEB_HOST etc,)

2 years ago

0 Hello! I'M Trying To Make A Simple Eval.Py Script That Will Go Pull The Best Model Of A Given Experiment, Load It Locally And Evaluate It On Whatever Data I Give. Question 1: Is There A Standard Way Documented Somewhere To Do This? Question 2: I'M Loadin

Wait, that makes no sense to me. The API from python and the API from the UI are getting the same data from the backend ...
What are you getting with?
from clearml import Task task = Task.get_task(task_id=<put task id here>) print(task.models)

one year ago

0 Hi Again, I Tried To Upgrade Trains Package To 15.1 From 13.1 That I Was Using For A While.. After The Upgrade My Code Stuck When Trying To Use "Pool" (From Multiprocessing Import Pool) The Code Snip:

😞 CooperativeFox72 please see if you can send a code snippet to reproduce the issue. I'd be happy to solve the it ...

4 years ago

0 Hi New With Clearml I Create Clearml Server On Gcp With Docker Now I’M Training Yolov5 And I Want To Save All The Info (Model And Metrics ) With Clearml To My Bucket.. (So I Can Have Small Server And No Memory Issue ) Where Should I Start? Its Should Be C

Hi AstonishingRabbit13

now I’m training yolov5 and i want to save all the info (model and metrics ) with clearml to my bucket..

The easiest thing (assuming you are running YOLOv5 with python train.py is to add the following env variable:
CLEARML_DEFAULT_OUTPUT_URI=" " python train.pyNotice that you need to pass your GS credentials here:
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/docs/clearml.conf#L113

one year ago

0 When Running In

the use case i have is to allow people from my team to run their workloads on set of servers without stepping over each other..

So does that mean CPU only workloads?
Also are we afraid of fairness? (i.e. someone "taking" all the CPU for themselves)

4 years ago

0 Different Question About Warnings: I'M Getting (Infrequently) This Warning, Followed By My Script Hanging

okay the odd thing git ls-remote --get-url origin should have returned the same...
what's your git version? (git --version)

3 years ago

0 Different Question About Warnings: I'M Getting (Infrequently) This Warning, Followed By My Script Hanging

Oh i get it now, can you test:
git ls-remote --get-url githuband then
git ls-remote --get-url

3 years ago

0 Hi Everyone, Just Setup Trains.. Was Very Easy To Setup. Was Able To Run An Experiment With It. Question: Is It Possible To Turn Off The Code Tracking (Anything Related To Git) ?

4 years ago

0 Different Question About Warnings: I'M Getting (Infrequently) This Warning, Followed By My Script Hanging

Can you verify it fixes the timeout issue as well? (or some insight on how to reproduce the issue?)

3 years ago

0 Hi, I'M Facing Some Issues When Try To Run A Pipeline, How Can A Import A Local Library Using Pipelines From Functions? Always Getting "Modulenotfounderror: No Module Named"

Hi DepressedFox45
Basically move the import into the function, it will automatically detect the package.
@PipelineDecorator.component(...) def step_one(...): import sklearn import pandas as pd # stuffMake sense ?

2 years ago

0 Hi, I'M Facing Some Issues When Try To Run A Pipeline, How Can A Import A Local Library Using Pipelines From Functions? Always Getting "Modulenotfounderror: No Module Named"

you can also specify additional packages on the decorator
@PipelineDecorator.component(..., packages=["tqdm>=2.1", "scikit-learn"]) def step_one(...): # code here

2 years ago

0 Different Question About Warnings: I'M Getting (Infrequently) This Warning, Followed By My Script Hanging

So does that mean "origin" solves the issue ?

3 years ago

0 Different Question About Warnings: I'M Getting (Infrequently) This Warning, Followed By My Script Hanging

Thanks!

3 years ago

0 Different Question About Warnings: I'M Getting (Infrequently) This Warning, Followed By My Script Hanging

LOL, thanks!

3 years ago

0 What Is The Recommended Way To Stop The Execution Of A Specific Agent? This Command Doesn'T Allow Me To Specify The Agent Ip I Want To Stop:

Thanks!

3 years ago

0 Hi, I Would Like To Understand How I Can Set The Pip Cache Location For My Agent, I Thought That I Already Had The Right Setting With

What sort of data would be stored in the

venvs-build

folder?

ClumsyElephant70 temporary (lifetime of the task execution) virtual environment, including the code etc. It is deleted and recreated for every new task launched (or restored from cache, if venvs_cache is enabled)

2 years ago

0 Hi, I Run 'Manually' On My Local Machine With No Errors. Then, I Clone The Completed Task And Enqueue It. I Get To Stage When 'Environment Setup Completed Successfully'. But Right After I Get An Error Related To 'Connect' Method - Task.Connect(Config.Mode

@<1571308003204796416:profile|HollowPeacock58> seems like an internal issue copying this object config.model
This is a complex object, and it seems that for some reason
None

As a workaround just do not connect this object. it seems you cannot pickle it / copy it (see GH issue)

one year ago

0 Hi, I'M Trying To Get An Understanding Of How

Hi GiddyTurkey39 ,

When you say trains agent, are you referring to the trains agent command ...

I mean running the trains-agent daemon on a machine. This means you have a daemon pulling jobs from the execution queue and executing them (either in virtual environment, or inside a docker)
You can read more about https://github.com/allegroai/trains-agent and https://allegro.ai/docs/concepts_arch/concepts_arch/

Is it sufficient to queue the experiments

Yes there is no ne...

4 years ago

0 Hi. Inside A Notebook When I Cerate A New Clearml Task And Then Run Sklearn Gridsearchcv , Clearml Uploads A Lot Of Model. Is There A Way To Force Clearml Not To Upload These Models? Related Question Is What Are These Models Anyway? Their Name Only Contai

model upload and registration i should pass something like

'xgboost': False

or

'xgboost': False, 'scikit': False

?

Exactly! which framework are you using ?

about 2, I refer to the names of the models.

Hmm that is a good point to test, usually this is based on the Task name (I think), so if the Task name contains the HPO params in the name it should be the same on the model name. Do you see the HPO params on the Task name ? Should we open a Gi...

one year ago

0 , This Is A Great Tool For Visualizing All Your Experiments. I Wanted To Know That When I Am Logging Scalar Plots With Title As Train Loss And Test Loss They Are Getting Diplayed As Train Loss And Test Loss In The Scalar Tab. I Wanted That The Title Shoul

Before this line, call Task.init

4 years ago

0 Hi Again, I Am Trying To Execute A Pipeline Remotely, However I Am Running Into A Problem With The Steps That Require A Local Package. Basically I Have A Repo, That I Created Specifically For This Pipeline And I Have Packaged It So That I Can Split It I