CostlyOstrich36 If I delete the origin and all other info and set it to tag_name=‘xxx’ then it is able to work
AgitatedDove14 nope… you can run md5 on the file as stored in the remote storage (nfs or s3)
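For reference, a minimal sketch of that md5 comparison (the file paths are placeholders):
```python
import hashlib

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 of a file in chunks, so large artifacts don't load into RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# compare the local copy against the one stored on NFS/S3 after download:
# md5_of_file("local/model.pt") == md5_of_file("/mnt/nfs/models/model.pt")
```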
I think it has something to do with clearml, since I can run this code as pure python without clearml; when I activate clearml, I see that torch.load() hits the import_bind.__patched_import3 when trying to deserialize the saved model
I tested it again with much smaller data and it seems to work.
I am not sure what the difference between the use-cases is. It seems like something specifically about the particular (big) parent doesn’t agree with clearml…
AgitatedDove14 thanks, good idea.
My main issue with this approach is that it breaks the workflow into an async set of tasks:
- One task sends a list of images for labeling and terminates
- An external webhook calls clearml and creates a dataset from the labels returned from the labeling task
- A trigger wakes up the label post-processing/splitting logic (see the sketch below)
It will be hard to understand where things are standing from looking at the UI.
I was wondering if the “waiting” operator can actua...
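For the trigger step above, a minimal sketch assuming the TriggerScheduler API from clearml.automation (trigger/project names are hypothetical, and argument names may differ between versions):
```python
from clearml.automation import TriggerScheduler

def on_labels_dataset(task_id):
    # hypothetical hook: kick off the label post-processing/splitting logic here
    print(f"new labels dataset ready, task {task_id}")

trigger = TriggerScheduler(pooling_frequency_minutes=5)
trigger.add_dataset_trigger(
    name="labels-ready",                  # hypothetical trigger name
    trigger_project="labeling",           # hypothetical project holding the label datasets
    schedule_function=on_labels_dataset,  # invoked when a matching dataset appears
)
trigger.start()  # blocks and polls the server
```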
AgitatedDove14 from what I gather there is a lightly documented concept of “multi_instance_support” https://github.com/allegroai/clearml/blob/90854fa4a516fcb38ea0a5ec23894c5a3b6bbc4f/clearml/automation/controller.py#L3296 .
Do you think it can work?
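From the linked source, multi_instance_support is an argument of PipelineDecorator.pipeline; a hedged sketch of how it would be used (names are placeholders, behavior may vary by version):
```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["x"])
def step():
    return 1

@PipelineDecorator.pipeline(
    name="pipe", project="examples", version="0.1",
    multi_instance_support=True,  # allow several concurrent instances of this pipeline
)
def pipeline_logic():
    print(step())
```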
Trust me, I had to add this field to this default dict just so that clearml doesn’t delete it for me
it does appear on the task in the UI; it’s just somehow not repopulated in the remote run if it’s not part of the default empty dict…
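For context, a minimal sketch of the behavior being described (an assumed repro, not verified):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="connect-dict")  # hypothetical names
# the field had to be added to the default dict explicitly, as a placeholder:
config = {"real_field": "value", "placeholder_field": ""}
task.connect(config)
# the report above: a key absent from this default dict still shows on the task
# in the UI, but is not repopulated into `config` when the task runs remotely
```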
AgitatedDove14 it’s pretty much similar to your proposal but with pipelines instead of tasks, right?
AgitatedDove14 1.1.5.
Yes - first locally, then it aborts (while running locally presumably).
then I re-enqueue it via the UI and it seems to run on the agent
I will try and get back to this area of the code soon
CostlyOstrich36 I’ve tried the pipeline_from_decorator.py example and it works.
Could it be a sensitivity to some components being on a different python .py file relative to the controller itself?
Tried with 1.6.0, doesn’t work
```
# this is the parent
clearml-data create --project xxx --name yyy --output-uri
clearml-data add folder1
clearml-data close

# this is the child, where XYZ is the parent's id
clearml-data create --project xxx --name yyy1 --parents XYZ --output-uri
clearml-data add folder2
clearml-data close

# now I get the error above
```
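For reference, the same parent/child flow through the Python Dataset API (a sketch; project and folder names mirror the CLI placeholders above):
```python
from clearml import Dataset

parent = Dataset.create(dataset_project="xxx", dataset_name="yyy")
parent.add_files("folder1")
parent.upload()    # uploads to the dataset's configured output URI
parent.finalize()  # equivalent of `clearml-data close`

child = Dataset.create(
    dataset_project="xxx", dataset_name="yyy1",
    parent_datasets=[parent.id],  # "XYZ" in the CLI example
)
child.add_files("folder2")
child.upload()
child.finalize()
```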
It seems to work fine when the parent is on clear.ml storage (tried with toy example of data)
no, I tried either with very small files or with 20GB as the parent
AgitatedDove14 the mv command requires the destination folders to be empty… so moving b into a won’t work if some subfolders already exist there
python 3.8
I’ve worked around the issue by doing: sys.modules['model'] = local_model_package
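In context, the workaround looks roughly like this (a sketch; the real package location is hypothetical):
```python
import sys
import torch

# torch.load() resolves the pickled class's module path via import, so aliasing
# the module name stored in the checkpoint to the local package lets it deserialize
import my_project.model as local_model_package  # hypothetical local location

sys.modules['model'] = local_model_package  # 'model' is the name baked into the checkpoint
net = torch.load('model.pt')                # hypothetical checkpoint path
```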
AgitatedDove14
the root git path should be part of your PYTHONPATH automatically
That’s true but it doesn’t respect the root package (sources root or whatever).
i.e. if all my packages are under /path/to/git/root/src/
So I had to add it explicitly via a docker init script…
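An alternative to the docker init script (a sketch; the path is a placeholder): prepend the sources root at the top of the entry script, before any project imports run.
```python
import sys

# make the sources root importable, since only the git root is added automatically
sys.path.insert(0, "/path/to/git/root/src")
```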
AgitatedDove14 yes, I am passing this flag to the agent with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 clearml-agent….
running inside docker
and it still tries to install the requirements.txt
Using 1.3.1
CostlyOstrich36 all tasks are remote.
controller - tried both
CostlyOstrich36 I confirm this was the case.
So:
```
# module_a.py
@PipelineDecorator.pipeline()
def pipeline_logic():  # hypothetical name for the elided pipeline body
    from module_b import my_func
    x = my_func()
```
```
# module_b.py
@PipelineDecorator.component()
def my_func():
    pass
```
Under these circumstances, the pipeline is created and runs correctly
But when I clone it (or click “Run” and submit) - it fails with the error above.
Moving my_func from module_a to module_b solves this.
To me this looks like a bug or unreasonable and undocumented...
@ https://app.slack.com/team/UT8T0V3NE is there support in a non-free version for preempting lower-priority tasks to allow a higher-priority task to come in?
But you already have all the entries defined here:
yes but it’s missing a field that is actually found and parsed from my local autoscaler.yaml….
AgitatedDove14 I see the continue_pipeline flag.
I want to resume the same instance of the pipeline.
When I want to resume the pipeline, I can only re-enqueue it - I cannot reset parameters (right?)
So it seems that for the pipeline to resume in “continue pipeline” mode, I need to pass continue_pipeline the first time I submit the pipeline.
Hopefully it will be ignored during the first run and just behave like a new run, and only really kick in when the pipeline is resumed....
SmugHippopotamus96 how did this setup work for you? are you using an autoscaling node group for the jobs?
with or without GPU?
Any additional tips on usage?
I mean that there will be no task created, and no invocation of any clearml API whatsoever, including no imports in the “core ML task”. This is the direction - add very small wrappers of clearml code around the core ML task. The clearml wrapper is “aware” of the core ML code, and never the other way. For cases where the wrapper is only “before” and “after” the core ML task, it’s somewhat easier to achieve. For reporting artifacts etc. which is “mid flow” - it’s m...
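A sketch of that direction, assuming hypothetical names (core_ml has no clearml imports; only the wrapper touches the API):
```python
from clearml import Task

from core_ml import train  # pure ML code, no clearml imports inside

def tracked_train(**params):
    # "before" wrapper: the clearml side knows about the core ML task...
    task = Task.init(project_name="examples", task_name="train")
    task.connect(params)
    result = train(**params)  # ...and the core ML task never knows about clearml
    # "after" wrapper: report outputs; "mid flow" reporting is the harder case
    task.upload_artifact("result", result)
    return result
```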
AgitatedDove14 can you share if there is a plan to put the gcp autoscaler in the open source?
I think it works.
small correction - use slash and not dot in configuration/OmegaConf: parameter_override={'configuration/OmegaConf': dict(...)}
and for the record - to override hydra params the syntax is: parameter_override={'Hydra/x.y': 1234}
where x.y=1234 is how you would override the param via the cli
I want to pass the entire hydra omegaconf as a (nested) dictionary
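Putting the two corrections together, a sketch of both override styles on a pipeline step (task/project names are placeholders):
```python
from clearml import PipelineController

pipe = PipelineController(name="pipe", project="examples", version="0.1")
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="train-base",
    parameter_override={
        "Hydra/x.y": 1234,  # same effect as overriding x.y=1234 on the CLI
        # the whole (nested) OmegaConf also goes under a slash-separated key:
        # "configuration/OmegaConf": {...},
    },
)
pipe.start()
```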