Oh, that makes sense. This depends on how you set up the ClearML k8s glue (because the resource allocation is done by k8s). A good hack to limit the number of containers per GPU is to set a RAM limit per pod; then k8s will know to limit the number of pods on the same GPU machine.
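As a rough sketch of the idea (the exact pod template depends on how you deployed the glue; the sizes below are placeholders), it is just a standard k8s resource limit on the pod spec:

```yaml
# Hypothetical pod-template fragment: with a 16Gi memory limit per pod,
# k8s can schedule at most floor(node_RAM / 16Gi) such pods on one node,
# which indirectly caps the number of containers per GPU machine.
resources:
  requests:
    memory: "16Gi"
  limits:
    memory: "16Gi"
    nvidia.com/gpu: 1
```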
wdyt?
So are you saying why do we need to install a specific pip version ?
You can "disable it" by specifying a very loose version range: pip_version: "<40"
https://github.com/allegroai/clearml-agent/blob/077148be00ead21084d63a14bf89d13d049cf7db/docs/clearml.conf#L67
This is an odd error, could it be conda is not installed in the container (or in the Path) ?
Are you trying with the latest RC?
Hi GreasyPenguin14
Could you tell me what the differences are and why we should use ClearML data?
The first difference is in the approach itself: DVC ties the data to the code (i.e. the git repo), whereas we (ClearML, but not just us) think data should be abstracted from the code base and become a standalone argument, allowing users to build/execute against different datasets/versions. ClearML Data becomes part of the workflow as it is visible from the UI including the abili...
So I have to upload and run a script with its default value first (since I don't have an initial task id), then clone it, edit the configuration inside that newly cloned one, get the id of the clone, and pass this into my script as the task_id and run it from my machine?
Correct. You can also create it (from code), "Reset" it (right click in the UI), and then edit it.
Is there a way do this without running it on my machine?
check clearml-task
it is a CLI that will create ...
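As a hedged sketch (the project, script, and queue names below are placeholders; see clearml-task --help for your exact flags):

```shell
# Create a Task from a local script and enqueue it for an agent,
# without ever running it on your own machine.
clearml-task --project my_project --name remote_run \
             --script train.py --queue default
```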
So you want to have two Tasks and connect the two ?
Maybe the best approach is to make the current_task the parent of the Dataset Task? dataset._task.set_parent(Task.current_task())
I tried specifying helpers functions but it still gives the same error.
What's the error you are getting ?
And when exactly are you getting the "user aborted" message?
How do you start the process (are you manually running it, or is it an agent, or maybe pycharm?)
Can you provide the full log ?
WackyRabbit7 This is a json representation of the entire plot (basically how plotly sees it).
What you are after is: full_json[0]['cells']['values']
Which is a list of lists (row order) in the table
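A minimal sketch of pulling the values out, assuming full_json is the plot JSON already parsed from the UI/API (the sample structure below is a hypothetical stand-in mirroring how plotly serializes a table):

```python
import json

# Hypothetical plot JSON as plotly would serialize a table;
# in practice you would get this string from the ClearML UI/API.
raw = json.dumps([{
    "type": "table",
    "header": {"values": ["epoch", "loss"]},
    "cells": {"values": [[0, 1, 2], [0.9, 0.5, 0.3]]},
}])

full_json = json.loads(raw)
# The list of lists holding the table contents
values = full_json[0]["cells"]["values"]
print(values)  # → [[0, 1, 2], [0.9, 0.5, 0.3]]
```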
SmallDeer34
I think this is somehow related to the JIT compiler torch is using.
My suspicion is that JIT cannot be initialized after something happened (like a subprocess, or a thread).
I think we managed to get around it with 1.0.3rc1.
Can you verify ?
Can clearml-agent currently detect this?
Hmm, you mean will the agent clean itself up?
from time import sleep
import tqdm
from clearml import Task

task = Task.init(project_name='debug', task_name='test tqdm cr cl')
print('start')
for i in tqdm.tqdm(range(100), dynamic_ncols=True):
    sleep(1)
print('done')
This code snippet works as expected (console will show the progress at the flush interval without values in between). What's the difference ?!
Yes, just set system_site_packages: true
in your clearml.conf
https://github.com/allegroai/clearml-agent/blob/d9b9b4984bb8a83914d0ec6d53c86c68bb847ef8/docs/clearml.conf#L57
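In context, the relevant fragment of clearml.conf (a sketch; see the linked default config for the full agent section) looks like:

```
# clearml.conf (agent section) - let the created venv
# inherit the system site-packages
agent {
    package_manager {
        system_site_packages: true
    }
}
```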
TrickyRaccoon92
I guess elegant is the challenge 🙂
What exactly is the use case ?
The reason is that it is logged as an image, not a plot 🙂
SubstantialElk6 on the client side?
Is ClearML combined with DataParallel
or DistributedDataParallel
officially supported / should that work without many adjustments? Yes, it is supported and should work.
If so, would it be started via python ...
or via torchrun ...
? Yes it should, hence the request for a code snippet to reproduce the issue you are experiencing.
What about remote runs, how will they support the parallel execution? Supported. You should see in the "script entry" something like "-m torch.di...
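For reference, a typical local multi-GPU launch (a sketch; the script name and GPU count are placeholders) looks like:

```shell
# Launch train.py across 4 local GPUs with DistributedDataParallel
torchrun --nproc_per_node=4 train.py
```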
Interesting, do you think you could PR a "fixed" version ?
https://github.com/allegroai/clearml-web/blob/2b6aa6043c3f36e3349c6fe7235b77a3fddd[β¦]app/webapp-common/shared/single-graph/single-graph.component.ts
Hi DefeatedCrab47
You should be able to change the Web server port, but the API port (8008) cannot be changed. If you can log in to the web app and create a project, it means everything is okay. Notice that when you configure trains (trains-init) the port numbers are correct 🙂
Hi @<1523703472304689152:profile|UpsetTurkey67>
I circumvented the problem by putting timestamp in task name, but I don't think this is necessary.
Just pass reuse_last_task_id=False
to Task.init, it will never try to reuse them 🙂
None
Although I didn't understand why you mentioned
torch
in my case?
Just a guess 🙂 other frameworks do multi-process as well,
I would guess it relates to parallelization of Tasks execution of the
HyperParameterOptimizer
class?
Yes, that might be it. It's basically a by-product of using python's "Process" class for multiprocessing. We are working on a fix; not trivial, unfortunately.
however, this will also turn off metrics
For the sake of future readers, let me clarify on this one: turning it off with auto_connect_frameworks={'pytorch': False}
only affects the auto logging of torch.save/load
(side note: the reason is pytorch does not have built-in metric reporting, i.e. it is usually done manually, and these days most probably with tensorboard; for example lightning / ignite use tensorboard as the default metric reporting),
Hi WittyOwl57
I think what happens is it auto-logs the joblib load/save calls; these calls track models used/created by the code and attach them to the model repository representing these models.
I'm assuming there are multiple load/save calls, and there are multiple model instances pointing to the same local file "file:///tmp/..." . The warning basically says it is re-registering existing models.
Make sense ?
So, good news: (1) the dashboard is being worked on as we speak. (2) We released clearml-serving doing exactly that; the next release of clearml-serving will include integration with kfserving (under the hood), essentially managing the serving endpoints on top of the k8s cluster. wdyt?
BTW: if you feel like pushing forward with integration I'll be more than happy to help PRing new capabilities, even before the "official" release
Sadly no 🙁
(I mean you could quickly write a reader for TB and report it, but it is not built into the SDK)