Notice there is no need to upgrade the server, only the ClearML Python package
TenseOstrich47
I noticed that with one agent, only one task gets executed at one time
Yes you can 🙂
Also, you are correct: a single agent will run a single Task at a time. That said, you can have multiple agents running on the same machine, and when you launch them you specify which GPUs each uses (in theory they can share the same GPU, but your code might not like it 😉 )
You can see a few examples here:
https://github.com/allegroai/clearml-agent#running-the-clearml-agent
Have to get the glue set up, which I couldn't fully understand, so that's a different topic
I suggest using the apply-template setup (basically you provide a Job/Service template, and the glue uses it to set up k8s Jobs based on the Tasks coming in from the specific queue); see the sketch below
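For reference, a minimal sketch of that glue flow in Python, loosely based on examples/k8s_glue_example.py from the clearml-agent repo (the pod_template.yaml file name and the queue/namespace values are placeholders, and the constructor arguments may differ between clearml-agent versions):

```python
# Minimal k8s glue sketch: Tasks pulled from the queue are turned into
# k8s Jobs/Pods based on the provided template.
from clearml_agent.glue.k8s import K8sIntegration

k8s = K8sIntegration(
    namespace="clearml",                # placeholder namespace for spawned pods
    template_yaml="pod_template.yaml",  # placeholder Job/Pod template file
)

# Listen on the queue and create a pod per incoming Task
k8s.k8s_daemon(queue="k8s_queue")
```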
The use case I have is to allow people from my team to run their workloads on a set of servers without stepping over each other...
So does that mean CPU-only workloads?
Also, are we worried about fairness? (i.e. someone "taking" all the CPUs for themselves)
Hi ExcitedFish86
Good question, how do you "connect" the 3 nodes? (i.e. what framework are you using?)
Ohhhh, okay, as long as you know; they might fail on memory...
Is there a way to do this using ssh keys?
The .ssh folder of the host machine should be automatically mounted; you can force SSH by setting force_git_ssh_protocol: true
It is still not working for me. Are you using Linux, Windows, or macOS?
It should work for Linux, Mac, and Windows. What are you using?
Yes, but I'm not sure that they need to have separate tasks
Hmm okay I need to check if this can be easily done
(BTW, the downside of that is you can only cache a component, not a sub-component)
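To illustrate, a minimal sketch with placeholder names: caching is declared per component, so everything inside the component function is cached or re-run as one unit:

```python
from clearml.automation.controller import PipelineDecorator

# cache=True caches this whole component: if its code and inputs are
# unchanged, the previous output is reused on the next pipeline run.
@PipelineDecorator.component(cache=True, return_values=["data"])
def prepare_data(source: str):
    # any helper logic nested in here is cached together with the
    # component -- sub-steps cannot be cached individually
    return source.upper()

@PipelineDecorator.pipeline(name="demo", project="examples", version="0.1")
def pipeline_logic(source: str = "hello"):
    print(prepare_data(source))

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # debug run on the local machine
    pipeline_logic()
```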
So you want these two on two different graphs?
Hi TightDog77
HTTPSConnectionPool(host='', port=443): Max retries exceeded with url: /upload/storage/v1/b/models/o?uploadType=resumable (Caused by SSLError(SSLError(1, '[SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:2633)')))
This seems like a network error to GCP (basically the GCP Python package throws it)
Are you always getting this error? Is this something new?
Hi GentleSwallow91
I am very much concerned with Docker container spin-up time.
To accelerate spin-up time (mostly pip install) use the venv caching (basically it will store a cache of the entire installed venv so it does not need to reinstall it)
Uncomment this line:
https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L116
The problem above could be that I used a non-root user to train a model and all packages are installed for ...
ExcitedFish86 this is a general "dummy agent" that pulls Tasks and executes them (no env created, no code cloned, as you suggested)
How does this work with HPO?
The HPO clones Tasks, changes their arguments, pushes them into a queue, and monitors the metrics in real time. The missing part (from my understanding) was that the execution of the Tasks themselves required setup, and that you wanted multi-machine support; to overcome that, I posted a dummy agent that just runs the Tasks.
(Notice...
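For context, this is roughly what that flow looks like with the ClearML optimizer (a minimal sketch; the base task id, queue name, and parameter/metric names are placeholders):

```python
from clearml.automation import (
    DiscreteParameterRange,
    HyperParameterOptimizer,
    UniformParameterRange,
)

optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",  # placeholder: the template Task to clone
    hyper_parameters=[
        UniformParameterRange("Args/lr", min_value=1e-5, max_value=1e-2),
        DiscreteParameterRange("Args/batch_size", values=[32, 64, 128]),
    ],
    # the metric the optimizer monitors on the cloned Tasks
    objective_metric_title="validation",
    objective_metric_series="loss",
    objective_metric_sign="min",
    execution_queue="default",  # the queue the (dummy) agents pull from
    max_number_of_concurrent_tasks=2,
)

optimizer.start()
optimizer.wait()  # block until the optimization is done
optimizer.stop()
```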
Wait, even without the pipeline decorator this function creates the warning?
Hi GrievingTurkey78
I'm assuming something similar to https://github.com/pallets/click/ ?
Auto-connect and store/override all the parameters?
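As an illustration of the auto-connect idea, a minimal sketch using argparse, which Task.init() patches out of the box (the project/task names are placeholders):

```python
import argparse

from clearml import Task

# Task.init() hooks argparse, so the parsed values are stored with the
# Task and can be overridden when the Task is cloned and re-run remotely
task = Task.init(project_name="examples", task_name="auto-connect-demo")

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.01)
parser.add_argument("--epochs", type=int, default=10)
args = parser.parse_args()

print(args.lr, args.epochs)
```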
If I have access to the logs, python env and git commits, is there an API to log those to the experiments too?
Sure: task.update_task
see here:
https://clear.ml/docs/latest/docs/references/sdk/task#update_task
Example:
task.update_task(task_data={'script': {'branch': 'new_branch', 'repository': 'new_repo'}})
The easiest way to get all the different sections (they should be relatively self explanatory) is calling task.export_task() which returns a dict with all the fields yo...
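Putting the two together (a minimal sketch; the task id and the new branch/repository values are placeholders):

```python
from clearml import Task

task = Task.get_task(task_id="<task_id>")  # placeholder id

fields = task.export_task()  # dict with all the Task's sections
print(fields["script"])      # e.g. repository, branch, entry_point

task.update_task(
    task_data={"script": {"branch": "new_branch", "repository": "new_repo"}}
)
```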
It might be that the worker was killed before it unregistered; you will see it there but the last update will be stuck (after 10 min it will be automatically removed)
Yey! MysteriousBee56 kudos on keeping at it!
I'll make sure we report those errors, because this debug process should have been much shorter 🙂
BTW, we figured out that the ' belongs to the echo
yep, when seeing the full command it is apparent
MysteriousBee56 Okay, let's try this one:
docker run -t --rm nvidia/cuda:10.1-base-ubuntu18.04 bash -c "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean && apt-get update && apt-get install -y git python3-pip && python3 -m pip install trains-agent && echo done"
Okay, now let's try:
docker run -t --rm nvidia/cuda:10.1-base-ubuntu18.04 bash -c "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean && apt-get update && apt-get install -y git python3-pip && python3 -m pip install trains-agent && python3 -m trains_agent --help"
MysteriousBee56 not a different port, just not with "localhost" but with your machine's IP
No, after. Do you see the poetry lock removed in the uncommitted changes?
it seems like each task is set up to run on a single pod/node based on attributes like gpu memory, os, num of cores, worker
BoredHedgehog47 of course you can scale to multiple nodes.
The way to do that is to create a k8s YAML with replicas; each pod actually runs the exact same code with the exact same setup. Notice that inside the code itself the DL frameworks need to be able to communicate with one another and b...
Ohh I see, could you copy-paste what you put there? (*** instead of the secret and key will do 🙂 )
Hmm, so currently you can provide a help text, so users know what they can choose from, but there is no way to limit it.
I know the Enterprise version has something similar that allows users to create a custom "application" from a Task; there you can define a drop-down and such, but that might be overkill here, wdyt?
Hi @<1533620191232004096:profile|NuttyLobster9>
Hi All, is there a way to clone a pipeline from the web UI like you can with a task?
Right click on the pipeline and select Run (it is basically the same thing as cloning it)