And maybe adding idle time spent without a job to the API is not that bad an idea 😉
yes, adding that to the feature list 🙂
What if I write the last active state in an instance tag? This could be a solution…
I love this hack, yes this should just work.
BTW: if your lambda is a for loop that is constantly checking, there is no need to actually store the "last idle timestamp check" as a tag, no?
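If you do go with the tag approach, here is a minimal sketch of what the check could write with boto3; the instance id and tag key are hypothetical, nothing ClearML-specific:

```python
import time

import boto3

# Hypothetical instance id and tag key, purely for illustration
INSTANCE_ID = "i-0123456789abcdef0"

ec2 = boto3.client("ec2")
# Stamp the instance with the last time it was seen doing actual work
ec2.create_tags(
    Resources=[INSTANCE_ID],
    Tags=[{"Key": "clearml-last-active", "Value": str(int(time.time()))}],
)
```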
How can I make a task that does a helm install or kubectl create deployment.yaml?
The task that it launches should have your code that actually does the helm deployments and other things. Think of the Task as a way to launch a script that does something, and that script can then just interact with the cluster. The queue itself (i.e. clearml-agent) will not directly deploy helm charts, it will only deploy jobs (i.e. pods).
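For example, a minimal sketch of such a script (the project name, release name and chart path are assumptions): the Task is only the launcher, and the script it runs shells out to helm like any other process:

```python
import subprocess

from clearml import Task

# Register this run so an agent can pick it up from the queue like any other job
task = Task.init(project_name="deployments", task_name="helm install my-chart")

# The actual work: call helm (or kubectl) directly against the cluster.
# Assumes helm and a valid kubeconfig are available on the agent machine.
subprocess.run(
    ["helm", "upgrade", "--install", "my-release", "./my-chart"],
    check=True,
)
```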
Any recommendations or working combinations of AMIs?
I would take the deep learning AMIs from Nvidia on AWS; I think they work on both CPU and GPU machines.
In terms of dockers: python dockers for CPU and nvidia runtime ones for GPU
[https://hub.docker.com/layers/library/python/3.11.2-bullseye/images/sha256-6128ea86d[…]d2c01646d599352f6ddd9893420eb815a06c3b90619f8?context=explore](https://hub.docker.com/layers/library/python/3.11.2-bullseye/images/sha256-6128ea86db7f6b1b286d2c01646d599352f6ddd98...
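If it helps, a minimal sketch of pinning the container per task; the image names are just examples along the lines above, not a definitive recommendation:

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="train")

# GPU queue: an nvidia runtime image; CPU queue: a plain python image
task.set_base_docker("nvidia/cuda:11.8.0-runtime-ubuntu22.04")
# task.set_base_docker("python:3.11.2-bullseye")  # CPU-only alternative
```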
@<1539780258050347008:profile|CheerfulKoala77> make sure the AMI id matches the zone of the EC2 machine
@<1569858449813016576:profile|JumpyRaven4> fyi clearml-serving was synced 🤞
Hi GrievingTurkey78
the artifacts are downloaded to the cache folder (and by default the last 100 accessed artifacts are maintained there).
node executes the task, will all the info be erased, or does this have to be done explicitly?
Are you referring to the trains-agent running a docker?
By default the cache is persistent between executions (i.e. saving time on multiple downloads between experiments)
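As an illustration, a minimal sketch of what that means in practice (project, task and artifact names are placeholders): repeated get_local_copy() calls for the same artifact reuse the locally cached copy instead of re-downloading it:

```python
from clearml import Task

# Fetch an artifact from a previously executed task
source_task = Task.get_task(project_name="examples", task_name="my experiment")
local_path = source_task.artifacts["training_data"].get_local_copy()

# The returned path points inside the local cache folder, so a second call
# (even from another experiment) reuses the already-downloaded file
print(local_path)
```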
because a step can be constructed with multiple sub-components, but not all of them might be added to the UI graph
Just to make sure I fully understand: when we decorate with @sub_node we want that to also appear in the UI graph (and have its own Task / metrics etc.)
correct?
or shall I call the Task.init even from the agent
WorriedParrot51 I think something is lost here.
Task.init() is always called, even when the agent is executing the code. The difference is in what happens inside the Task.init() call. When the codebase itself is executed by the trains-agent, it signals through OS environment variables to Task.init() that instead of creating a new task, it should use the already created one. From this point on, all data flows from the trains-server back into the codebase.
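In other words, the same Task.init() line works in both modes. A minimal sketch of what that looks like from the code's point of view (project/task names are just placeholders):

```python
from clearml import Task

# Locally this registers a new task; under the agent it attaches to the task
# the agent already created (its id is passed through the process environment),
# so no duplicate task is created.
task = Task.init(project_name="examples", task_name="my experiment")

if Task.running_locally():
    print("running locally: a fresh task was created on the server")
else:
    print("running via an agent: reusing the pre-created task")
```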
can I mount the s3 bucket as file system on place where
you need to mount it where the file server is storing its files, correct (notice: not the DBs, just the file server)
PreciousParrot26 I think this is really a matter of the CI process having very limited resources. Just to be clear, you are correct and the steps themselves are not executed inside the CI environment, but it seems that even running the pipeline logic is somehow "too much" for the limited resources... Makes sense?
Hi BrightGoat74
So merging general purpose plotly plots is very hard (i.e. putting both on the same graph)
But if you report using logger.report_scatter(...) the UI will merge the ROC curves into the same graph, wdyt?
https://clear.ml/docs/latest/docs/guides/reporting/scatter_hist_confusion_mat_reporting#2d-scatter-plots
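For example, a minimal sketch of reporting two ROC curves under the same plot title so the UI merges them (the values below are made up for illustration):

```python
import numpy as np
from clearml import Task

task = Task.init(project_name="examples", task_name="roc comparison")
logger = task.get_logger()

fpr = np.linspace(0.0, 1.0, 50)
for name, tpr in [("model A", np.sqrt(fpr)), ("model B", fpr ** 0.25)]:
    # Series that share the same title end up on the same plot in the UI
    logger.report_scatter2d(
        title="ROC",
        series=name,
        iteration=0,
        scatter=np.stack([fpr, tpr], axis=1),
        xaxis="False Positive Rate",
        yaxis="True Positive Rate",
    )
```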
I'm getting: hydra_core == 1.1.1
What's the setup you have? python version, OS, Conda yes/no?
If this is the case, then we do not change the matplotlib backend
Also, I've attempted converting the mpl image to PIL and using report_image to push the image, to no avail.
What are you getting? An error / exception?
pipe.start_locally() will run the DAG compute part on the same machine, whereas pipe.start() will start it on a remote worker (if it is not already running on a remote worker)
basically "pipe.start()" executed via an agent will start the compute (no overhead)
does that help?
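A minimal sketch of the difference (project, task and queue names are assumptions):

```python
from clearml import PipelineController

pipe = PipelineController(name="my pipeline", project="examples", version="1.0.0")
pipe.add_step(
    name="step_1",
    base_task_project="examples",
    base_task_name="step one",  # assumes this template task already exists
)

# Run the pipeline DAG logic on this machine (steps still go to their queues)
pipe.start_locally()

# ...or enqueue the pipeline logic itself, so an agent listening on the
# "services" queue runs it with no local overhead:
# pipe.start(queue="services")
```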
Well, I do not think you set your PyTorch Lightning to use cuda:
GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/code/.venv/lib/python3.9/site-packages/lightning/pytorch/trainer/setup.py:176: PossibleUserWarning: GPU available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='gpu', devices=1)`.
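Following the warning's own suggestion, a minimal sketch (the model and datamodule objects are assumed to exist in your code):

```python
from lightning.pytorch import Trainer

# Explicitly tell Lightning to use the GPU instead of silently falling back to CPU
trainer = Trainer(accelerator="gpu", devices=1, max_epochs=10)
# trainer.fit(model, datamodule=datamodule)  # hypothetical model / datamodule
```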
Dynamic GPU option only available with Enterprise version right?
Correct 🙂
I figured out the problem...
Nice!
Unfortunately, the hyperparameters in the configuration object seem to take precedence over the hyperparameters in the Hyperparameters section
Hmm what do you mean by that? How did you construct the code itself? (you should be able to "prioritize" one over the other)
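For reference, a minimal sketch of the two mechanisms being discussed (names and values are placeholders): values connected with task.connect() land in the Hyperparameters section, while connect_configuration() populates the Configuration objects section:

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="hparam precedence")

# Goes to the Hyperparameters section; when an agent runs a clone, the values
# edited in the UI are fed back into this dict
params = {"lr": 0.001, "batch_size": 32}
params = task.connect(params)

# Goes to the Configuration objects section as a separate config blob
model_cfg = task.connect_configuration({"depth": 18, "width": 64}, name="model")
```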
LazyTurkey38, ohh I think you are correct 😞
it should be:
# patch the Task and actually send it for execution
if Task.running_locally():
    # this will verify all auto repo detection and python is done.
    task.close()
    # so that we can edit the task
    task.reset()
    # update the repo
    task.update_task(task_data={'script': {'branch': 'new_branch', 'repository': 'new_repo'}})
    # now to actually enqueue the Task
    Task.enqueue(task, queue_name='default')
wdyt?
Could it be these packages (i.e. numpy etc.) are not installed as system packages in the docker (i.e. inside a venv, inside the docker)?
What should have happened is the experiments should have been pending (i.e. in a queue)
(Not sure why they are not).
You can manually send them for execution: right click on an experiment in the table, select enqueue and select the default queue (this will be the one the trains-agent pulls from, by default)
No, TB (Tensorboard) is not enabled.
That explains it 🙂 did you manage to get it working ?
Hmm good point, it should probably return the clearml python version. Is this what you mean?
logger.report_scalar("loss", "train", iteration=0, value=100)
logger.report_scalar("loss", "test", iteration=0, value=200)
Thanks GorgeousMole24
That is a very good point! Passing it along to the product guys
Correct, but do notice that (1) task names are not unique and you can change them after the Task was executed, and (2) when you clone the Task you can actually rename it; when an agent is running the Task, the init function is basically ignored, because the Task already exists. Makes sense?
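To make that concrete, a minimal sketch of cloning, renaming and enqueuing (project, task and queue names are assumptions):

```python
from clearml import Task

# Look up the original by name (remember: names are not unique identifiers)
original = Task.get_task(project_name="examples", task_name="my experiment")

# The clone gets its own id and can be given any name you like
cloned = Task.clone(source_task=original, name="my experiment - tuned copy")

# When an agent picks this up, the Task.init() inside the code is effectively
# a no-op because the Task already exists on the server
Task.enqueue(cloned, queue_name="default")
```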