Hi SmallDeer34
Did you call Task.init ?
Was wondering how it can handle 10s, 100s of models.
Yes, it supports dynamically loading/unloading models based on requests
(load balancing multiple nodes is disconnected from it, but assuming they are under different endpoints, the load balancer can be configured to route accordingly)
GiganticTurtle0 your timing is great, the plan is to wrap up efforts and release early next week (I'm assuming the GitHub fixes will be pushed tomorrow; I'll post here once they are there)
Hi MammothGoat53
Do you mean working with RestAPI directly?
https://clear.ml/docs/latest/docs/references/api/events
you can also increase the limit here:
https://github.com/allegroai/clearml/blob/2e95881c76119964944eaa0289549617e8afeee9/docs/clearml.conf#L32
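For example, a minimal sketch of pulling console events through the API client that ships with the SDK (the task ID and batch size here are placeholders, and the exact response fields may differ slightly between server versions):
```python
# sketch, assuming the clearml package is installed and
# ~/clearml.conf holds valid api credentials
from clearml.backend_api.session.client import APIClient

client = APIClient()
# events.get_task_log is the REST endpoint behind the console log;
# '<task_id>' is a placeholder for your actual task ID
res = client.events.get_task_log(task='<task_id>', batch_size=100)
for event in res.events:
    print(event)
```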
So in theory you can clone yourself 2 extra times and push the clones into an execution queue, but the issue might be actually making sure the resources are available. What did you have in mind?
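For reference, cloning + enqueueing looks roughly like this (a sketch; the project, task, and queue names are placeholders):
```python
from clearml import Task

# assuming this runs inside the original task's script
task = Task.init(project_name='examples', task_name='self-cloning')

# clone the current task twice and push the clones into an execution queue
for i in range(2):
    cloned = Task.clone(source_task=task, name=f'{task.name} clone {i}')
    Task.enqueue(cloned, queue_name='default')
```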
It only happens in the clearml environment, works fine locally.
Hi BoredHedgehog47
what do you mean by "in the clearml environment" ?
based on this one:
https://stackoverflow.com/questions/31436407/git-ls-remote-returns-fatal-no-remote-configured-to-list-refs-from
I think this is a specific issue of the local git repo configuration, can you verify?
(btw: I tested with git 2.17.1; git ls-remote --get-url returns the remote url without an error)
when I run it on my laptop...
Then yes, you need to set the default_output_uri
on your laptop's clearml.conf (just like you set it on the k8s glue)
Make sense ?
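For reference, a sketch of the relevant clearml.conf entry (the bucket URL is a placeholder):
```
sdk {
    development {
        # artifacts / models will be uploaded here by default
        default_output_uri: "s3://my-bucket/clearml"
    }
}
```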
Hmm reading this: None
How are you checking the health of the serving pod ?
So I have a task that just loads a model, but I don't see it as an artifact in the UI
You should see it under Artifacts, Input Model, if you are calling the Keras load function (or similar)
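i.e. a minimal sketch, assuming a Keras model file (the project, task, and file names are placeholders):
```python
from clearml import Task
from tensorflow import keras

task = Task.init(project_name='examples', task_name='load model')

# clearml hooks the framework's load call, so the loaded model
# should show up as an Input Model on the task
model = keras.models.load_model('my_model.h5')
```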
BTW: any specific reason for going the RestAPI way and not using the python SDK ?
Hi @<1571308003204796416:profile|HollowPeacock58>
could you share the full log ?
WackyRabbit7
Long story short, yes, only by name (hashing might be too slow on large files)
The easiest solution: if the hash is incorrect, delete the local copy it returns and ask again; it will re-download it.
I'm not sure if the hashing is exposed, but if it is not, we can add it.
What do you think?
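As a sketch of that delete-and-ask-again flow with the current API (the URL is a placeholder, and the expected hash would come from your own bookkeeping):
```python
import hashlib
import os
from clearml import StorageManager

url = 's3://my-bucket/data/file.bin'  # placeholder
expected_md5 = '...'  # known hash from your own records

local_path = StorageManager.get_local_copy(remote_url=url)
with open(local_path, 'rb') as f:
    actual_md5 = hashlib.md5(f.read()).hexdigest()

if actual_md5 != expected_md5:
    # stale cache entry: delete the local copy and ask again,
    # which triggers a fresh download
    os.remove(local_path)
    local_path = StorageManager.get_local_copy(remote_url=url)
```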
Any idea why the Pipeline Controller is Running despite the task passing?
What do you mean by "the task passing"?
If you have a requirements file then you can specify it: Task.force_requirements_env_freeze(requirements_file='requirements.txt')
If you just want the pip freeze output to be shown in your "Installed Packages" section, then use: Task.force_requirements_env_freeze()
Notice that in both cases you should call the function Before you call Task.init()
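i.e. a minimal sketch of the call order (project/task names are placeholders):
```python
from clearml import Task

# must be called before Task.init()
Task.force_requirements_env_freeze(requirements_file='requirements.txt')

task = Task.init(project_name='examples', task_name='freeze requirements')
```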
btw, what do you mean by "Packages will be installed from projects requirements file" ?
Hi @<1687643893996195840:profile|RoundCat60> , I just saw the message,
Just by chance I set the SSH deploy keys to write access and now we're able to clone the repo. Why would the SSH key need write access to the repo to be able to clone?
Let me explain: the default use case for the agent is to use user/pass (as configured in the clearml.conf file)
It will change any ssh links to https links and will add the credentials to clone the repository.
You can also provide SSH keys (basicall...
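For reference, a sketch of the user/pass part of the agent section in clearml.conf (the values are placeholders):
```
agent {
    # credentials the agent injects when cloning over https
    git_user: "my-git-user"
    git_pass: "my-git-token"
}
```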
Ohh if this is the case, you might also consider using offline mode, so there is no need for backend
https://clear.ml/docs/latest/docs/guides/set_offline#setting-task-to-offline-mode
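A minimal sketch of the offline flow (the project/task names and session path are placeholders):
```python
from clearml import Task

# run fully offline: everything is stored in a local session folder
Task.set_offline(offline_mode=True)
task = Task.init(project_name='examples', task_name='offline run')
# ... training code ...
task.close()

# later, on a machine with a backend, the session zip can be imported:
# Task.import_offline_session('/path/to/session.zip')
```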
But adding a simple force_download flag to the get_local_copy
That sounds like a good idea
trains-agent runs a container from that image, then clones ...
That is correct
I'd like the base_docker_image to not only be defined at runtime
I see, may I ask why not just build it once, push it into artifactory and then have trains-agent use it? (it will be much faster)
Hi CooperativeFox72 ,
From the backend guys, long story short, upgrade your machine => more cpu cores, more processes, it is that easy 🙂
Thanks, yes you are correct, the color is derived from the series name, so I guess the issue is that the name+ID is not kept in full screen
How can I ensure that additional tasks aren’t created for a notebook unless I really want to?
TrickySheep9 are you saying two Tasks are created in the same notebook without you closing one of them ?
(Also, how is the git diff warning there with the latest clearml, I think there was some fix related to that)
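One way to keep it to a single task (a sketch; whether this fits your notebook flow is an assumption on my part):
```python
from clearml import Task

# close the current task explicitly before starting a new one,
# otherwise a second Task.init() call may spawn an additional task
task = Task.init(project_name='examples', task_name='notebook run')
# ... cells run here ...
task.close()

# reuse_last_task_id=True asks clearml to reuse the previous task
# instead of creating a new one (the default behavior, shown explicitly)
task = Task.init(project_name='examples', task_name='notebook run',
                 reuse_last_task_id=True)
```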
Yes, I think the writer.add_figure somehow crops the image
I'm going to follow your suggestion and just put the extra effort into distributing a pre-built image.
That sounds good 🙂
If you feel the need is important, I do have a hack in mind, it will be doable once we have support for entrypoint "-c python_code_here". But since this is still not available, I believe you are right and building an image would be the easiest.
A note on the docker image, remember that when running inside the docker we inherit the system packages installed on the d...
Hi FierceFly22
Hi, does anyone know where trains stores tensorboard data
Tensorboard data is stored wherever you point your file-writer to 🙂
What trains is doing is: while tensorboard writes its own data to disk, it takes the data (in-flight) and sends it to the trains-server. The trains-server puts everything in the DB, so later everything is viewable & searchable.
Basically you don't need to store your TB files after your experiment is done, you have all the data in the trains-s...
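For example, a minimal sketch with the PyTorch SummaryWriter (using the current clearml package; in older versions the import was from trains import Task):
```python
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name='examples', task_name='tensorboard demo')

# TB still writes its own event files to ./runs, while the in-flight
# reports are also sent to the server and stored in the DB
writer = SummaryWriter(log_dir='runs/demo')
for step in range(10):
    writer.add_scalar('loss', 1.0 / (step + 1), step)
writer.close()
```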
SmarmySeaurchin8 check the logs, maybe you can find something there