AgitatedDove14

49 Questions, 8054 Answers

Active since 10 January 2023

Last activity 9 months ago

Reputation

Badges 1

25 × Eureka!

Answers 8054

0 I Seem To Be Missing Something ... I'Ve Only Got One Task Running To Train A Segmentation Model On My Local Machine, And In A Few Days It'S Hit Over 1.15M Api Calls. It Looks Like It'S Sending Every Single Console Output ... Are There Settings To Control

is number of calls performed, not what those calls were.

oh, yes this is just a measure of how many API calls are sent.
It does not really matter which ones

one year ago

0 I'M Using Tensorboard Summarywriter To Add Scalar Metrics For The Experiment. If Experiment Crashed, And I Want To Continue It From Checkpoint, For Some Reason It Plots Metrics In A Really Weird Way. Even Though I Pass Global_Step=Epoch To The Summarywrit

😞 DilapidatedDucks58 how exactly are you "relaunching/continue" the execution? And what exactly are you setting?

3 years ago

0 What’S The Easiest Way To Update The Repo Url Alone For A Task? Need - In My Ci, The Url Used Is Https But I Need The Ssh Url To Be Used. I See That We Can Pass Repo To Task.Create But Not Task.Init

or do you mean agent can convert https url to ssh??

Yep it does that automatically if you set: force_git_ssh_protocol: true
https://github.com/allegroai/clearml-agent/blob/42606d9247afbbd510dc93eeee966ddf34bb0312/docs/clearml.conf#L25
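
To make the https-to-ssh conversion concrete, here is a minimal sketch of that kind of rewrite. The helper name `https_to_ssh` and the scp-style output format are assumptions for illustration only; this is not the agent's actual code, and the agent's real output format may differ.

```python
from urllib.parse import urlparse


def https_to_ssh(url: str) -> str:
    """Rewrite an http(s) git URL as an scp-style ssh URL (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        # Already ssh (or some other form): leave it untouched.
        return url
    # Using hostname (not netloc) drops embedded credentials like "user:token@".
    host = parsed.hostname
    path = parsed.path.lstrip("/")
    return f"git@{host}:{path}"


print(https_to_ssh("https://github.com/allegroai/clearml-agent.git"))
```

With something like this in place the CI can keep registering the https URL, and the ssh form is only derived at clone time.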

3 years ago

0 Hi, Is It Intented Behavior That Models That Are Saved By A Clearml-Agent Will Have The Clearml-Agents User (So The User Of Which Generated The Api Credentials For The Agent) In The "User" Field Of The Model Instead Of The User Who Started The Task?

👍

2 years ago

0 Hey There, Since A Bit I Often Find Experiments Being Stuck While Training A Model. It Seems To Happen Randomly And I Could Not Find A Reproducible Scenario So Far, But It Happens Often Enough To Be Annoying (I'D Say 1 Out Of 5 Experiments). The Symptoms

There seems to be a problem with multiprocessing: Although I stopped the task,

You mean you "aborted the task" from the UI?

There is a memory leak somewhere, please see the screenshot of datadog memory consumption

I'm assuming from the leftover processes ?

Python 3.8/Pytorch 1.11/clearml-sdk 1.9.0/clearml-agent 1.4.1

From the log I see the agent is running in venv mode
Hmm please try with the latest clearml-agent (the others should not have any effect)

one year ago

0 Any Specific Reason For Modelling Experiments As Separate Tasks Rather Than A Single Entity With Multiple Runs?

Yes, experiments are standalone as they do not have to have any connecting thread.
When would you say a new "run" vs a new "experiment" ? when you change a parameter ? change data ? change code ?
If you want to "bucket them" use projects 🙂 it is probably the easiest now that we have support for nested projects.

3 years ago

0 Has Anyone Successfully Deployed Clearml On A Kube Cluster Utilizing Istio? I Don’T See Any Mention Of Istio In The Docs.

i’m working on creating a custom config with istio

That is awesome! let me know if we could help 🙂
Also please consider PRing it, I'm sure other users will appreciate the option

3 years ago

0 Encountered An Odd Bug. Upon Attempting To Write Images To Clearml (3D Projected, Matplotlib),

It seems there is some async behavior going on. After ending a run, this prompt just hangs for a long time:

2021-04-18 22:55:06,467 - clearml.Task - INFO - Waiting to finish uploads

And there's no sign of updates on the dashboard

Hmm that could point to an issue uploading the last images (which are larger than regular scalars) could you try flushing and waiting ?
i.e.
task.flush()
sleep(45)

3 years ago

0 I Am Trying Pytorch Nightly Again With Python 3.10. Works Fine Locally, But Fails On Clearml-Agent In Docker Mode.

So was the issue solved?

one year ago

0 Hi All, I Am Testing The New

named as

venv_update

(I believe it's still in beta). Do you think enabling this parameter significantly helps to build environments faster?

This is deprecated... it was a test of a package that can update pip venvs, but it was never stable, so we will remove it in the next version

Yes, I guess. Since pipelines are designed to be executed remotely it may be pointless to enable an

output_uri

parameter in the

PipelineDecorator.componen...

3 years ago

0 Quick Question, Can Trains Log Keras Loss Values And/Or Metrics Automatically? Or Would I Have To Attach A Tensorboard Callback?

ElegantCoyote26 I don't think Keras logs it anywhere unless you have TB, so nowhere to take the data from...
In short, yes you have to have TB :)

4 years ago

0 Hi Team, Me Again! Im Curious If Someone Can Explain To Me Better How Task And Optimisers Integrate With Each Other. In The Example Hyperparameter Optimisation, There Is Both A Task Initialised With

I see now, give me a minute I'll check

4 years ago

0 Has Anyone Had Success Using Clearml With Huggingface Models? I Create My Hf

LOL I hear you 🙂

one year ago

0 Hi. Looking Into Clearml Support For Datasets, I'D Like To Understand How To Work With Large Datasets And Cases Where Not All The Data Is Downloaded At Once. (E.G. 1. Each Training Epoch Is Performed On A (Preferably Random) Sample Of The Data That Is Dow

PanickyMoth78

Is it limited to

accounts? (

unfortunately, yes 😊 , but I'm sure sales will be able to hook you up ...

2 years ago

0 Hi All—First Off, Thanks For Being Such A Helpful And Thorough Group Of People. I Learn A Ton Just Searching Through The Channel For Problems. I’M Seeing A Weird Issue. I Have A Conda Env On My Linux Machine, And I Can Successfully Run A Training Script

(torchvision vs. cuda compatibility, will work on that),

The agent will pull the correct torch based on the cuda version that is available at runtime (or configured via the clearml.conf)

3 years ago

0 Fatal: Could Not Read From Remote Repository. Please Make Sure You Have The Correct Access Rights And The Repository Exists.

I don't think so. it is solved by installing openssh-client to the docker image or by adding deploy token to the cloning url in web ui

You can also have the token (token==password) configured as the default user/pass in your agent's clearml.conf
https://github.com/allegroai/clearml-agent/blob/73625bf00fc7b4506554c1df9abd393b49b2a8ed/docs/clearml.conf#L19

3 years ago

BTW: could it be the Task.init is Not called on the "module.name" entry point, but somewhere internally ?

3 years ago

0 Hi, We Are Having An Interesting Issue Here. We Serve Many Users And Each User Has Their Own Credentials In Accessing The Private Git Repo. We Can'T Seem To Find A Way For The End User To Pass In Their Git Credentials When They Run Their Codes In Both Age

SubstantialElk6 (2) yes definitely will be fixed
Regarding (1), what do you mean by "via the code" ? Do you mean as a Task docker cmd ?

3 years ago

0 Cloning: Origin Repository Cloning Failed: 'Nonetype' Object Has No Attribute 'Startswith' Trains_Agent: Error: Failed Cloning Repository. 1) Make Sure You Pushed The Requested Commit: (Repository='Origin', Branch='Master', Commit_Id='051A8418Cf1D85F392

MysteriousBee56 there is no way to tell the trains-agent to pull from local copy of your repository...
You might be able to hack it, if you copy the entire local repo to the trains-agent version control cache. would that help you?

4 years ago

0 Hi, I Have Another Problem

That depends on what you have installed 🙂

4 years ago

0 Crazy Idea:

I see, good point. It does look like mostly boilerplate code, not sure where it actually runs the python command, but I'm sure it is there (python.ts, but I could not locate who is actually using it)

one year ago

0 Hi, I Have A Question Regarding The Aws_Autoscaler: It Usually Takes ~Hours To Get A Gpu Instance Nowadays. I Was Thinking, It Would Be Much More Interesting To Stop The Instances (Clearml-Agents) Instead Of Terminating Them Once They Are Inactive, So Tha

instead of terminating them once they are inactive, so that they could be available immediately when they are needed.

JitteryCoyote63 I think you can increase the IDLE timeout on the autoscaler, and achieve the same behavior, no ?

2 years ago

0 Hi Everyone! I Have A Short Question That You Can For Sure Help Me With. Is There A Way To Avoid Each Task To Create A New Environment? I'D Like To Specify Which Env To Use. I Tried With

Then by default it is the home folder (`~/.clearml`) that is running out of free space

2 years ago

0 Hi, I Have Another Problem

(since you are using venv mode, if the cuda is not detected at startup time, it will not install the GPU version, as it has no CUDA support)

4 years ago

0 Hi, I'Ve Recently Upgraded To 0.15.1 From 0.14.2, And For Some Reason A Code That Previously Worked In Which I'M Getting The Tags Of A Model Using

PompousBeetle71 notice that starting with this version, when you set model tags they are stored as user tags, which you can change and edit in the UI. So if you still need the system tags you have to access them directly.

4 years ago

0 Running Into A Strange Issue—

Seems correct.
I'm assuming something is wrong with the key/secret quoting ?!
Could you generate another one and test it ?
(you can have multiple key/secrets on the same user)

3 years ago

0 Hi! I Have Local Minio Setup, Via Minio Browser I Can Upload 50-100 Mb Per Second As Its Local. But When I Try To Use Task.Upload_Artifact It Uploads 500 Kb Per Second. Does Anyone Have An Idea About This?

None of them is problematic, this is what I'm trying to say 🙂
I think the minio browser gets confused.
if you want to test the upload time on the client you can try:
from time import time
task.flush(wait_for_uploads=True)
tic = time()
task.upload_artifact('test', '/tmp/localfile')
task.flush(wait_for_uploads=True)
print(time() - tic)

4 years ago

0 Hi! I Have A Question Regarding Performances Of The Clearml-Server: Are The Calls From The Agents Made Asynchronously/In A Non Blocking Separate Thread? Is The Connection To The Clearml-Server Expected To Be A Bottleneck If The Clearml-Server Is Far From

potential sources of slow down in the training code

Is there one?

3 years ago

0 What Could Be The Reason For My Package To Not Be Loading Under The "Installed Packages"? I Have A

So if everything works you should see "my_package" package in the "installed packages"
the assumption is that if you do:
pip install "my_package"
It will set "pandas" as one of its dependencies, and pip will automatically pull pandas as well.
That way we do not list the entire venv you are running on, just the packages/versions you are using, and we let pip sort the dependencies when installing with the agent
Make sense ?
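
To illustrate the idea of listing only the packages a script uses directly (rather than the whole venv), here is a rough sketch that collects top-level imports from source code. The function `top_level_imports` is a hypothetical helper built on Python's `ast` module; it is not how ClearML itself detects packages, just the general idea.

```python
import ast


def top_level_imports(source: str) -> set:
    """Return the top-level package names imported directly by `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0); they point inside the repo.
            if node.module and node.level == 0:
                found.add(node.module.split(".")[0])
    return found


script = "import my_package\nfrom os import path\n"
print(sorted(top_level_imports(script)))
```

Here "pandas" never appears in the result even if "my_package" depends on it, which matches the point above: pip resolves that dependency when the agent installs "my_package".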

3 years ago

0 Hi, Can You Help Me Pls, I Got: Environment Setup Completed Successfully Starting Task Execution: Traceback (Most Recent Call Last): File "Agro_Api.Py", Line 13, In From Help_Models.Consts Import Urls Importerror: No Module Named 'Help_Models'

help_models is a dir in the git

And the git is registered on the experiment correctly ?

4 years ago
