Hi @<1566596960691949568:profile|UpsetWalrus59>
Could it be the two experiments have the exact same name?
(It sounds like a bug in the UI, but I'm trying to make sure, and also to understand how to reproduce it.)
What's your clearml-server version ?
JitteryCoyote63 I meant storing the parent ID as another "hyper-parameter" (under its own section name), not the data itself.
Makes sense?
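For example, a minimal sketch (the "Parents" section name and parent_task_id key are placeholders I made up, and it assumes a clearml version where Task.connect accepts a name argument):

from clearml import Task

task = Task.init(project_name="my_project", task_name="child_run")
# store the parent's task ID as a hyper-parameter under its own section
task.connect({"parent_task_id": "<parent-task-id>"}, name="Parents")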
Well, that depends on how you think about the automation. If you are running your experiments manually (i.e. you specifically call/execute them), then at the beginning of each experiment (or function) call Task.init, and when you are done call Task.close. This can be done in parallel if you are running them from separate processes.
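Something along these lines (a minimal sketch; project/task names and run_experiment are placeholders):

from clearml import Task

def run_one(config):
    task = Task.init(project_name="my_project", task_name=config["name"])
    task.connect(config)      # log this experiment's configuration
    run_experiment(config)    # your actual training / evaluation code
    task.close()              # close so a later Task.init starts a fresh task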
If you want to automate the process, you can start using the trains-agent, which could help you spin those experiments on as many machines as you l...
So that means your home folder is always mapped to ~/ on any machine you ssh to ?
C will be submitted to a different queue and I don’t care as much
Is there a way to define “task affinity” in this way?
Hi RoughTiger69,
When you say Task affinity, do you mean "I want C to be executed next to A/B"? Affinity as a concept doesn't really exist; it can be abstracted to a queue, where you have agents pulling from multiple queues. Then C can be pushed to one of the queues (in theory you might be able to programmatically control the queue of C), wdyt?
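For the "programmatically control the queue" part, a rough sketch (the queue name and the template task are assumptions on my side):

from clearml import Task

# clone a template task to create "C"
task_c = Task.clone(source_task="<template-task-id>", name="experiment C")
# push C to whichever queue the agents sitting next to A/B are pulling from
Task.enqueue(task_c, queue_name="affinity_queue")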
Hi JumpyDragonfly13 , just making sure, do you have an agent running on a remote machine ?
Can you have a direct TCP connection to the remote machine? (The default port it will use is 10022.)
Hi @<1547028116780617728:profile|TimelyRabbit96>
Notice that if you are running with docker compose you can pass an argument to the clearml triton container and use shared memory. You can do the same with the helm chart.
PompousParrot44 these are the default plotly colors. You can change any of the layout properties with the
https://github.com/allegroai/trains/blob/65a4aa7aa90fc867993cf0d5e36c214e6c044270/trains/logger.py#L600
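As an alternative sketch (assuming a clearml/trains version that exposes Logger.report_plotly), you can also build a fully styled plotly figure yourself and report it as-is:

import plotly.graph_objects as go
from clearml import Task

task = Task.init(project_name="my_project", task_name="styled_plot")
fig = go.Figure(data=[go.Scatter(y=[1, 3, 2], line=dict(color="#aa0000"))])
fig.update_layout(title="custom colors", plot_bgcolor="#f0f0f0")
task.get_logger().report_plotly(title="plot", series="custom", iteration=0, figure=fig)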
Well, that depends on you. What did you write there to know it is the best one? The file name? Did you add some metric?
Hi ColossalAnt7 , I think we ran into it on a few docker images; I believe the bug was fixed in the latest trains-agent RC. Could you verify please?
(with matplotlib 3.2+ I get no warning, let me check with 3.1)
some dependencies will sometimes require different pip versions.
None 🙂 maybe setuptools, but not the pip version
(pip is just a utility to install packages, it will not be a dependency of one)
Thank you, I would love to make sure we fix it
Hi UnsightlySeagull42
But now I need the hyperparameters in every python file.
You can always get the Task from anywhere:

main_task = Task.current_task()
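For example, a minimal sketch for reading the hyper-parameters back from any module (the "General/lr" key is hypothetical):

from clearml import Task

task = Task.current_task()
params = task.get_parameters()            # flat dict, keys look like "Section/param_name"
learning_rate = params.get("General/lr")  # hypothetical parameter name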
is no agent listening to the "k8s_scheduler"
There should not be one; this queue is purely "virtual", so users understand the k8s cluster is spinning up their pod (sometimes it takes time, imagine EKS etc.), it's just for visibility.
unfortunately I can't get info from the cluster
You should be able to see the pod in the cluster, no?!
What does the Task Info panel say? Can you share a screenshot?
Ok, no, it only helps as long as I don't log the figure.
You mean if you create the matplotlib figure without the automagic connect, you still see the memory leak?
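If it helps to isolate it, a minimal sketch for turning off only the matplotlib automagic (project/task names are placeholders):

from clearml import Task

task = Task.init(
    project_name="my_project",
    task_name="leak_check",
    auto_connect_frameworks={"matplotlib": False},  # keep the rest of the auto-logging
)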
Ex: Expecting value: line 1 column 1 (char 0)
K8S Glue pods monitor: Failed parsing kubectl output:
Run with --debug as the first parameter
Are you running the latest from the git repo ?
Hi TrickyRaccoon92
... would any running experiment keep a cache of to-be-sent-data, fail the experiment, or continue the run, skipping the recordings until the server is back up?
Basically they will keep trying to send data to the server until it is up again (you should not lose any of the logs).
Are there any clever functionality for dumping experiment data to external storage to avoid filling up the server?
You mean artifacts or the database ?
Hi VivaciousWalrus99
Could you attach the log of the run ?
By default it will use the python it is running with.
Any chance the original experiment was executed with python2 ?
Hi DilapidatedDucks58 ,
Just making sure, do all 8 workers have different worker IDs? (You can see all 8 in the Workers page in the UI.)
Also, are they running in docker or venv mode?
instead of terminating them once they are inactive, so that they could be available immediately when they are needed.
JitteryCoyote63 I think you can increase the idle timeout on the autoscaler and achieve the same behavior, no?
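Roughly what I mean (a sketch based on the clearml AWS autoscaler example; the exact field names, e.g. max_idle_time_min, are an assumption if you are using a different autoscaler):

# part of the autoscaler configuration / hyper-parameters (values are illustrative)
hyper_params = {
    "max_idle_time_min": 60,         # keep idle instances around for an hour before spinning them down
    "polling_interval_time_min": 5,  # how often to poll the queues / instance state
}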
Okay let me check if I can test on this git version.
UnevenDolphin73 FYI: clearml-data is documented, unfortunately only in GitHub:
https://github.com/allegroai/clearml/blob/master/docs/datasets.md
trains[azure] gives you the possibility to do the following:

from trains import StorageManager

my_local_cached_file = StorageManager.get_local_copy('azure://bucket/folder/file.bin')

This means you do not have to manually download files and maintain a local cache, the StorageManager will do that for you.
If you do not need that ability, no need to install trains[azure], you can just install trains.
Unfortunately, we haven't had the time to upgrade to the Azure storage v...
That must have been it. Here are the installed packages when not using -m:
Hmm yes, can you open a GitHub issue on that? (this seems like a bug)