Well, no luck - using matplotlib.use('agg') in my training codebase doesn't solve the memory leak
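For reference, a minimal sketch of what this looks like (the plotting helper and the explicit plt.close call are illustrative, not my actual training code):
` import matplotlib
matplotlib.use("agg")  # select the non-interactive backend before pyplot is imported
import matplotlib.pyplot as plt

def plot_metrics(values):
    # hypothetical plotting helper, not the real training code
    fig, ax = plt.subplots()
    ax.plot(values)
    fig.savefig("metrics.png")
    plt.close(fig)  # release the figure explicitly to limit memory growth `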
Hi CostlyOstrich36 , most of the time I want to compare two experiments in the DEBUG SAMPLES section, so if I click on one sample to enlarge it, I cannot see the others. Also, once I close the panel, the iteration number is not updated
Sure, it’s because of a very annoying bug that I shared in this thread: https://clearml.slack.com/archives/CTK20V944/p1648647503942759 , which I haven’t been able to solve so far.
I’m not sure you can downgrade that easily ...
Yea, that’s what I thought. That’s a bit of a pain for me now; I hope I can find a way to fix the bug somehow
In all the steps I want to store them as artifacts in s3 because it’s very convenient.
The last step should merge them all, i.e. it needs to know about all the artifacts of the previous steps
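For context, a minimal sketch of that pattern, assuming hypothetical artifact names, bucket, and a way to pass the step task ids to the merge step (none of these come from my actual pipeline):
` from clearml import Task

# in each intermediate step: upload the partial result produced by the step as an artifact to s3
task = Task.init(project_name="my_project", task_name="step_1",
                 output_uri="s3://my-bucket/artifacts")
task.upload_artifact("partial_result", artifact_object="partial_1.csv")  # local file produced by this step

# in the final merge step: fetch the artifacts of all the previous steps
step_task_ids = ["step_1_task_id", "step_2_task_id"]  # however the ids get passed along
partials = [
    Task.get_task(task_id=tid).artifacts["partial_result"].get_local_copy()
    for tid in step_task_ids
] `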
and then call task.connect_configuration probably
Sorry, it’s actually task.update_requirements(["."])
GrumpyPenguin23 yes, it is the latest
AgitatedDove14 , what I was looking for was: parent_task = Task.get_task(task.parent)
Bottom line is: trains-server uses the elasticsearch image http://docker.elastic.co/elasticsearch/elasticsearch:5.6.16 , which does not have an unlimited license (only a free license that expires after some time). From version 6.3, elasticsearch provides an unlimited free license. Trains should use >=6.3, WDYT?
Hi AgitatedDove14 , initially I was doing this, but then I realised that with the approach you suggest, all the packages of the local environment also end up in the “installed packages”, while in reality I only need the dependencies of the local package. That’s why I use _update_requirements
With this approach, only the required package will be installed by the agent
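For future reference, this is the call in question - a minimal sketch, assuming the task is created with Task.init from the local package’s root (note that _update_requirements is a private SDK method, as discussed above):
` from clearml import Task

task = Task.init(project_name="my_project", task_name="training")
# private helper discussed above: replace the auto-detected environment with
# just the local package ("."), so only its declared dependencies are listed
task._update_requirements(["."]) `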
no, at least not in clearml-server version 1.1.1-135 • 1.1.1 • 2.14
I didn’t use ignite callbacks, for future reference:
` from ignite.engine import Events
from ignite.handlers import EarlyStopping

early_stopping_handler = EarlyStopping(...)
def log_patience(_):
    # report the current patience counter of the EarlyStopping handler to ClearML
    clearml_logger.report_scalar("patience", "early_stopping", early_stopping_handler.counter, engine.state.epoch)

engine.add_event_handler(Events.EPOCH_COMPLETED, early_stopping_handler)
engine.add_event_handler(Events.EPOCH_COMPLETED, log_patience) `
Thanks! I will investigate further; I am thinking that the AWS instance might have been stuck for an unknown reason (becoming unhealthy)
Awesome, thanks WackyRabbit7 , AgitatedDove14 !
AgitatedDove14 Is it fixed with trains-server 0.15.1?
AgitatedDove14 ok, but this happens on my local machine, not in the agent
Yes, that was my assumption as well. There could be several causes, to be honest, now that I see that matplotlib itself is also leaking 😄
Ok to be fair I get the same curve even when I remove clearml from the snippet, not sure why
Default would be venv, and only use docker if an image is passed. Use case: not having to duplicate all the queues to accept both docker and venv agents on the same instances
Otherwise I can try loading the file with a custom loader, saving it as a temp file, passing the temp file to connect_configuration (which will return another temp file with the overridden config), and then passing this new file to OmegaConf
ProxyDictPostWrite._to_dict() will recursively convert it to a plain dict, and OmegaConf will not complain then
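A minimal sketch of that approach, assuming a made-up config dict (ProxyDictPostWrite is what task.connect() returns for a mutable dict, and _to_dict is its private helper):
` from clearml import Task
from omegaconf import OmegaConf

task = Task.init(project_name="my_project", task_name="training")
# connect() wraps the dict in a ProxyDictPostWrite so that edits are tracked
config = task.connect({"lr": 0.001, "batch_size": 32})
# convert the proxy back to a plain dict before handing it to OmegaConf
cfg = OmegaConf.create(config._to_dict()) `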
So I need to have this merging of small configuration files to build the bigger one
I am confused now because I see that in the master branch, the clearml.conf file has the following section:
` # Or enable credentials chain to let Boto3 pick the right credentials.
# This includes picking credentials from environment variables,
# credential file and IAM role using metadata service.
# Refer to the latest Boto3 docs
use_credentials_chain: false `
So it states that IAM role using metadata service should be supported, right?
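If so, then enabling it should just be a matter of flipping that flag in clearml.conf - a sketch, with the section path taken from the default config in the repo (the rest is my assumption):
` sdk {
    aws {
        s3 {
            # let Boto3 resolve credentials on its own: environment variables,
            # credential file, or the IAM role via the metadata service
            use_credentials_chain: true
        }
    }
} `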
This one doesn’t have _to_dict
unfortunately
I was asking to exclude this possibility from my debugging journey 😁
This is the mapping of the faulty index:
` {
"events-plot-d1bd92a3b039400cbafc60a7a5b1e52b_new" : {
"mappings" : {
"dynamic" : "strict",
"properties" : {
"@timestamp" : {
"type" : "date"
},
"iter" : {
"type" : "long"
},
"metric" : {
"type" : "keyword"
},
"plot_data" : {
"type" : "binary"
},
"plot_len" : {
"type" : "long"
},
"plot_str" : {
...
Nevertheless there might still be some value in that, because it would allow reducing the startup time by removing the initial setup of the agent and the download of the data to the instance - but not as much as I described initially, if stopped instances are bound to the same capacity limitations as newly launched instances
AgitatedDove14 Yes, exactly! It is shown in the recording above
Ok, in that case it probably doesn’t work, because if the default value is 10 secs, it doesn’t match what I get in the logs of the experiment: every second tqdm adds a new line
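For reference, the setting I assume is being discussed here (the exact key name is my assumption from memory, so double-check it against the default clearml.conf):
` sdk {
    development {
        worker {
            # how often (in seconds) carriage-return updates (e.g. tqdm
            # progress bars) are flushed to the console log
            console_cr_flush_period: 10
        }
    }
} `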