SuccessfulKoala55 Could you please point me to where I could quickly patch that in the code?
Thanks for the explanations!
Yes, that was the case. This is also what I would think, although I double-checked yesterday:
- I create a task on my local machine with trains 0.16.2rc0
- This task calls task.execute_remotely()
- The task is sent to an agent running with 0.16
- The agent installs trains 0.16.2rc0
- The agent runs the task, clones it and enqueues the cloned task
- The cloned task fails because it has no hyper-parameters/args section (I can see that in the UI)
- When I clone the task manually usin...
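For reference, roughly the code path on my side (a minimal sketch; the project, task and queue names here are placeholders, not the real ones):

```python
# Minimal sketch of the flow described above; project, task and queue names
# are placeholders, not the actual values I use.
from clearml import Task

task = Task.init(project_name="debug", task_name="remote-clone-test")
params = {"lr": 0.01, "batch_size": 32}
task.connect(params)  # this is what should populate the hyper-parameters/args section

# stop local execution and send the task to the agent's queue
task.execute_remotely(queue_name="default", exit_process=True)
```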
My use case is: on a spot instance marked by AWS for termination in 2 minutes, I want to close the task and prevent the clearml-agent from picking up a new task afterwards.
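Roughly what I have in mind for that (a sketch only; the spot metadata endpoint is standard AWS, but stopping the daemon with clearml-agent daemon --stop is an assumption on my side):

```python
# Rough sketch: poll the EC2 spot termination notice, then close the task and
# stop the agent. "clearml-agent daemon --stop" is assumed, not verified.
import subprocess
import time

import requests
from clearml import Task

SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def shutdown_on_spot_termination(poll_seconds: int = 5) -> None:
    while True:
        try:
            resp = requests.get(SPOT_ACTION_URL, timeout=2)
        except requests.RequestException:
            resp = None
        if resp is not None and resp.status_code == 200:
            # AWS has issued the ~2 minute termination notice
            task = Task.current_task()
            if task is not None:
                task.mark_stopped()  # mark the task as stopped on the server
                task.close()         # flush loggers and close the task
            # ask the agent daemon to stop so it won't pick up a new task (assumed flag)
            subprocess.run(["clearml-agent", "daemon", "--stop"], check=False)
            return
        time.sleep(poll_seconds)
```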
Any chance this is reproducible?
Unfortunately not at the moment, I couldn't find a reproducible scenario. If I clone a task that was stuck and start it, it might not be stuck
How many processes do you see running (i.e. ps -Af | grep python)?
I will check that when the next one will be blocked 👍
What is the training framework? Is it multiprocess? How are you launching the process itself? Is it a Linux OS? Is it running inside a specific container?
I train with p...
I mean, when the clearml-agent sends data, does it block the training while sending metrics, or is it done in parallel to the main thread?
There is no error on this side. I think the AWS autoscaler just waits for the agent to connect, which will never happen since the agent won’t start because the userdata script fails.
I get the following error:
For some reason the configuration object gets updated at runtime to:
resource_configurations = null
queues = null
extra_trains_conf = ""
extra_vm_bash_script = ""
mmmh probably yes, I can’t say for sure (because I don’t remember precisely when I upgraded to 0.17) but it looks like that
This works well when I run the agent in virtualenv mode (removing --docker)
Now it starts, I’ll see if this solves the issue
Relevant issue in Elasticsearch forums: https://discuss.elastic.co/t/elasticsearch-5-6-license-renewal/206420
SuccessfulKoala55 For the past 2 hours I have been getting 504 errors and I cannot SSH into the machine. AWS reports that the instance health checks fail. Is it safe to restart the instance?
In all the steps I want to store them as artifacts in S3 because it’s very convenient.
The last step should merge them all, i.e. it needs to know all the artifacts of the previous steps.
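Something like this is what I mean for the merge step (just a sketch; the step task IDs and the artifact name "chunk" are placeholders):

```python
# Sketch of the merge step: pull the artifacts uploaded by the previous steps.
# The task IDs and the artifact name "chunk" are placeholders.
from clearml import Task


def merge_previous_artifacts(step_task_ids, artifact_name="chunk"):
    merged = []
    for task_id in step_task_ids:
        step_task = Task.get_task(task_id=task_id)
        # .get() downloads the artifact (e.g. from S3) and deserializes it
        merged.append(step_task.artifacts[artifact_name].get())
    return merged
```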
DeterminedCrab71 This is the behaviour of holding Shift while selecting in Gmail; if ClearML could reproduce this, that would be perfect!
Is there one?
No, I rather wanted to understand how it worked behind the scene 🙂
The latest RC (0.17.5rc6) moved all logging into a separate subprocess to improve speed with PyTorch DataLoaders
That’s awesome!
I have the same problem, and not only with subprojects: for all the projects I get this blank overview tab, as shown in the screenshot. It only worked for one project, one that I created one or two weeks ago under 0.17.
Ok, I guess I’ll just delete the whole loss series. Thanks!
Guys the experiments I had running didn't fail, they just waited and reconnected, this is crazy cool
but according to the disk graphs, the OS disk is being used, not the data disk
Hi CostlyOstrich36 , this weekend I took a look at the diffs with the previous version ( https://github.com/allegroai/clearml-server/compare/1.1.1...1.2.0# ) and I saw several changes related to the scrolling/logging:
apiserver/bll/event/log_events_iterator.py
apiserver/bll/event/events_iterator.py
apiserver/config/default/services/_mongo.conf
apiserver/database/model/base.py
apiserver/services/events.py
I suspect that one of these changes might be responsible ...
I ended up dropping omegaconf altogether
So two possible cases for trains-agent-1, either:
- It picks a new experiment -> the "workers" tab randomly shows one of the two experiments
- There is no new experiment in the default queue to start -> the tab randomly shows either no experiment or the one that it is running
I also tried task.set_initial_iteration(-task.data.last_iteration), hoping it would counteract the bug, but it didn’t work
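For context, this is the kind of thing I tried (sketch only; the project/task names are placeholders, and resetting the offset to 0 is just another variant I could test):

```python
# Sketch of the workaround attempt; project/task names are placeholders.
from clearml import Task

task = Task.init(project_name="example", task_name="continued-run")

# what I tried: shift the reported iterations back by the last stored iteration
task.set_initial_iteration(-task.data.last_iteration)

# another variant would be forcing the offset back to zero
# task.set_initial_iteration(0)
```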
Actually I think I am approaching the problem from the wrong angle
AgitatedDove14 SuccessfulKoala55 I just saw that clearml-server 1.4.0 was released, congrats 🚀 🙌 Was this bug fixed with this new version?