Very good job! One note: in this version of the web server, the experiment logo types are all blank, what was the reason for changing them? Having a color code in the logos helps a lot to quickly check the nature of the different experiment tasks, doesn't it?
Hi AgitatedDove14, so I ran 3 experiments:
- One with my current implementation (using "fork")
- One using "forkserver"
- One using "forkserver" + the DataLoader optimization

I sent you the results via PM, here are the outcomes:
- fork -> 101 mins, low RAM usage (5 GB, constant), almost no IO
- forkserver -> 123 mins, high RAM usage (16 GB, fluctuating), high IO
- forkserver + DataLoader optimization -> 105 mins, high RAM usage (from 28 GB down to 16 GB), high IO
CPU/GPU curves are the same for the 3 experiments...
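For context, this is roughly how the start method was switched between these runs. The exact DataLoader optimization isn't shown here; the persistent_workers flag below is only a stand-in for it, and the dataset is a dummy one:
```
import torch
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, TensorDataset

if __name__ == "__main__":
    # "fork" vs "forkserver": set globally before any DataLoader workers are created
    mp.set_start_method("forkserver", force=True)

    dataset = TensorDataset(torch.randn(1024, 8), torch.randint(0, 2, (1024,)))

    # Stand-in for the "DataLoader optimization": keep workers alive across epochs,
    # so the forkserver start-up cost is paid only once instead of every epoch
    loader = DataLoader(dataset, batch_size=32, num_workers=4, persistent_workers=True)

    for _ in range(2):          # two dummy epochs
        for batch in loader:
            pass
```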
Sorry, it's actually task.update_requirements(["."])
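For context, I call it right after Task.init (the project/task names below are just placeholders):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="local-package-run")

# Override the auto-detected requirements with the local package (".") so the
# agent installs the repository itself instead of a frozen requirements list
task.update_requirements(["."])
```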
Maybe there is a setting in docker to move the space used to a different location? I can simply increase the storage of the first disk, no problem with that
Yes, I set:
```
auth {
  cookies {
    httponly: true
    secure: true
    domain: ".clearml.xyz.com"
    max_age: 99999999999
  }
}
```
It always worked for me this way
Here is the console with some errors
I was rather wondering why clearml was taking up space even though I configured it to use the /data volume. But as you described, AgitatedDove14, it looks like an edge case, so I don’t mind 🙂
/data/shared/miniconda3/bin/python /data/shared/miniconda3/bin/clearml-agent daemon --services-mode --detached --queue services --create-queue --docker ubuntu:18.04 --cpu-only
AgitatedDove14 Is it possible to shut down the server while an experiment is running? I would like to resize the volume and then restart it (should take ~10 mins)
with the CLI, on a conda env located in /data
SuccessfulKoala55 I found the issue thanks to you: I changed the domain a bit but didn’t update the apiserver.auth.cookies.domain setting - I did it, restarted, and now it works 🙂 Thanks!
Yeah, I just realized that you would also need to specify different subnets, etc… not sure how easy it is 😞 But it would be very valuable, on-demand GPU instances are so hard to spin up in AWS nowadays 😄
Thanks! I would like to use this opportunity to split the indices into multiple shards, as explained here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-split-index.html#indices-split-index
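For the record, roughly what I plan to run (index names and the target shard count are placeholders, and I'm using Python requests here instead of curl):
```
import requests

ES = "http://internal-aws-host-name:9200"
SRC, DST = "my-index", "my-index-split"   # placeholder index names

# 1. The source index must be made read-only before it can be split
requests.put(f"{ES}/{SRC}/_settings",
             json={"settings": {"index.blocks.write": True}})

# 2. Split into a new index with more primary shards (2 is a placeholder;
#    the target count must be a multiple of the source shard count)
requests.post(f"{ES}/{SRC}/_split/{DST}",
              json={"settings": {"index.number_of_shards": 2}})

# 3. Check the new index's shards once the split is done
print(requests.get(f"{ES}/_cat/shards/{DST}?v").text)
```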
There it is: https://github.com/allegroai/clearml/issues/493
This https://discuss.elastic.co/t/index-size-explodes-after-split/150692 seems to say that with the _split API such a situation happens and resolves itself after a couple of days, maybe it's the same case for me?
The host is accessible, I can ping it and even run curl "http://internal-aws-host-name:9200/_cat/shards" and get results from the local machine
PS: in the new env, I’ve set num_replicas: 0, so I’m only talking about primary shards…
the api-server shows when starting:
```
clearml-apiserver | [2021-07-13 11:09:34,552] [9] [INFO] [clearml.es_factory] Using override elastic host
clearml-apiserver | [2021-07-13 11:09:34,552] [9] [INFO] [clearml.es_factory] Using override elastic port 9200
...
clearml-apiserver | [2021-07-13 11:09:38,407] [9] [WARNING] [clearml.initialize] Could not connect to ElasticSearch Service. Retry 1 of 4. Waiting for 30sec
clearml-apiserver | [2021-07-13 11:10:08,414] [9] [WARNING] [clearml.initia...
```
ha wait, I removed the http:// in the host and it worked 🎉
Same, it also returns a ProxyDictPostWrite, which is not supported by OmegaConf.create
I think my problem is that I am launching an experiment with python3.9 and I expect it to run in the agent with python3.8. The inconsistency is on my side, I should fix it and create the task with python3.8 with:
```
task.data.script.binary = "python3.8"
task._update_script(task.data.script)
```
Or use python:3.9 when starting the agent
```
with open(path, "r") as stream:
    return yaml.load(stream, Loader=yaml.FullLoader)
```
This allows me to inject yaml files into other yaml files, and then probably call task.connect_configuration
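Something like this is what the injection loader can look like (just a sketch: the !include tag and the read_yaml wrapper are assumptions, the real loader may differ):
```
import os
import yaml

class IncludeLoader(yaml.FullLoader):
    """FullLoader extended with an !include tag that inlines another yaml file."""

def _include(loader, node):
    # Resolve the included path relative to the file currently being loaded
    base_dir = os.path.dirname(os.path.abspath(loader.name))
    include_path = os.path.join(base_dir, loader.construct_scalar(node))
    with open(include_path, "r") as stream:
        return yaml.load(stream, Loader=IncludeLoader)

IncludeLoader.add_constructor("!include", _include)

def read_yaml(path):
    with open(path, "r") as stream:
        return yaml.load(stream, Loader=IncludeLoader)

# Usage in main.yaml:  model: !include model.yaml
```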
Hey SuccessfulKoala55, unfortunately this doesn’t work, because the dict contains other dicts, and only the first-level dict becomes a plain dict; the inner dicts are still ProxyDictPostWrite and will make OmegaConf.create fail
Otherwise I can try loading the file with the custom loader, saving it as a temp file, and passing the temp file to connect_configuration; it will return another temp file with the overwritten config, and then I pass this new file to OmegaConf
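Something like this (sketch only, reusing the read_yaml helper from the loader sketch above):
```
import tempfile
import yaml
from omegaconf import OmegaConf

def connect_yaml(task, conf_path):
    # 1. Resolve the yaml injections locally and dump the result to a temp file
    resolved = read_yaml(conf_path)
    with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as tmp:
        yaml.safe_dump(resolved, tmp)

    # 2. When given a file path, connect_configuration returns a path
    #    (the possibly-overwritten config when running under an agent)
    final_path = task.connect_configuration(tmp.name)

    # 3. Load the returned file with OmegaConf
    return OmegaConf.load(final_path)
```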
Ok, so what worked for me in the end was:
```
config = task.connect_configuration(read_yaml(conf_path))
cfg = OmegaConf.create(config._to_dict())
```
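And if relying on the private _to_dict() feels fragile, a small recursive converter should do the same job (just a sketch, as a drop-in alternative for the snippet above):
```
def to_plain(obj):
    # ProxyDictPostWrite behaves like a dict, so a recursive copy strips the proxies
    if isinstance(obj, dict):
        return {k: to_plain(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(v) for v in obj]
    return obj

config = task.connect_configuration(read_yaml(conf_path))
cfg = OmegaConf.create(to_plain(config))
```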