Now I know which experiments have the most metrics. I want to downsample these metrics by a factor of 10, i.e. only keep iterations that are multiples of 10. How can I query (to delete) only the documents whose iteration is not a multiple of 10?
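To make it concrete, this is roughly the delete-by-query I have in mind (in Python, via requests; the index pattern and the field names are guesses on my side):
```python
import requests

# Sketch only: the index pattern and field names ("task", "iter") are my
# assumptions about how the events are stored -- please double-check them.
ES_URL = "http://localhost:9200"
INDEX = "events-training_stats_scalar-*"   # assumed index pattern
TASK_ID = "<task_id>"                      # the experiment to downsample

query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"task": TASK_ID}},
                # keep multiples of 10 -> delete everything else
                {"script": {"script": {"source": "doc['iter'].value % 10 != 0"}}},
            ]
        }
    }
}

# I would first run the same query against _count to see how many documents
# would be removed before actually deleting anything.
resp = requests.post(f"{ES_URL}/{INDEX}/_delete_by_query", json=query)
print(resp.json())
```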
Hi AgitatedDove14, I upgraded to 1.3.1 and the bug of missing logs in the console is still there… 😞
I made another recording so that you can understand what it is about:
- I enqueue a task
- The task starts, but the logs shown in the console are very sparse
- I scroll up and down to try to fetch the missing logs, without success
- I download the logs, open the file, and there I see the full logs
Hi DeterminedCrab71, Version: 1.1.1-135 • 1.1.1 • 2.14
Thanks for the clarification SuccessfulKoala55! A follow-up question:
I would like to install several packages (opencv, numpy, torch) in the system site-packages so that they are available in every experiment (to reduce experiment setup time). Installing them globally via
(docker was installed with sudo snap install docker )
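On the system-site-packages question above, I was thinking of something like this in the agents' clearml.conf / trains.conf (assuming the system_site_packages option does what I think it does):
```
# agent-side configuration, sketch only
agent {
    package_manager {
        # let the per-experiment venv see the globally installed packages
        # (opencv, numpy, torch, ...) instead of reinstalling them every time
        system_site_packages: true
    }
}
```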
AgitatedDove14 Yes, I have xpack security disabled, as in the link you shared (note that it's xpack.security.enabled: "false", with quotes around false), but this command throws:
{"error":{"root_cause":[{"type":"parse_exception","reason":"request body is required"}],"type":"parse_exception","reason":"request body is required"},"status":400}
Ah, got it, I am on a self-hosted server, that’s why I don’t see it
Hi CostlyOstrich36, this weekend I took a look at the diff against the previous version ( https://github.com/allegroai/clearml-server/compare/1.1.1...1.2.0# ) and I saw several changes related to scrolling/logging:
apiserver/bll/event/log_events_iterator.py
apiserver/bll/event/events_iterator.py
apiserver/config/default/services/_mongo.conf
apiserver/database/model/base.py
apiserver/services/events.py
I suspect that one of these changes might be responsible ...
Well actually I do see many errors like that in the browser console:
So there are two possible cases for trains-agent-1, either:
- It picks a new experiment -> the "workers" tab randomly shows one of the two experiments
- There is no new experiment in the default queue to start -> the "workers" tab randomly shows either no experiment or the one it is currently running
By mistake I have two agents started on one machine
Answering myself: Yes, Task.set_base_docker RTFM!!!
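For anyone finding this later, this is roughly how I use it now (the image and the extra argument are just an example):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="docker base image")

# Tell the agent (running in docker mode) which image to use for this task.
# Depending on the clearml version, set_base_docker takes a single command
# string or separate docker_image / docker_arguments keyword arguments.
task.set_base_docker("nvidia/cuda:11.4.2-runtime-ubuntu20.04 --ipc=host")
```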
You mean it will resolve by itself in the coming days, or should I do something? Or is there nothing to do and it will stay this way?
AgitatedDove14 I have a machine with two GPUs and one agent per GPU. I provide the same trains.conf to both agents, so they use the same directory for caching venvs. Could that be problematic?
(BTW: it will work with elevated credentials, but that's probably not recommended)
What does that mean? I'm not sure I understand
So I need this merging of small configuration files to build the bigger one
CostlyOstrich36 I updated both agents to 1.1.2 and still got the same problem, unfortunately. Since I can download the full log file from the Web UI, I guess the agents are reporting correctly?
Could it be that elasticsearch does not return all the requested logs when it is queried by the WebUI to display them in the console?
Now that I think about it, I remember that in the changelog of clearml-server 1.2.0 the following is listed:
`Fix UI Workers & Queues and Experiment Table pages ...`
Is there a typo in your message? I don't see the difference between what I wrote and what you suggested: TRAINS_WORKER_NAME = "trains-agent":$DYNAMIC_INSTANCE_ID
Although task.data.last_iteration is correct when resuming, there is still this doubling effect when logging metrics after resuming 😞
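In case it helps to reproduce, this is a stripped-down version of what I mean by resuming (project/task names are simplified):
```python
from clearml import Task

# Resume the previous run of this task instead of starting a new one
task = Task.init(
    project_name="examples",
    task_name="resume test",
    continue_last_task=True,
)

start = task.data.last_iteration  # this value looks correct after the resume
logger = task.get_logger()

for i in range(start, start + 100):
    # the iterations shown in the UI end up roughly doubled relative to `i`
    logger.report_scalar(title="loss", series="train", value=0.0, iteration=i)
```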
Woohoo! Thanks 👌
Ok, I have a very different problem now: I did the following to restart the ES cluster:
```
docker-compose down
docker-compose up -d
```
And now the cluster is empty. I think docker simply created a new volume instead of reusing the previous one, which had always been the case so far.
I carried this code over from older versions of trains; to be honest, I don't remember precisely why I did that
The cloning is done in another task, which has the argv parameters that I want the cloned task to inherit
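Concretely, the controller task does something along these lines (the names and the parameter mapping are just illustrative):
```python
import sys

from clearml import Task

# Controller task: it receives some argv values and clones a template task,
# forwarding those values as parameters of the clone before enqueueing it.
controller = Task.init(project_name="examples", task_name="controller")

template = Task.get_task(project_name="examples", task_name="template")
cloned = Task.clone(source_task=template, name="clone with inherited argv")

# e.g. forward the first CLI argument as an "Args" parameter of the clone
cloned.set_parameters({"Args/learning_rate": sys.argv[1]})

Task.enqueue(cloned, queue_name="default")
```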
So I cannot ssh to the agent machine anymore after starting clearml-session on it
I have a mental model of the clearml-agent as a module that spins up my code somewhere, and the Python version running my code should not depend on the Python version running the clearml-agent (especially for experiments running in containers)