ok, thanks SuccessfulKoala55 !
I’d like to move to a setup where I don’t need these tricks
So it is there already, but commented out, any reason why?
Thanks AgitatedDove14 ! I created a project with a default output destination pointing to an S3 bucket, but I don't have local access to this bucket (only the agents have access to it, for security reasons). Because of that, I cannot create a task in this project programmatically from my local machine: it tries to access the bucket and fails. And there is no easy way to change the default output location (not in the web UI, not in the SDK)
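For context, this is roughly the call that fails locally (project and task names here are placeholders):
```python
from clearml import Task

# The project was created with a default output destination of s3://<restricted-bucket>.
# Calling Task.init locally then tries to reach that bucket and fails,
# because only the agents hold credentials for it.
task = Task.init(project_name="my-project", task_name="local-debug-run")
```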
yes, the only thing I changed is:
```
install_requires=[
    ...
    "my-dep @ git+..."
]
```
to:
```
install_requires=[
    ...
    "git+..."
]
```
```
torch==1.7.1
git+...
```
I am already trying with the latest pip 😞
Hey SuccessfulKoala55 , unfortunately this doesn't work: the dict contains other dicts, and only the first-level dict becomes a plain dict; the inner dicts are still ProxyDictPostWrite instances, and that makes OmegaConf.create fail
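Something like this recursive conversion is what seems to be needed before handing the config to OmegaConf (just a sketch; to_plain_dict and connected_config are names I made up):
```python
from omegaconf import OmegaConf

def to_plain_dict(obj):
    # Recursively convert nested mappings (e.g. ClearML's ProxyDictPostWrite
    # wrappers, which behave like dicts) and lists into plain Python objects
    # so that OmegaConf.create accepts them.
    if isinstance(obj, dict):
        return {k: to_plain_dict(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_plain_dict(v) for v in obj]
    return obj

cfg = OmegaConf.create(to_plain_dict(connected_config))
```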
This is consistent: each time I send a new task to the default queue, if trains-agent-1 has only one task running (the long one), it will pick up another one. If I add one more experiment to the queue at that point (so trains-agent-1 is running two experiments at the same time), that experiment stays in the queue (trains-agent-2 and trains-agent-3 won't pick it up because they are also running experiments)
line 13 is empty 🤔
No space, I will add and test 🙂
hooo now I understand, thanks for clarifying AgitatedDove14 !
Hi CostlyOstrich36 , I am not using Hydra, only OmegaConf, so you mean just calling OmegaConf.load should be enough?
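i.e., just something like this (the path is a placeholder)?
```python
from omegaconf import OmegaConf

cfg = OmegaConf.load("config.yaml")
```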
```python
with open(path, "r") as stream:
    return yaml.load(stream, Loader=yaml.FullLoader)
```
Note: Could be related to https://github.com/allegroai/clearml/issues/790 , not sure
I have a custom way of reading the config file
it would be nice if Task.connect_configuration could support custom yaml file readers for me
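In the meantime, something like this sketch is the kind of workaround I'm thinking of: parse the file with the custom reader and connect the resulting dict instead of the file path (project/task names are placeholders):
```python
import yaml
from clearml import Task

def read_config(path):
    # the custom reader mentioned above
    with open(path, "r") as stream:
        return yaml.load(stream, Loader=yaml.FullLoader)

task = Task.init(project_name="my-project", task_name="example")
config = read_config("config.yaml")
# connect the already-parsed dict rather than pointing ClearML at the file
config = task.connect_configuration(config, name="config")
```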
And since I ran the task locally with Python 3.9, it used that version in the Docker container
CostlyOstrich36 I don’t see such a number, can you please share a screenshot of where to look?
hoo that’s cool! I could place torch==1.3.1 there
Sorry both of you, my problem was actually lying somewhere else (both buckets are in the same region) - thanks for your time!
AgitatedDove14 Didn’t work 😞
mmmmh I just restarted the experiment and it seems to work now. I am not sure why that happened. From this SO post it could be related to the size of the repo. Might be a good idea to clone with --depth 1
in the agents?
Or more generally, try to catch this error and retry a few times?
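Roughly the pattern I mean (a generic sketch; the actual retry would have to live around the agent's clone step, and clone_repository is a made-up name):
```python
import time

def with_retries(fn, attempts=3, delay=5.0):
    # Call fn(), retrying a few times with a fixed delay between attempts;
    # re-raise the last error if all attempts fail.
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay)

# e.g. with_retries(lambda: clone_repository(repo_url))
```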
Hi @<1523701087100473344:profile|SuccessfulKoala55> I was able to find the issue, I was creating a queue and worker subprocess that were not properly cleaned up
Unfortunately this is difficult to reproduce... Nevertheless it would be important for me to be robust against it, because if this error happens in a task in the middle of my pipeline, the whole process fails.
This ties into another, wider topic I think: how to "skip" tasks if they already ran (a mechanism similar to what https://luigi.readthedocs.io/en/stable/ offers). That would allow restarting the pipeline and skipping tasks up to the point where it failed
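Something along these lines is what I have in mind, sketched with Task.get_tasks (the helper and the project/task names are just illustrative):
```python
from clearml import Task

def already_completed(project_name, task_name):
    # Look for a previously completed task with the same name, so the
    # pipeline step can be skipped instead of re-run, roughly like
    # Luigi's "output already exists" check.
    existing = Task.get_tasks(
        project_name=project_name,
        task_name=task_name,
        task_filter={"status": ["completed"]},
    )
    return len(existing) > 0

if not already_completed("my-project", "preprocess"):
    # run or enqueue the step here
    ...
```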
ClearML has a task.set_initial_iteration, I used it as such:
```python
checkpoint = torch.load(checkpoint_fp, map_location="cuda:0")
Checkpoint.load_objects(to_load=self.to_save, checkpoint=checkpoint)
task.set_initial_iteration(engine.state.iteration)
```
But it's still the same issue; I am not sure whether I'm using it correctly or whether it's a bug, AgitatedDove14 ? (I am using clearml 1.0.4rc1, clearml-agent 1.0.0)