JitteryCoyote63

215 Questions, 1023 Answers

Active since 10 January 2023

Last activity 3 months ago

Reputation

Badges 1

981 × Eureka!

0 Votes 28 Answers 2K Views

Hi, I Am Trying To Use Omegaconf With Task.Connect_Configuration And I Get The Following Error:

Hi, I am trying to use omegaconf with task.connect_configuration and I get the following error: >>> OmegaConf.create(task.connect_configuration(config_dict))...

clearml

3 years ago

0 Votes 7 Answers 2K Views

Hi, I Deleted All Archived Experiments In A Project And I Just Realized All Experiments Of All Projects Were Deleted (Clearml Server V1.0.0)

Hi, I deleted all archived experiments in a project and I just realized all experiments of all projects were deleted (clearml server v1.0.0) 🤔

clearml

4 years ago

0 Votes 13 Answers 2K Views

Hey There, Is It Possible For A Clearml Pipeline Step To Log A Folder Instead Of Numpy/Pickle Objects? Looking At The Docs,

Hey there, Is it possible for a clearml pipeline step to log a folder instead of numpy/pickle objects? Looking at the docs, monitor_artifacts could be what I...

clearml

3 years ago

0 Votes 2 Answers 2K Views

Looks Like Trains-Agent 0.16

Looks like trains-agent 0.16 doesn't support --install-globally documented parameter -> Only available for trains-agent build command. Would it be possible t...

clearml

5 years ago

0 Votes 2 Answers 2K Views

Hi, Is There A Way To Control After How Much Time An Agent That Went Down Is Removed From The Web-Ui? I Find The Current Value Too High For My Needs

Hi, is there a way to control after how much time an agent that went down is removed from the web-ui? I find the current value too high for my needs

mlops

2 years ago

0 Votes 27 Answers 2K Views

Hi There,

Hi there, I found a memory leak in Logger.report_matplotlib_figure . I was constantly running out of memory when training my models so I decided to spend som...

clearml

2 years ago

0 Votes 3 Answers 2K Views

Hi There, I Recently Updated Clearml Server To 1.7.0, And Found The Following

⚠️ Hi there, I recently updated clearml server to 1.7.0, and found the following critical regression: When I reset an experiment, it is actually deleted 😵 ,...

clearml

2 years ago

0 Votes 18 Answers 2K Views

Hello There, I Would Like To Run Cleanup Code In Case The User Aborts One Task From The Dashboard (The Agent Is Not Using The Task In Docker). What Signal Should I Listen For In The Task?

Hello there, I would like to run cleanup code in case the user aborts one task from the dashboard (the agent is not using the task in docker). What signal...

mlops

4 years ago

0 Votes 9 Answers 2K Views

Another Strange Behavior Of The Python Sdk Cli: After Executing Python My_Task.Py, Where My_Task.Py Creates An Experiment And Sends It To The Queue, The Command Returns But After Some Time Some Messages Are Printed In The Console, Such As

Another strange behavior of the python SDK CLI: after executing python my_task.py, where my_task.py creates an experiment and sends it to the queue, the command ...

clearml

4 years ago

0 Votes 3 Answers 2K Views

Hi, In The Context Of Multi-Gpu Training, Is

Hi, in the context of multi-gpu training, is Model.get_local_copy() multi-process safe? Or should I make sure only the first process calls it first, then the others?

clearml

3 years ago

0 Votes 2 Answers 2K Views

Hi, A Small Bug (Not Really A Bug) In The Autoscaler: I Have P3.2Xlarge Instances That Take A Long Time To Shutdown. With

Hi, a small bug (not really a bug) in the autoscaler: I have p3.2xlarge instances that take a long time to shutdown. With polling_interval_time_min=1 , the a...

mlops

4 years ago

0 Votes 1 Answers 2K Views

Hi, There Is A "Bug" Introduced In The Latest Version Of Clearml-Server: When An Experiment Is In "Full Screen View", In The Console Tab, The Auto Refreshing Of The Console Makes The Console Disappear For A Short Moment. When The Console Reappears, The

Hi, there is a "bug" introduced in the latest version of clearml-server: when an experiment is in "full screen view", in the console tab, the auto refreshing...

clearml

4 years ago

0 Votes 30 Answers 2K Views

Hi, I Just Updated Clearml-Server To 1.1.0 And Got The Following Error When Starting It With Docker-Compose:

Hi, I just updated clearml-server to 1.1.0 and got the following error when starting it with docker-compose: clearml-apiserver | [2021-08-02 13:37:09,852] [8...

clearml

4 years ago

0 Votes 5 Answers 2K Views

Hi, I Would Like To Use Pytorch3D==0.5.0 With Torch==1.9.1 On Cuda Version 110, Locally It Works, But The Clearml Agent Fails Setting Up The Environment With The Following Error:

Hi, I would like to use pytorch3d==0.5.0 with torch==1.9.1 on cuda version 110, locally it works, but the clearml agent fails setting up the environment with...

mlops

4 years ago

0 Votes 2 Answers 2K Views

Are The Env Variables Passed To Trains-Agent Available In Experiments Run By This Trains-Agent?

Are the env variables passed to trains-agent available in experiments run by this trains-agent?

clearml

5 years ago

0 Votes 12 Answers 2K Views

Hi There! Is There An Easy Way To Retrieve The Site-Package Directory That Was Created By An Agent From Inside A Task? Eg.

Hi there! Is there an easy way to retrieve the site-package directory that was created by an agent from inside a task? Eg. task = Task.init(...) task.add_req...

mlops

2 years ago

0 Votes 4 Answers 2K Views

Hey There, Is There A Way To Access The Trains Configuration Programmatically At Runtime In A Task (The Configuration That Is Dumped By The Agent In The Logs Before Executing A Task)

Hey there, is there a way to access the trains configuration programmatically at runtime in a task (the configuration that is dumped by the agent in the logs...

mlops

5 years ago

0 Votes 12 Answers 2K Views

Hey, Would It Be Possible To Add An Option To Make

Hey, would it be possible to add an option to make task.upload_artifact() blocking? (Not running in background)

clearml

5 years ago

0 Votes 10 Answers 2K Views

Hi, On Clearml-Server 1.5.0, In Scalar Graphs, The New Default Value Is “Show Closest Data On Hover”. Would It Be Possible To Make It Automatically Set To “Compare Data On Hover” When Comparing Multiple Experiments?

Hi, on clearml-server 1.5.0, in scalar graphs, the new default value is “Show closest data on hover”. Would it be possible to make it automatically set to “C...

clearml

3 years ago

0 Votes 3 Answers 2K Views

Hi Guys, Since I Am Done With Implementing The Aws Autoscaler, I Would Like To Share Some Pain Points That I Encountered In The Process With The Hope That They Can Be Documented To Help Other Users:

Hi guys, since I am done with implementing the AWS autoscaler, I would like to share some pain points that I encountered in the process with the hope that th...

aws

4 years ago

0 Votes 1 Answers 2K Views

Hi, I Encounter The Following Bug With Clearml 0.17.5Rc2: When I Start A Task Locally And That Task Raises Cuda Out Of Memory, The Command Returns But The Process Is Not Killed, And Therefore The Gpu Ram Is Not Freed

Hi, I encounter the following bug with clearml 0.17.5rc2: When I start a task locally and that task raises cuda out of memory, the command returns but the pr...

clearml

4 years ago

0 Votes 4 Answers 2K Views

Hey Again

Hey again 😁 Is it possible to run multiple agents on the same machine? And with some in services mode?

clearml

5 years ago

0 Votes 3 Answers 2K Views

Hi Quick Question: Does Task.Connect_Configuration Support Omegaconf Dictconfig Objects? Ie. Can I Do:

Hi quick question: does Task.connect_configuration support OmegaConf DictConfig objects? ie. Can I do: config = train_task.connect_configuration(OmegaConf.lo...

clearml

3 years ago

0 Votes 13 Answers 3K Views

Hi, I Am Trying To Use The Clearml-Agent In Docker Mode To Run An Experiment, But It Seems To Fail Passing The Clearml.Conf File To The Docker Container:

Hi, I am trying to use the clearml-agent in docker mode to run an experiment, but it seems to fail passing the clearml.conf file to the docker container: Exe...

clearml

2 years ago

0 Votes 10 Answers 2K Views

Hi, Another Bug To Report With The Aws_Auto_Scaler Using 1.1.2:

Hi, another bug to report with the aws_auto_scaler using 1.1.2: Traceback (most recent call last): File "aws_autoscaler.py", line 297, in main() File "aws_au...

mlops

4 years ago

0 Votes 18 Answers 2K Views

Hi, Kudos For The 0.15 Guys! I Am Having An Issue Related To Git Auth: I Have An Issue With Trains-Agent (0.15): It Does Not Use Git Creds While Trying To Clone A Private Repo:

Hi, kudos for the 0.15 guys! I am having an issue related to git auth: I have an issue with trains-agent (0.15): it does not use git creds while trying to cl...

mlops

5 years ago

0 Votes 7 Answers 2K Views

Hi, Is There A Way To Get Some Stats About The Use Of Workers? I Would Like To Know, Over The Past 3 Months:

Hi, is there a way to get some stats about the use of workers? I would like to know, over the past 3 months: Number of training hours per user Number of trai...

clearml

4 years ago

0 Votes 3 Answers 2K Views

Hello There, Is There A Parameter To Configure The Number Of Columns Rendered In The Preview Area Of The Csv Artifacts? (Some Of Them Are Truncated With “…”)

Hello there, is there a parameter to configure the number of columns rendered in the preview area of the CSV artifacts? (some of them are truncated with “…”)

clearml

4 years ago

0 Votes 10 Answers 2K Views

Hey Guys, I Am Setting Up A New Machine With Two Rtx 3070 Gpus Where I Created Two Agents (One For Each Gpu). On Both Agents, My Experiments Fail With Error:

Hey guys, I am setting up a new machine with two rtx 3070 GPUs where I created two agents (one for each GPU). On both agents, my experiments fail with error:...

pytorch

4 years ago

0 Votes 15 Answers 2K Views

Hi, How Can I Get The Logs From The Pytorch Ignite Early Stopping Handler To Be Logged In Clearml?

Hi, how can I get the logs from the pytorch ignite early stopping handler to be logged in clearml?

pytorch

4 years ago

0 Hi There, I Have A Bit Of A Problem With Aws Secrets: I Pass Keys As Env Var To Clearml-Agents To Retrieve Data From A Bucket In Us-East-1 But I Use A Bucket To Store Task Artifacts In A Bucket In Eu-Central-1. So When I Pass Aws Keys As Env Vars, The Tas

Yes, I stayed with an older version for a compatibility reason I cannot remember now 😄 - just tested with 1.1.2 and it’s the same
I tried specifying the bucket directly in my clearml.conf, same problem. I guess clearml just reads from the env vars first

4 years ago

0 Hello, I Am Trying To Retrieve A Simple Dict Artifact Uploaded In A Previous Task With

Yes, thanks! In my case, I was actually using TrainsSaver from pytorch-ignite with a local path. Looking at the code, I then understood that under the hood it actually changed the output_uri of the current task; that's why my previous_task.output_uri = "s3://my_bucket" had no effect (it was placed BEFORE the training)

5 years ago

0 Hi, Kudos For The 0.15 Guys! I Am Having An Issue Related To Git Auth: I Have An Issue With Trains-Agent (0.15): It Does Not Use Git Creds While Trying To Clone A Private Repo:

I finally found a workaround using cache, will detail the solution in the issue 👍

5 years ago

0 Hello, I Am Trying To Retrieve A Simple Dict Artifact Uploaded In A Previous Task With

Oops, I spoke too fast, the json is actually not saved in s3

5 years ago

0 Hi, In The Context Of Multi-Gpu Training, Is

if I want to resume a training on multi gpu, I will need to call this function on each process to send the weights to each gpu

3 years ago

0 Hi Guys, With The New Venv Caching Available In Clearml, I Have The Following Problem: I Force My Pip Requirements To Be:

AgitatedDove14 Should I create an issue for this to keep track of it?

4 years ago

0 Hi There, I Am Trying To Setup Clearml To Use Uv As I Am Switching From Pip To Uv. I Am Now Blocked By The Following Issue: Clearml-Agent Won'T Pass The Args Registered When Creating The Experiment To The Task When Running It Remotely. I Do Something Like

Oh, actually this was already raised here

3 months ago

0 Hi, How Can I Get The Logs From The Pytorch Ignite Early Stopping Handler To Be Logged In Clearml?

AgitatedDove14 So in the https://pytorch.org/ignite/_modules/ignite/handlers/early_stopping.html#EarlyStopping class I see that some info is logged (in the __call__ function), and I would like to have this info logged by clearml

4 years ago

So last version of the agent working for me is 1.9.3

3 months ago

0 Hi Guys For The Aws Auto-Scaler I Need To Access Aws Ssm Or Create .Env File Locally When Using The Init Script. Has Anyone Done This?

Try to spin up the instance of that type manually in that region to see if it is available

4 years ago

0 Hello, I Am Getting `Valueerror: Could Not Get Access Credentials For '

So most likely trains was masking the original error, it might be worth investigating to help other users in the future

5 years ago

0 Hey, I Have A Problem With The Following Task:

Thanks for the explanations,
Yes, that was the case.
This is also what I would think, although I double-checked yesterday:
- I create a task on my local machine with trains 0.16.2rc0
- This task calls task.execute_remotely()
- The task is sent to an agent running with 0.16
- The agent installs trains 0.16.2rc0
- The agent runs the task, clones it and enqueues the cloned task
- The cloned task fails because it has no hyper-parameters/args section (I can see that in the UI)
- When I clone the task manually usin...

5 years ago

0 Hi, Together With

using trains RC, trains-agent 0.15.0

5 years ago

0 Hi There, I Used

The task object

3 years ago

0 Hi, I Am Getting The Following Errors In The Experiments I Am Currently Running:

ha sorry it’s actually the number of shards that increased

4 years ago

0 Hey There, Since A Bit I Often Find Experiments Being Stuck While Training A Model. It Seems To Happen Randomly And I Could Not Find A Reproducible Scenario So Far, But It Happens Often Enough To Be Annoying (I'D Say 1 Out Of 5 Experiments). The Symptoms

Hi AgitatedDove14, I want to circle back on this issue. This is still relevant, and I could collect the following on an EC2 instance running a clearml-agent with a stuck task:

There seems to be a problem with multiprocessing: although I stopped the task, there are still many processes forked from the main training process. I guess these are zombies. Please check the htop tree.
There is a memory leak somewhere, please see the screenshot of datadog mem...

2 years ago

0 Hi, Although

Does that mean that agents do not read this parameter?

4 years ago

0 Hi, With Clearml-Agent 1.5.1, I Tried To Run An Experiment Within A Docker With Image Python3:8 And It Failed Executing The Task While Trying To Call Python3.9. I Am Not Sure Why It'S Using Python3.9, Since The Agent.Default_Python Is 3.8 And The Image Is

Should I open an issue in the clearml-agent GitHub repo?

2 years ago

0 Hey, What Is The Exact Difference Between

I hit enter too fast ^^
Installing them globally via
$ pip install numpy opencv torch
will install locally with the warning:
Defaulting to user installation because normal site-packages is not writeable
Therefore the installation will take place in ~/.local/lib/python3.6/site-packages instead of the default one. Will this still be considered as global site-packages and still be included in experiments envs? From what I tested, it does
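
To see which directory is which on a given machine, a minimal sketch using only the standard library can print both locations. The helper name describe_site_packages is made up for this illustration, and it only reports what the interpreter sees; it does not show how trains-agent itself decides what to include in an experiment env.

```python
import site
import sysconfig


def describe_site_packages():
    """Return the interpreter's own and the per-user site-packages locations.

    Hypothetical helper for illustration only; not part of trains/clearml.
    """
    return {
        # The interpreter's own site-packages (the "default one" for pip).
        "interpreter": sysconfig.get_paths()["purelib"],
        # Where pip falls back to after "Defaulting to user installation".
        "user": site.getusersitepackages(),
        # Whether this interpreter adds the per-user directory to sys.path.
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
    }


if __name__ == "__main__":
    for name, value in describe_site_packages().items():
        print(f"{name}: {value}")
```

If user_site_enabled is true, packages in the per-user directory are importable by that interpreter, which would be consistent with what I observed.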

5 years ago

0 Hi There, I Have A Problem With Pyjwt: I Am Using

You already fixed the problem with pyjwt in the newest version of clearml/clearml-agents, so all good 😄

4 years ago

0 Hi, If I Am Starting My Training With The Following Command:

ok, so even if that guy is attached, it doesn’t report the scalars

3 years ago

0 Is There A Way To Report A Simple Series With X And Y Coords, X And Y Being Two Lists Of Same Length?

Nevermind, I just saw report_matplotlib_figure 🎉
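
For two plain lists, an alternative to a matplotlib figure might be a 2D scatter/line report. The sketch below is an assumption, not what I ended up using: it relies on the logger exposing report_scatter2d with the keyword arguments shown, and the helper names to_points and report_xy_series are made up for this example.

```python
from typing import List, Sequence, Tuple


def to_points(x: Sequence[float], y: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair up two equally long coordinate lists into (x, y) points."""
    if len(x) != len(y):
        raise ValueError(f"x and y differ in length: {len(x)} != {len(y)}")
    return list(zip(x, y))


def report_xy_series(logger, title: str, series: str, x, y, iteration: int = 0) -> None:
    """Send one x/y series to a ClearML-style logger as a 2D line plot.

    `logger` is assumed to provide report_scatter2d(...); verify the exact
    signature against the installed clearml version before relying on it.
    """
    logger.report_scatter2d(
        title=title,
        series=series,
        scatter=to_points(x, y),
        iteration=iteration,
        xaxis="x",
        yaxis="y",
        mode="lines",
    )
```

Inside a task, the logger would typically come from Task.current_task().get_logger(), and the call would look like report_xy_series(logger, "my curve", "run 1", xs, ys).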

4 years ago

0 Hi, I Restarted My Clearml-Server (1.1.0) And The Login Page Always Redirects Me To The Login Page. I Am Using Fixed Users In Config Files. In The Logs Of The Api Server I Can See:

Here is the console with some errors

4 years ago

0 Hi Guys, With The New Venv Caching Available In Clearml, I Have The Following Problem: I Force My Pip Requirements To Be:

yes, in the code, I do:
task._wait_for_repo_detection()
REQS_TASK = ["torch==1.3.1", "pytorch-ignite @ git+ ", "."]
task._update_requirements(REQS_TASK)
task.execute_remotely(queue_name=args.queue, clone=False, exit_process=True)

4 years ago

0 Hi There

Please wait a few mins - This example is not valid, I will share a new one soon

5 years ago

0 Hi Folks, Is It Possible To Use An Aws P3 Instance (Which Has Several Gpus) With One Agent Per Gpu, All Controlled Through Clearml Aws Autoscheduler? So Clearml Aws Autoscheduler Would Know In Advance How Many Agents To Start In The Instances (Can Be An Op

Notice the last line should not have

--docker

Did you mean --detached ?

I also think we need to make sure we monitor all agents (this is important as this is the trigger to spin down the instance)

That's what I thought, yeah. No problem, it was more of a question; if I encounter the need for that, I will adapt and open a PR 🙂

4 years ago

0 Hi, Although

so the task they execute must have clearml installed?

4 years ago

0 Hi Guys, With The New Venv Caching Available In Clearml, I Have The Following Problem: I Force My Pip Requirements To Be:

AgitatedDove14 The first time, it installs and creates the cache for the env; the second time, it fails with:
Applying uncommitted changes
ERROR: Directory '.' is not installable. Neither 'setup.py' nor 'pyproject.toml' found.
clearml_agent: ERROR: Command '['/home/user/.clearml/venvs-builds.1/3.6/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r', '/tmp/cached-reqsmncaxx45.txt']' returned non-zero exit status 1.

4 years ago
