JitteryCoyote63

215 Questions, 1023 Answers

Active since 10 January 2023

Last activity 3 months ago

Reputation

Badges 1

981 × Eureka!

Questions 215
Answers 1023

0 Votes

7 Answers

2K Views

0 Votes 7 Answers 2K Views

Hi, I Deleted All Archived Experiments In A Project And I Just Realized All Experiments Of All Projects Were Deleted (Clearml Server V1.0.0)

Hi, I deleted all archived experiments in a project and I just realized all experiments of all projects were deleted (clearml server v1.0.0) 🤔

clearml

4 years ago

0 Votes

2 Answers

1K Views

0 Votes 2 Answers 1K Views

Hi There, I Have Several Experiments Hanging/Stuck In The Middle Or At The End Of The Training, With The Last Message Logged Being:

Hi there, I have several experiments hanging/stuck in the middle or at the end of the training, with the last message logged being: train INFO: Engine run co...

clearml

one year ago

0 Votes

30 Answers

2K Views

0 Votes 30 Answers 2K Views

Hello, I Would Like To Use Spot Instances Together With The Aws Autoscaler To Train Models With Pytorch/Ignite And I Am Wondering How To Support Interruptions During The Training (In Case The Instance Is Terminated By Aws). Is There Anything Already Built

Hello, I would like to use spot instances together with the AWS autoscaler to train models with pytorch/ignite and I am wondering how to support interruption...

mlops

4 years ago

0 Votes

1 Answers

1K Views

0 Votes 1 Answers 1K Views

Quick Question: Why Does Clearml-Server 1.15.0 Api-Server Python Package Require Es 8.12.0 But The Docker-Compose References Es 7.17.18?

Quick question: Why does clearml-server 1.15.0 api-server python package require ES 8.12.0 but the docker-compose references ES 7.17.18?

clearml

one year ago

0 Votes

18 Answers

2K Views

0 Votes 18 Answers 2K Views

Hi, I Just Updated Clearml Server 1.0 Using

Hi, I just updated clearml server 1.0 using docker-compose down & docker-compose pull & docker-compose up -d , it worked ant it looks amazing! I found two pr...

clearml

4 years ago

0 Votes

30 Answers

2K Views

0 Votes 30 Answers 2K Views

Hi There

Hi there 🙂 Task.get_parameters() returns an empty dict from within a trains-agent task being executed. When I execute it outside, it works properly. Is it i...

clearml

5 years ago

0 Votes

2 Answers

2K Views

0 Votes 2 Answers 2K Views

Hi, Where Can I Find The Logs Of Trains-Agent By Default?

Hi, where can I find the logs of trains-agent by default?

clearml

5 years ago

0 Votes

3 Answers

2K Views

0 Votes 3 Answers 2K Views

Hi, In A Subproject, Would It Be Possible To Hide The Parent Project If It Is Empty?

Hi, in a subproject, would it be possible to hide the parent project if it is empty?

clearml

3 years ago

0 Votes

2 Answers

2K Views

0 Votes 2 Answers 2K Views

Hi, Is It Possible To Start A Clearml-Agent (Not In Docker Mode) On A Machine With A Gpu, But Enforce The Clearml-Agent To Not “See” The Gpu? So That The Experiments Run By This Agent Fail If They Try To Access A Gpu? Like The

Hi, is it possible to start a clearml-agent (not in docker mode) on a machine with a gpu, but enforce the clearml-agent to not “see” the gpu? So that the exp...

mlops

4 years ago

0 Votes

2 Answers

1K Views

0 Votes 2 Answers 1K Views

Hi, Another Idea For Clearml Web Ui: In The Projects View, If I Have Several Experiments Being Enqueued And I Sort By “Started” Ascending (Newest On Top), I Expect To See Enqueued Experiments At The Very Top, While They Are Shown At The Very Bottom - Woul

Hi, another idea for ClearML web UI: in the projects view, if I have several experiments being enqueued and I sort by “Started” ascending (newest on top), I ...

clearml

4 years ago

0 Votes

5 Answers

2K Views

0 Votes 5 Answers 2K Views

How Can I Do The Following? (Basically, Filtering By Task Type)

How can I do the following? (basically, filtering by task type) Task.get_tasks(project_name="my-project", task_name="my-task", task_filter=dict(type="trainin...

clearml

5 years ago

0 Votes

2 Answers

2K Views

0 Votes 2 Answers 2K Views

First Link In Hyperparameter Optimization Page Is Broken >

First link in hyperparameter optimization page is broken > https://allegro.ai/docs/examples/examples_hyperparam_opt/

clearml

5 years ago

0 Votes

2 Answers

2K Views

0 Votes 2 Answers 2K Views

Looks Like Trains-Agent 0.16

Looks like trains-agent 0.16 doesn't support --install-globally documented parameter -> Only available for trains-agent build command. Would it be possible t...

clearml

5 years ago

0 Votes

5 Answers

2K Views

0 Votes 5 Answers 2K Views

Hi, I Would Like To Use Pytorch3D==0.5.0 With Torch==1.9.1 On Cuda Version 110, Locally It Works, But The Clearml Agent Fails Setting Up The Environment With The Following Error:

Hi, I would like to use pytorch3d==0.5.0 with torch==1.9.1 on cuda version 110, locally it works, but the clearml agent fails setting up the environment with...

mlops

4 years ago

0 Votes

1 Answers

2K Views

0 Votes 1 Answers 2K Views

Hi, There Is A "Bug" Introduced In The Latest Version Of Clearml-Server: When An Experiment Is In "Full Screen View", In The Console Tab, The Auto Refreshing Of The Console Makes The Console Disappearing For A Short Moment. When The Console Reappears, The

Hi, there is a "bug" introduced in the latest version of clearml-server: when an experiment is in "full screen view", in the console tab, the auto refreshing...

clearml

4 years ago

0 Votes

30 Answers

2K Views

0 Votes 30 Answers 2K Views

Hi, In One Of My Agents With Cuda Version: 11.1 (From Nvidia-Smi), Clearml Agent 0.17.1 Detects Version 100 (I Can See From Experiments Logs:

Hi, in one of my agents with CUDA Version: 11.1 (from nvidia-smi), clearml agent 0.17.1 detects version 100 (I can see from experiments logs: agent.cuda_vers...

mlops

4 years ago

0 Votes

1 Answers

2K Views

0 Votes 1 Answers 2K Views

Hey, Just Wanted To Mention: In Docs, Task.Get_Parameter Does Not Say:

Hey, just wanted to mention: in docs, Task.get_parameter does not say: Different sections with key prefix "section/" , as Task.get_parameters do. Also there ...

clearml

5 years ago

0 Votes

12 Answers

2K Views

0 Votes 12 Answers 2K Views

Hi, Where Can I Find The Server Parameter To Control When The Server Is Unregistering An Agent After Not Receiving Updates? Currently It'S Quite Long (30Mins) And This Prevents The Autoscaler From Launching A New Agent

Hi, where can I find the server parameter to control when the server is unregistering an agent after not receiving updates? Currently it's quite long (30mins...

mlops

2 years ago

0 Votes

3 Answers

2K Views

0 Votes 3 Answers 2K Views

Hey There, I See That In The Autoscaler Configuration, The

Hey there, I see that in the autoscaler configuration, the queues param accept dictionaries with values of type list of lists (see eg below.) What does it me...

mlops

4 years ago

0 Votes

2 Answers

2K Views

0 Votes 2 Answers 2K Views

Hi, Is It Still True That --Services-Mode Only Supports Docker Mode?

Hi, Is it still true that --services-mode only supports docker mode?

clearml

4 years ago

0 Votes

30 Answers

2K Views

0 Votes 30 Answers 2K Views

Hi, Is It Possible To Pass Environment Variables To Agents Created By The Aws Autoscaler Service?

Hi, is it possible to pass environment variables to agents created by the AWS AutoScaler service?

clearml

4 years ago

0 Votes

30 Answers

2K Views

0 Votes 30 Answers 2K Views

Hi, I Just Updated Clearml-Server To 1.1.0 And Got The Following Error When Starting It With Docker-Compose:

Hi, I just updated clearml-server to 1.1.0 and got the following error when starting it with docker-compose: clearml-apiserver | [2021-08-02 13:37:09,852] [8...

clearml

4 years ago

0 Votes

7 Answers

2K Views

0 Votes 7 Answers 2K Views

Hi, Is There A Way To Get Some Stats About The Use Of Workers? I Would Like To Know, Over The Past 3 Months:

Hi, is there a way to get some stats about the use of workers? I would like to know, over the past 3 months: Number of training hours per user Number of trai...

clearml

4 years ago

0 Votes

30 Answers

2K Views

0 Votes 30 Answers 2K Views

Could You Please Explain A Bit More How Trains Adapt The Torch Version Depending On The Installed Cuda Version? Here Is My Setup:

Could you please explain a bit more how trains adapt the torch version depending on the installed cuda version? Here is my setup: cuda 102 installed and corr...

clearml

5 years ago

0 Votes

1 Answers

2K Views

0 Votes 1 Answers 2K Views

Hi, Would It Be Possible To Parse Torch Requirement When It’S Part Of The Extras_Require Dict? In My Code, I Have The Following:

Hi, would it be possible to parse torch requirement when it’s part of the extras_require dict? In my code, I have the following: train_task._update_requireme...

mlops

4 years ago

0 Votes

5 Answers

2K Views

0 Votes 5 Answers 2K Views

Hi, I Have A Long Running Experiment That Was Running On Aws Instance That Got Killed After ~4 Days With The Following Reason:

Hi, I have a long running experiment that was running on AWS instance that got killed after ~4 days with the following reason: STATUS REASON: Forced stop (no...

clearml

3 years ago

0 Votes

4 Answers

2K Views

0 Votes 4 Answers 2K Views

Hi Guys, I Got A Very Unexpected Error Today On In One Of My Agents:

Hi guys, I got a very unexpected error today on in one of my agents: ... Collecting tqdm Using cached tqdm-4.48.2-py2.py3-none-any.whl (68 kB) Processing /ro...

clearml

5 years ago

0 Votes

4 Answers

2K Views

0 Votes 4 Answers 2K Views

Is There A Way To Report A Simple Series With X And Y Coords, X And Y Being Two Lists Of Same Length?

Is there a way to report a simple series with X and Y coords, X and Y being two lists of same length?

clearml

4 years ago

0 Votes

16 Answers

2K Views

0 Votes 16 Answers 2K Views

Hello, ~3 Months Ago I Created A Trains-Server In A Machine With 30Gb Of Disk Space. Today I Wasn'T Able To Connect To Trains-Server, So I Checked The Server And Found That The Disk Full. I Ran:

Hello, ~3 months ago I created a trains-server in a machine with 30gb of disk space. Today I wasn't able to connect to trains-server, so I checked the server...

clearml

4 years ago

0 Votes

3 Answers

2K Views

0 Votes 3 Answers 2K Views

Hey! Would It Be Possible To Tag The Rc Releases In The Different Repos? So That One Knows What Is Inside?

Hey! Would it be possible to tag the RC releases in the different repos? So that one knows what is inside?

clearml

5 years ago

Show more results

0 Hey, Clearml Team! When Can We Expect An Updated Roadmap? Last One Is From August

automatically promote models to be served from within clearml

Yes!

4 years ago

0 I Guess One Experiment Is Running Backwards In Time

I hit F12 to check projects.get_all_ex but nothing is fired, I guess the web ui is just frozen in some weird state

3 years ago

0 Hi, Is It Possible To Disable Some Of The System Metrics Monitored? And Also Downsample The Rate Of Logging?

Nice, thanks!

4 years ago

0 Hi, I Have A Question Regarding The Aws-Autoscaler: Am I Understanding Correctly That:

I will try with that and keep you updated

4 years ago

0 Hi, I Restarted My Clearml-Server (1.1.0) And The Login Page Always Redirects Me To The Login Page. I Am Using Fixed Users In Config Files. In The Logs Of The Api Server I Can See:

SuccessfulKoala55 I found the issue thanks to you: I changed a bit the domain but didn’t update the apiserver.auth.cookies.domain setting - I did it, restarted and now it works 🙂 Thanks!

4 years ago

0 Hey, Clearml Team! When Can We Expect An Updated Roadmap? Last One Is From August

That would be amazing!

4 years ago

0 Hi, I Face A Strange Behavior From The Clearml-Agent: It’S Running In Services Mode, Not In Docker Mode, Cpu Only. I Want To Execute Two Tasks On This Service Agent. One Works, The Other Always Fails After Being Enqueued And Picked By The Agent With The E

Oof now I cannot start the second controller in the services queue on the same second machine, it fails with
` Processing /tmp/build/80754af9/cffi_1605538068321/work
ERROR: Could not install packages due to an EnvironmentError: [Errno 2] No such file or directory: '/tmp/build/80754af9/cffi_1605538068321/work'
clearml_agent: ERROR: Could not install task requirements!
Command '['/home/machine/.clearml/venvs-builds.1.3/3.6/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r'...

4 years ago

0 Hi, I Am Giving Another Try To Clearml-Session And I Am Blocked At The Current Error Shown When The Cli Try To Establish The Tunneling:

AgitatedDove14 I see https://github.com/allegroai/clearml-session/blob/main/clearml_session/interactive_session_task.py#L21= that a key pair is hardcoded in the repo. Is it being used to ssh to the instance?

3 years ago

0 Hi, With Clearml-Agent 1.5.1, I Tried To Run An Experiment Within A Docker With Image Python3:8 And It Failed Executing The Task While Trying To Call Python3.9. I Am Not Sure Why It'S Using Python3.9, Since The Agent.Default_Python Is 3.8 And The Image Is

I have a mental model of the clearml-agent as a module to spin my code somewhere, and the python version running my code should not depend of the python version running the clearml-agent (especially for experiments running in containers)

2 years ago

0 Hi, Some Properties Of The Task Object Are Not Listed In The Documentation (Such As Task.Parent, Which Is Not Clear Whether It Is The Parent Task Object Itself Or The Id Of The Parent Task).

Yes, actually thats what I am doing, because I have a task C depending on tasks A and B. Since a Task cannot have two parents, I retrieve one task id (task A) as the parent id and the other one (ID of task B) as a hyper-parameter, as you described 👍

5 years ago

0 Hi There, I Have A Problem With Pyjwt: I Am Using

but the post_packages does not reinstalls the version 1.7.1

4 years ago

0 Hi, I Encounter A Weird Behavior: I Have A Task A That Schedules A Task B. Task B Is Executed On An Agent, But With An Old Commit

In execution tab, I see old commit, in logs, I see an empty branch and the old commit

5 years ago

And since I ran the task locally with python3.9, it used that version in the docker container

2 years ago

0 Hey There, Happy New Year To All Of You

Hi AgitatedDove14 , so I ran 3 experiments:
One with my current implementation (using "fork") One using "forkserver" One using "forkserver" + the DataLoader optimizationI sent you the results via MP, here are the outcomes:
fork -> 101 mins, low RAM usage (5Go constant), almost no IO forkserver -> 123 mins, high RAM usage (16Go, fluctuations), high IO forkserver + DataLoader optimization: 105 mins, high RAM usage (from 28Go to 16Go), high IO
CPU/GPU curves are the same for the 3 experiments...

4 years ago

0 Hi, If I Am Starting My Training With The Following Command:

I opened an https://github.com/pytorch/ignite/issues/2343 in ignite’s repo and a https://github.com/pytorch/ignite/pull/2344 , could you please have a look? There might be a bug in clearml Task.init in distributed envs

3 years ago

0 Hi, I Am Using Clearml With Pytorch-Ignite And Its Earlystopping Handler. I Would Like To Log The Counter Of The Patience Of This Handler, How Can I Do That?

I managed to do it by using logger.report_scalar, thanks!

4 years ago

0 Hi There, I Am Running A Clearml-Agent In Services Mode (With Docker) On A Machine With Two Disks: One With The Os (8Go, 91% Space Used) And One For The Data (100Go, 40% Space Used). When Executing The Auto-Scaler Task In This Agent, I Get The Following E

/data/shared/miniconda3/bin/python /data/shared/miniconda3/bin/clearml-agent daemon --services-mode --detached --queue services --create-queue --docker ubuntu:18.04 --cpu-only

4 years ago

0 Hi, If I Am Starting My Training With The Following Command:

And I am wondering if only the main process (rank=0) should attach the ClearMLLogger or if all the processes within the node should do that

3 years ago

0 Hi, I Just Updated Clearml-Server To 1.1.0 And Got The Following Error When Starting It With Docker-Compose:

Now I am trying to restart the cluster with docker-compose and specifying the last volume, how can I do that?

4 years ago

0 Hi, In A Subproject, Would It Be Possible To Hide The Parent Project If It Is Empty?

I mean, inside a parent, do not show the project [parent] if there is nothing inside

3 years ago

0 Hi Guys, I Had Several Times Now The Following Errors Poping In Agents While Executing A Task:

line 13 is empty 🤔

4 years ago

0 Could You Please Explain A Bit More How Trains Adapt The Torch Version Depending On The Installed Cuda Version? Here Is My Setup:

I now have a different question: when installing torch from wheels files, I am guaranteed to have the corresponding cuda library and cudnn together right?

5 years ago

0 Hi, How Can I Get The Logs From The Pytorch Ignite Early Stopping Handler To Be Logged In Clearml?

AgitatedDove14 So in the https://pytorch.org/ignite/_modules/ignite/handlers/early_stopping.html#EarlyStopping class I see that some infos are logged (in the __call__ function), and I would like to have these infos logged by clearml

4 years ago

0 Hey There! I Would Like To Use The Function

that’s perfect, thanks!

3 years ago

0 Hey Guys, Quick Question: Is There A Tool Function To Know If A Task Id Is Valid? Not Verifying That The Task Itself Exists, Just That The Task Id Is The Correct Format

Thanks SuccessfulKoala55 😁

5 years ago

0 Hi, If I Am Starting My Training With The Following Command:

AgitatedDove14 If I call explicitly task.get_logger().report_scalar("test", str(parse_args.local_rank), 1., 0) , this will log as expected one value per process, so reporting works

3 years ago

0 Hi, I Have Several Long Running Experiments Failing With

AgitatedDove14 After investigation, another program on the machine consumed all the memory available, most likely making the OS killing the agent/task

4 years ago

0 Hi, I Just Updated Clearml-Server To 1.1.0 And Got The Following Error When Starting It With Docker-Compose:

my docker-compose for the master node of the ES cluster is the following:
` version: "3.6"
services:

elasticsearch:
container_name: clearml-elastic
environment:
ES_JAVA_OPTS: -Xms2g -Xmx2g
bootstrap.memory_lock: "true"
cluster.name: clearml-es
cluster.initial_master_nodes: clearml-es-n1, clearml-es-n2, clearml-es-n3
cluster.routing.allocation.node_initial_primaries_recoveries: "500"
cluster.routing.allocation.disk.watermark.low: 500mb
clust...

4 years ago

0 Hi

Very good job! One note: in this version of the web-server, the experiments logo types are all blank, what was the reason to change them? Having a color code in the logos helps a lot to quickly check the nature of the different experiments tasks, isnt it?

5 years ago

0 Hi Guys, Is A Task Updating Its Status To 'Complete' Before Finishing To Upload Its Artifacts/Metrics In The Background?

Yes 🙂 Thanks!

5 years ago

Show more results