I think that somewhere a reference to the figure is still alive, so plt.close("all") and gc cannot free it and the figures end up accumulating. I don't know where yet
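To illustrate what I mean, here is a minimal standalone sketch (not my actual training code) of how a lingering reference defeats plt.close("all") and gc.collect():
```
import gc
import matplotlib
matplotlib.use("Agg")  # headless backend, just for the sketch
import matplotlib.pyplot as plt

_kept = []  # hypothetical lingering reference (a cache, a callback, a logger...)

def plot_step():
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3])
    _kept.append(fig)  # this strong reference keeps the figure alive

for _ in range(5):
    plot_step()
    plt.close("all")  # closes the pyplot figure managers...
    gc.collect()      # ...but cannot free figures that are still referenced

print(len(_kept), "figure objects still alive")  # -> 5, so memory keeps accumulating
```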
I also don't understand what you mean by "unless the domain is different"... The same way SSH keys are global, I would have expected the git credentials to be used for any git operation
Ha I just saw in the logs:
WARNING:py.warnings:/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/torch/cuda/__init__.py:145: UserWarning:
NVIDIA A10G with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA A10G GPU with PyTorch, please check the instructions at
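For reference, a quick standalone check (a sketch, not part of the original logs) of what the installed wheel was compiled for versus what the GPU reports:
```
import torch

# architectures the installed PyTorch wheel was compiled for,
# e.g. ['sm_37', 'sm_50', 'sm_60', 'sm_70'] in the warning above
print("compiled for:", torch.cuda.get_arch_list())

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"device capability: sm_{major}{minor}")  # sm_86 for an A10G
```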
Hi AgitatedDove14, I investigated further and got rid of a separate bug. I was able to get ignite's events fired, but still no scalars logged
There is definitely something wrong going on with the reporting of scalars using multi processes, because if my ignite callback is the following:
` def log_loss(engine):
    idist.barrier()  # sync all processes
    device = idist.device()
    print("IDIST", device)
    from clearml import Task
    Task.current_task().get_logger().r...
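For what it's worth, a common workaround sketch (my assumption, not a confirmed fix) is to report only from rank 0 so that a single process talks to the ClearML logger:
```
import ignite.distributed as idist
from clearml import Task

def log_loss_rank0(engine):
    idist.barrier()               # make sure every process reached this point
    if idist.get_rank() != 0:
        return                    # only rank 0 reports to ClearML
    logger = Task.current_task().get_logger()
    logger.report_scalar(
        "train", "loss",
        value=engine.state.metrics["loss"],
        iteration=engine.state.iteration,
    )
```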
AgitatedDove14 ok, but this happens in my local machine, not in the agent
So previous_task actually ignored the output_uri
Note: could be related to https://github.com/allegroai/clearml/issues/790, not sure
CostlyOstrich36 I don't see such a number, can you please share a screenshot of where to look?
Thanks, the message is not logged in the GCloud instance logs when using startup scripts, which is why I did not see it.
CostlyOstrich36, actually this only happens for a single agent. The weird thing is that I have a machine with two GPUs and I spawn two agents, one per GPU. Both have the same version. For one, I can see all the logs, but not for the other
but according to the disk graphs, the OS disk is being used, but not the data disk
I think we should switch back and have a configuration to control which mechanism the agent uses, wdyt?
That sounds great!
I get the following error:
AnxiousSeal95 The main reason for me not to use clearml-serving triton is the lack of documentation, tbh. I am not sure how to make my PyTorch model run there
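For context, Triton's PyTorch backend serves TorchScript files, so the export step would look roughly like this (a sketch with a placeholder model, not my actual network):
```
import torch

class MyModel(torch.nn.Module):      # placeholder for the real network
    def forward(self, x):
        return x * 2

model = MyModel().eval()
example = torch.randn(1, 3)          # example input with the expected shape
traced = torch.jit.trace(model, example)   # or torch.jit.script(model)
traced.save("model.pt")              # the file the Triton PyTorch backend loads
```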
I am not sure I can do both operations at the same time (migration + splitting). Do you think it's better to do the splitting first or the migration first?
So that I don't lose what I worked on when stopping the session, and if I need to, I can SSH to the machine and directly access the content inside the user folder
automatically promote models to be served from within clearml
Yes!
AgitatedDove14 That's a good point: the experiment failing with this error does show the correct AWS key: ... sdk.aws.s3.key = ***** sdk.aws.s3.region = ...
even if I explicitly use previous_task.output_uri = "s3://my_bucket", it is ignored and the json file is still saved locally
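For reference, what I would have expected to work (a sketch assuming the standard Task.init API, with placeholder names):
```
from clearml import Task

task = Task.init(
    project_name="my_project",     # placeholder names
    task_name="my_task",
    output_uri="s3://my_bucket",   # default upload destination for models/artifacts
)
task.upload_artifact(name="results", artifact_object={"foo": 1})
```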
Add carriage return flush support using the sdk.development.worker.console_cr_flush_period configuration setting (GitHub trains Issue 181)
Ok, so after updating to trains==0.16.2rc0, my problem is different: when I clone a task, update its script and enqueue it, it does not have any Hyper-parameters/argv section in the UI
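For context, the clone/enqueue flow I am describing looks roughly like this (a sketch with placeholder IDs and queue names):
```
from clearml import Task

original = Task.get_task(task_id="<original-task-id>")         # placeholder id
cloned = Task.clone(source_task=original, name="cloned copy")
# ... the script is edited here (in the UI in my case) ...
Task.enqueue(cloned, queue_name="default")                     # placeholder queue
```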
` ssh my-instance
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ED25519 key sent by the remote host is
SHA256:O2++ST5lAGVoredT1hqlAyTowgNwlnNRJrwE8cbM...
in my clearml.conf, I only have:
sdk.aws.s3.region = eu-central-1
sdk.aws.s3.use_credentials_chain = true
agent.package_manager.pip_version = "==20.2.3"
Also, this is maybe a separate issue but could be linked: if I add Task.current_task().get_logger().flush(wait=True) like this:
` def log_loss(engine):
    idist.barrier()  # sync all processes
    device = idist.device()
    print("IDIST", device)
    from clearml import Task
    Task.current_task().get_logger().report_text(f"{device}, FIRED, {engine.state.iteration}, {engine.state.metrics}")
    Task.current_task().get_logger().report_scalar("train", "loss", engine.state.metrics["loss"], engine.state.itera...