Awesome, thanks WackyRabbit7, AgitatedDove14!
Ok thanks! And for this?
Would it be possible to support such a use case? (Have the clearml-agent set up a different Python version when a task needs it?)
I didn’t use ignite callbacks, for future reference:
` from ignite.engine import Events
from ignite.handlers import EarlyStopping

early_stopping_handler = EarlyStopping(...)

def log_patience(_):
    clearml_logger.report_scalar("patience", "early_stopping", early_stopping_handler.counter, engine.state.epoch)

engine.add_event_handler(Events.EPOCH_COMPLETED, early_stopping_handler)
engine.add_event_handler(Events.EPOCH_COMPLETED, log_patience) `
As you can see: more hard waiting (an initial sleep), and then, before each apt action, checking that there is no lock.
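Something like this is what I mean (just a rough sketch; the lock path, timings, and the wait_for_apt_lock name are illustrative, and it assumes fuser is available):
` import subprocess
import time

def wait_for_apt_lock(timeout=300, poll=5):
    # Illustrative helper: poll until no process holds the dpkg/apt lock
    deadline = time.time() + timeout
    while time.time() < deadline:
        # fuser exits with 0 only when some process holds the lock file
        held = subprocess.run(
            ["fuser", "/var/lib/dpkg/lock-frontend"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        ).returncode == 0
        if not held:
            return
        time.sleep(poll)
    raise TimeoutError("apt/dpkg lock still held")

time.sleep(60)  # initial hard wait for cloud-init / unattended-upgrades
wait_for_apt_lock()
subprocess.run(["sudo", "apt-get", "update"], check=True) `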
The instances take so long to start, around 5 minutes.
I edited aws_auto_scaler.py; actually I think it's just a typo, I just need to double the brackets.
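(For context, the doubling is presumably because the bash snippet goes through Python's str.format, which treats single braces as placeholders; a tiny made-up example:)
` # literal braces in a str.format template must be doubled
template = "echo ${{HOME}} && pip install {package}"
print(template.format(package="clearml"))
# -> echo ${HOME} && pip install clearml `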
Interestingly, I do see the 100 GB volume in the AWS console:
Hey FriendlySquid61 ,
I ended up asking for full EC2 access so I wouldn't get blocked, so unfortunately I cannot give you a more precise list of permissions 😕
Try to spin up an instance of that type manually in that region to see if it is available.
I did that recently - what are you trying to do exactly?
did you try with another availability zone?
ok, what is your problem then?
RobustRat47 It can also simply be that the instance type you declared is not available in the zone you defined
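If you want to check it programmatically rather than through the console, something like this boto3 call lists which availability zones offer a given instance type (the region and instance type below are placeholders):
` import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")  # placeholder region
offerings = ec2.describe_instance_type_offerings(
    LocationType="availability-zone",
    Filters=[{"Name": "instance-type", "Values": ["g4dn.xlarge"]}],  # placeholder type
)
print([o["Location"] for o in offerings["InstanceTypeOfferings"]]) `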
I would probably leave it to the ClearML team to answer you; I am not using the UI app, and for me it worked just fine with different regions. Maybe check the permissions of the key/secret?
Could you please share the stacktrace?
And now that I restarted the server and went back into the project where I initially deleted the archived experiments, some of them are still there - I will leave them alone, too scared to do anything now 😄
It seems that around here, a Task that is created using init remotely in the main process gets its output_uri parameter ignored; even if I explicitly set previous_task.output_uri = "s3://my_bucket", it is ignored and the json file is still saved locally.
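For reference, the way I normally pass it is directly at init time, which does work for me (project/task names here are placeholders):
` from clearml import Task

task = Task.init(
    project_name="examples",       # placeholder
    task_name="output-uri-check",  # placeholder
    output_uri="s3://my_bucket",
) `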
I killed both trains-agents and restarted one to get a clean start. This way it correctly spins up docker containers for services tasks. So the bug probably appears when an error occurs while setting up a task: the agent cannot go back to the main task. I would need to do some tests to validate that hypothesis though.
The task with id a445e40b53c5417da1a6489aad616fee is not aborted and is still running.
So the controller task finished, and now only the second trains-agent services mode process is showing up as registered. So this is definitely something linked to switching back to the main process.
I will try to isolate the bug; if I can, I will open an issue in trains-agent 🙂