AgitatedDove14 WOW, thanks a lot! I will dig into that!
It broke holding shift to select multiple experiments, btw
I am also interested in the clearml-serving part
meaning the REST API returns nothing, is that correct?
Yes exactly, this is the response from the api server when I try to scroll down on the console to get more logs
AgitatedDove14 SuccessfulKoala55 I just saw that clearml-server 1.4.0 was released, congrats! Was this bug fixed in this new version?
Hi TimelyPenguin76,
trains-server: 0.16.1-320
trains: 0.15.1
trains-agent: 0.16
v0.17.5rc2
This is what I get with mprof on the snippet above (I killed the program after the bar reached 100%, otherwise it hangs trying to upload all the figures)
Hi @<1523701205467926528:profile|AgitatedDove14> @<1537605940121964544:profile|EnthusiasticShrimp49>, the issue above seems to have been the memory leak, and it looks like there is no problem on the clearml side.
I trained successfully without mem leak with num_workers=0 and I am now testing with num_workers=8.
Sorry for the false positive :man-bowing:
Disclaimer: I didn't check that this reproduces the bug, but these are all the components that should trigger it: a for loop creating figures and clearml logging them
mmmh it fails, but if I connect to the instance and execute ulimit -n, I do see 65535,
while the tasks I send to this agent fail with:
OSError: [Errno 24] Too many open files: '/root/.commons/images/aserfgh.png'
and from the task itself, I run:
import subprocess
print(subprocess.check_output("ulimit -n", shell=True))
which gives me in the logs of the task:
b'1024'
So nofiles is still 1024, the default value, but not when I ssh, damn. Maybe rebooting would work
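In case someone hits the same thing: rather than shelling out to ulimit, the nofile limit can be checked and bumped from inside the Python process itself (just a sketch; only the soft limit can be raised, and only up to the hard limit):
```
import resource

# current soft/hard limits on open file descriptors for this process
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft}, hard={hard}")

# raise the soft limit up to the hard limit, for this process only
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print(resource.getrlimit(resource.RLIMIT_NOFILE))
```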
Ok to be fair I get the same curve even when I remove clearml from the snippet, not sure why
I think that somehow, somewhere, a reference to the figure is still alive, so plt.close("all") and gc cannot free it and the figures end up accumulating. I don't know where yet
If I manually call report_matplotlib_figure, yes. If I don't (just create the figure), there is no mem leak
Ok no, it only helps as long as I don't log the figures. If I log the figures, I still run into the same problem
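To test the "a reference is still alive" theory, this is the kind of quick check I mean (only a sketch, not taken from my codebase):
```
import gc

import matplotlib.pyplot as plt
from matplotlib.figure import Figure

plt.close("all")
gc.collect()

# count Figure objects that are still reachable after closing everything
live_figs = [o for o in gc.get_objects() if isinstance(o, Figure)]
print(f"live Figure objects: {len(live_figs)}")

if live_figs:
    # show what is still holding a reference to the first surviving figure
    print(gc.get_referrers(live_figs[0])[:3])
```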
in my clearml.conf, I only have:
sdk.aws.s3.region = eu-central-1
sdk.aws.s3.use_credentials_chain = true
agent.package_manager.pip_version = "==20.2.3"
Well no luck - using matplotlib.use('agg') in my training codebase doesn't solve the mem leak
Yes, that was my assumption as well; there could be several causes to be honest, now that I see that matplotlib itself is also leaking
RuntimeError: CUDA error: no kernel image is available for execution on the device
For me it is definitely reproducible. But the codebase is quite large, I cannot share it. The gist is the following:
import matplotlib.pyplot as plt
import numpy as np
from clearml import Task
from tqdm import tqdm

task = Task.init("Debug memory leak", "reproduce")

def plot_data():
    fig, ax = plt.subplots(1, 1)
    t = np.arange(0., 5., 0.2)
    ax.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')
    return fig

for i in tqdm(range(1000), total=1000):
    fig = plot_data()
    # (the original snippet elided this part; per the discussion above,
    # the figure is logged to clearml and then closed)
    task.get_logger().report_matplotlib_figure("debug", "fig", iteration=i, figure=fig)
    plt.close(fig)
DeterminedCrab71 This is the behaviour of holding shift while selecting in Gmail; if ClearML could reproduce this, that would be perfect!
oh, seems like it is not synced, thank you for noticing (it will be taken care of immediately)
Thank you!
does not contain a specific wheel for cuda117 for x86, they use the default pip one
Yes, so indeed they don't provide support for earlier cuda versions on the latest torch versions. But I should still be able to install torch==1.11.0+cu115 even if I have cu117; that is what the clearml-agent was doing before.
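For reference, the kind of setup I mean (just a sketch; I am assuming the agent.package_manager.extra_index_url option in the agent's clearml.conf, and the cu115 URL is the standard PyTorch wheel index):
```
# agent's clearml.conf (sketch, not my actual config)
agent.package_manager.extra_index_url: ["https://download.pytorch.org/whl/cu115"]
```
and then pinning torch==1.11.0+cu115 in the task requirements should resolve against that index.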
With a large enough number of iterations in the for loop, you should see the memory grow over time
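If it helps, a quick way to watch the growth without mprof (a sketch; psutil is my assumption here, it is not part of the original snippet):
```
import psutil
from tqdm import tqdm

proc = psutil.Process()
for i in tqdm(range(1000), total=1000):
    fig = plot_data()  # same helper as in the snippet above
    if i % 100 == 0:
        # resident set size of this process, in MB
        print(f"iter={i} rss={proc.memory_info().rss / 1e6:.1f} MB")
```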
Adding back clearml logging with matplotlib.use('agg') uses more RAM, but nothing that suspicious
automatically promote models to be served from within clearml
Yes!
Hi AgitatedDove14, I investigated further and got rid of a separate bug. I was able to get ignite's events fired, but still no scalars logged
There is definitely something wrong going on with the reporting of scalars when using multiple processes, because if my ignite callback is the following:
def log_loss(engine):
    idist.barrier()  # sync all processes
    device = idist.device()
    print("IDIST", device)
    from clearml import Task
    Task.current_task().get_logger().r...
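For context, a minimal version of the kind of callback I am describing, but guarded so that only rank 0 reports (just a sketch; I am assuming the loss value is available as engine.state.output):
```
import ignite.distributed as idist
from clearml import Task

def log_loss(engine):
    idist.barrier()  # sync all processes before reporting
    if idist.get_rank() == 0:
        Task.current_task().get_logger().report_scalar(
            title="loss",
            series="train",
            value=engine.state.output,
            iteration=engine.state.iteration,
        )
```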