If you try:
ModelCheckpoint('best_model.hdf5', save_best_only=True)
does it work too?
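Something like this is what I have in mind (just a sketch; the model and data names are placeholders, and I am assuming the standard Keras callback signature):
` from tensorflow.keras.callbacks import ModelCheckpoint

# Save weights only when the monitored metric (val_loss by default) improves.
checkpoint = ModelCheckpoint('best_model.hdf5', save_best_only=True)

# model, x_train, y_train are placeholders for the real model and data.
model.fit(x_train, y_train, validation_split=0.2, epochs=10, callbacks=[checkpoint]) `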
Last question CostlyOstrich36, sorry to poke you! It seems that even if I set an extremely long time, it still fails when the first plots are reported. The first plots are generated automatically by PyTorch Lightning and track the CPU and GPU usage. Do you think this could be the cause? Or should it also detect the iteration?
CostlyOstrich36 That seemed to do the job! No message after the first epoch, with the caveat of losing resource monitoring. Any idea what could be causing this? If the resource monitor is the first plot, does the iteration detection fail? Are there any hacks to keep the resource monitoring? Thanks a lot!
CostlyOstrich36 PyTorch Lightning exposes current_epoch on the trainer; not sure if that is what you mean.
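For example (a sketch; the callback itself is hypothetical, it just shows where current_epoch is exposed):
` import pytorch_lightning as pl

class EpochPrinter(pl.Callback):
    # Hypothetical callback: trainer.current_epoch is available in any hook.
    def on_train_epoch_end(self, trainer, pl_module):
        print(f"finished epoch {trainer.current_epoch}") `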
Hey CostlyOstrich36, I am doing a lot of things before the first plot is reported! Is the seconds_from_start parameter unbounded? What should I do if it takes a long time to report the first plot?
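For context, this is roughly what I am doing (a sketch; I am assuming the seconds_from_start here is the argument of Task.set_resource_monitor_iteration_timeout, and the project/task names are placeholders):
` from clearml import Task

# Give the resource monitor more time to wait for the first real iteration/plot
# (value in seconds); set here before Task.init.
Task.set_resource_monitor_iteration_timeout(seconds_from_start=200000)

task = Task.init(project_name="my_project", task_name="training") `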
Oh, I think I am wrong! Then it must be the ClearML monitoring. Still, it fails way before the timer ends.
Managed to get:
clearml_agent: ERROR: Command '['/home/ramon/.clearml/venvs-builds/3.9/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r', '/var/tmp/requirements_tb0x2i3j.txt', '--extra-index-url', '
died with <Signals.SIGKILL: 9>.
while building the task with that ID on the agent
I'll give that a try! Thanks CostlyOstrich36
Sure! Could you point me to how it's done?
Yes, everything is that way (working dir and args are OK) except the script path. It shows -m module arg1 arg2.
I set it to 200000! But the problem stems from the first plot being the ClearML CPU and GPU monitoring; were you able to reproduce it? Even if I set the number fairly large, the message appeared as soon as the monitoring plot was reported.
I set the number to a crazy value and it still fails around the same iteration.
Sure! For torch I have:
` torch==2.0.1
# via
# monai
# pytorch-lightning
# torchio
# torchmetrics `
AgitatedDove14 I am not sure why the packages get different versions; maybe since the package is not directly imported in my code, it can resolve to a different version than the one I have locally (?). Should all library versions match exactly between my local environment and the code that runs on the agent? The Task.add_requirements(package_name, package_version=None) workaround works perfectly! I just pin the previous version that doesn't break the code. Yes, a force flag could definitely help ...
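Concretely, the workaround looks like this (a sketch; torch 2.0.1 is just the pin from my list above, the actual package may differ, and the project/task names are placeholders):
` from clearml import Task

# Pin the exact version I have locally so the agent resolves the same one.
# add_requirements must be called before Task.init to affect the captured requirements.
Task.add_requirements("torch", package_version="2.0.1")

task = Task.init(project_name="my_project", task_name="training") `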
AgitatedDove14 I filed an issue on fire for them to point us to the argument-parsing method: https://github.com/google/python-fire/issues/291
Thanks SuccessfulKoala55 !
My bad :man-facepalming: It was just a matter of specifying weights_path=dirpath, since the first argument is weights_filename.
Yes AgitatedDove14! I'll PM you
I am using the code inside the on_train_epoch_end hook, inside a metric. So the important part is:
` import matplotlib.pyplot as plt

fig = plt.figure()
# ... build the actual plot here (placeholder for my plotting code) ...
logger.experiment.add_figure("fig", fig)  # logger.experiment is e.g. the TensorBoard SummaryWriter
plt.close(fig) `
Sure, I'll share it through a private message!