I am still getting the error even with the v0.16.3 agent. Is there something else we need to do besides updating it?
SuccessfulKoala55 just to let you know: since I opened the link straight from the GCP console, the address was using https instead of http, hence the error. Thanks a lot for your help!
Using detect_with_pip_freeze: true runs into "package version not found" errors for some of the packages I have installed locally.
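For context, this is roughly how I enabled it in clearml.conf (assuming the flag belongs under the sdk.development section, which is where I understand it lives):

sdk {
  development {
    # record the environment with pip freeze instead of analyzing imports
    detect_with_pip_freeze: true
  }
}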
There are also ways to override the parameters from the command line, as described in https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_cli.html#use-of-command-line-arguments .
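A minimal sketch of what I mean, assuming a LightningCLI entry point (the model and datamodule names are placeholders, and the import path may differ between Lightning versions):

from pytorch_lightning.cli import LightningCLI  # may live elsewhere in older/newer PL versions
from my_project import MyModel, MyDataModule    # placeholder model and datamodule

if __name__ == "__main__":
    # any config value can then be overridden from the command line, e.g.
    #   python train.py fit --trainer.max_epochs 10 --model.lr 0.001
    LightningCLI(MyModel, MyDataModule)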
Hey CostlyOstrich36! I am using clearml==1.1.2 and clearml-agent==1.1.0. Stopped is not the right word, more like frozen: it just froze at an epoch. The console on the agent shows epoch 33, first batch, while the one on the server shows epoch 32, last batch. The experiment had been running for ~6 hours.
Best thing ever, thanks AgitatedDove14 !
AgitatedDove14 from this thread I understand Hydra is not supported, and therefore overriding the parameters from the UI won't work, but is there still a way to track and add the parameters to the experiment? Will task.connect_configuration work with the YAML files?
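Something like this is what I have in mind, a sketch with placeholder project, task, and config names:

from clearml import Task

task = Task.init(project_name="examples", task_name="hydra-run")  # illustrative names

# connect_configuration accepts a dict or a file path; with a path, the YAML
# contents should show up under the task's CONFIGURATION section in the UI
config_path = task.connect_configuration("configs/train.yaml", name="train_config")  # placeholder path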
Managed to get:
clearml_agent: ERROR: Command '['/home/ramon/.clearml/venvs-builds/3.9/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r', '/var/tmp/requirements_tb0x2i3j.txt', '--extra-index-url', '
died with <Signals.SIGKILL: 9>.
while building the task from its ID on the agent.
It is failing exactly when the download finishes. Not sure if it matters, but in ~/.clearml/pip-download-cache only an empty cu120 folder appears. Should the torch wheel be saved there?
Sure! For torch I have:
torch==2.0.1
# via
# monai
# pytorch-lightning
# torchio
# torchmetrics
CostlyOstrich36 Thanks for the help! It ended up being a mistake on my side: I misconfigured the VM's memory and it had only 3.75 GB, so it failed when installing torch.
I just want to retrieve the weights in a script that tests models I have trained in the past.
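Roughly what I'm trying to do, a sketch with placeholder project and task names:

from clearml import Task

# fetch the past training run and download its latest output model weights
prev_task = Task.get_task(project_name="examples", task_name="my-training-run")  # placeholders
output_model = prev_task.models["output"][-1]   # last model the run registered
weights_path = output_model.get_local_copy()    # local path to the weights file

# then load them in the test script, e.g. torch.load(weights_path)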
CostlyOstrich36 That seemed to do the job! No message after the first epoch, with the caveat of losing resource monitoring. Any idea what could be causing this? If the resource monitor is the first plot, does the iteration detection fail? Are there any hacks to keep the resource monitoring? Thanks a lot! 🙌
Hey CostlyOstrich36, I am doing a lot of things before the first plot is reported! Is the seconds_from_start parameter unbounded? What should I do if it takes a long time to report the first plot?
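For reference, this is how I'm setting it, if I understand the API correctly (the timeout value and names are just examples):

from clearml import Task

# allow more time before the resource monitor expects the first real iteration
Task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)

task = Task.init(project_name="examples", task_name="long-setup-run")  # illustrative names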
CostlyOstrich36 PyTorch Lightning exposes current_epoch on the trainer, not sure if that is what you mean.
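If manual reporting is what you mean, this is the kind of callback I could add (a sketch; it assumes a metric logged under the name "val_loss"):

import pytorch_lightning as pl
from clearml import Task

class ClearMLEpochReporter(pl.Callback):
    # report validation loss against trainer.current_epoch so ClearML picks up the iteration
    def on_validation_epoch_end(self, trainer, pl_module):
        val_loss = trainer.callback_metrics.get("val_loss")  # assumes a "val_loss" metric exists
        if val_loss is not None:
            Task.current_task().get_logger().report_scalar(
                title="val", series="loss", value=float(val_loss), iteration=trainer.current_epoch
            )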
Sure! Could you point me to how it's done?
Last question CostlyOstrich36, sorry to poke you! It seems that even if I set an extremely long time, it still fails when the first plots are reported. The first plots are generated automatically by PyTorch Lightning and track CPU and GPU usage. Do you think this could be the cause, or should it also detect the iteration?
I set the number to a crazy value and it fails around the same iteration
Oh, I think I am wrong! Then it must be the ClearML monitoring. Still, it fails way before the timer ends.