GrievingTurkey78

34 Questions, 125 Answers

Active since 10 January 2023

Last activity 2 years ago

Reputation

Badges 1

119 × Eureka!

Questions 34
Answers 125

0 Votes

3 Answers

2K Views

0 Votes 3 Answers 2K Views

Hi! I Recently Updated My Server And My Clearml Version, Now When I Set A Task To Be Executed Remotely Its Default State Is Aborted Hence I Have To Reset And Enqueue, Is There Something I Am Doing Wrong (I Am Using Hydra Too)?

Hi! I recently updated my server and my clearml version, now when I set a task to be executed remotely its default state is aborted hence I have to reset and...

clearml

4 years ago

0 Votes

11 Answers

2K Views

0 Votes 11 Answers 2K Views

Hi! Is There A Way To Run A Task Without Reporting To The Server? For Example If I Want To Debug A Script By Running It Locally Without It Appearing On The Server

Hi! Is there a way to run a task without reporting to the server? For example if I want to debug a script by running it locally without it appearing on the s...

clearml

4 years ago

0 Votes

12 Answers

2K Views

0 Votes 12 Answers 2K Views

Hi All! Is There A Way For Trains To Recognize The Cli Arguments When Using

Hi all! Is there a way for trains to recognize the CLI arguments when using https://github.com/google/python-fire instead of argparse?

clearml

5 years ago

0 Votes

6 Answers

2K Views

0 Votes 6 Answers 2K Views

Hi! I Have Some Agents On Gcp. Lately I Have Been Getting Some Experiments That Simply Stop Running (No Signs That The Experiment Crashed). Here Is A Plot That Shows The Resource Monitoring. Any Ideas On What Could Be Causing This?

Hi! I have some agents on GCP. Lately I have been getting some experiments that simply stop running (no signs that the experiment crashed). Here is a plot th...

clearml

4 years ago

Show more results

0 I Am Also Experiencing A Weird Behaviour When Running A Script Using The Module Flag. For Example I Run:

So should I set them all with a default value? The working dir is the project one, the one that contains the module package

5 years ago

0 Hi! I Am Having Some Problems With A Loss After A Good Amount Of Training, What Would Be The Best Way To Log A Value To Have A Better Idea Of What Is Happening?

Awesome AgitatedDove14 Thanks a lot 🙌

3 years ago

0 Hi! I Am Currently Using Hydra+Clearml And Wanted To Know If There Are Still Some Updates Coming. At The Moment, If I Change The Defaults Hydra Uses From The

Side note: When running src.train as a module the server gets the command as src and has to be modified to be src.train

4 years ago

0 Hi, Is There A Way To Force The Requirements.Txt? I Have A Package I Installed Directly From Github But The Version Is Always Wrong. Any Other Way To Do This?

Thanks AgitatedDove14 !

4 years ago

0 Hi! I Am Saving Some Intermediate

So I would have to disconnect pytorch? And then upload the model at the end

4 years ago

0 Hi! I Am Getting The Following Error On An Agent:

It is the latest RC, I get the following:
` Executing Conda: /opt/conda/bin/conda install -p /home/ramon/.clearml/venvs-builds/3.8 -c pytorch -c conda-forge -c defaults 'pip<20.2' --quiet --json
Pass
Trying pip install: /home/ramon/.clearml/venvs-builds/3.8/task_repository/my-rep.git/requirements.txt
Executing Conda: /opt/conda/bin/conda install -p /home/ramon/.clearml/venvs-builds/3.8 -c pytorch -c conda-forge -c defaults numpy==1.20.3 --quiet --json
Pass
Warning, could not locate PyTorch to...

3 years ago

0 Hi! I Am Getting The Following Error On An Agent:

Not yet AgitatedDove14 , does the agent use by default the python version the command is run with? I installed conda and tried using package_manager.type=conda but then get an error:
clearml_agent: ERROR: 'NoneType' object has no attribute 'lower'

3 years ago

0 Hi! I Am Using The Modelcheckpoint Callback From Tensorflow To Save The Best Model. When The Experiment Finishes If I Go On The Server To Experiment > Artifacts > Output Model I Can See The Model And Subsequently By Clicking On It The Weights. How Can I

👌 Great

4 years ago

0 Hello

@<1523701070390366208:profile|CostlyOstrich36> Thanks for the help! It ended being a mistake on my side. Misconfigured the VM's memory and it had only 3.75 G. Failed when installing torch.

2 years ago

0 Hi! I Am Saving Some Intermediate

Hi CostlyOstrich36 ! The message is the following:
clearml.model - INFO - Selected model id: 27c1a1700b0b4e25a4344dc4ef9868faThey are not models, those are intermediate tensors I am caching to make training faster. I don't need to log them.

4 years ago

0 I Am Also Experiencing A Weird Behaviour When Running A Script Using The Module Flag. For Example I Run:

I’ll show you what I have through PM!

5 years ago

I just want to retrieve the weights on a script that tests models I have trained in the past

4 years ago

On the server through the command line?

4 years ago

0 Hi

Let me work on it 👌

3 years ago

0 Hi! Any Idea Why Clearml Fails To Detect Iteration Reporting?

Oh I think I am wrong! Then it must be the clearml monitoring. Still it fails way before the timer ends.

4 years ago

It’s file://

4 years ago

0 Hi, I Was Getting A Really Weird Error Due To Mismatch On The Versions Between The Installed Libraries In My Environment And The Ones Ran In The Node (I Manually Changed The Installed Packages And Everything Worked). How Can I Force Trains To Use Exactly

No, I have all the packages with a version. I just want to know if there is a way to override the requirements versions detected by Pigar when using detect_with_pip_freeze: false . I have locally cloudpickle==1.4.1 but when running the code and sending the task to the node the environment uses cloudpickle==1.6.0 . I have to manually change the version on the UI. Is there a way to force this single package to have a version? Maybe on the requirments.txt or something similar

5 years ago

0 I Am Also Experiencing A Weird Behaviour When Running A Script Using The Module Flag. For Example I Run:

Yes, everything is that way (work dir and args are ok) except the script path . It shows -m module arg1 arg2 .

5 years ago

Using the get_weights(True) I get ValueError: Could not retrieve a local copy of model weights <ID>, failed downloading <URL>

4 years ago

0 Hi

I am using the code inside the on_train_epoch_end inside a metric. So the important part is:
` fig = plt.figure()

my plot

logger.experiment.add_figure("fig", fig)
plt.close() `

3 years ago

0 Hi! I Have Some Agents On Gcp. Lately I Have Been Getting Some Experiments That Simply Stop Running (No Signs That The Experiment Crashed). Here Is A Plot That Shows The Resource Monitoring. Any Ideas On What Could Be Causing This?

I am using pytorch_lightning , I'll try to create a snippet I can share! Thanks 🙌

4 years ago

0 Hi! Does Clearml Have A Way To Turn On/Off Virtual Machines Depending If There Are Experiments On Queue?

Yes! I will take a look at it!

4 years ago

TimelyPenguin76 I found out its just one package that is causing the error ( cloudpickle breaks everything). Is there a way to use Pigar but force a single package to have a version?

5 years ago

0 Hi! If I Have A Pipeline On Gitlab That Uses Clearml For Some Tests Is There Some Way To Setup The Credentials So That It Doesn’T Fail?

Thanks SuccessfulKoala55 !

4 years ago

0 I Am Trying To Upgrade From Clearml Server 0.16 To The Newest Version But I Am Getting Some Errors When Spinning Up The New Containers:

Yes AgitatedDove14 ! I’ll PM you

4 years ago

0 Hi! I Changed From Trains To Clearml And Ran Some Experiments Using Keras But It Seems The Metrics Are Not Being Tracked Automagically, Has Anyone Ran Into The Same Issue? I Can Even See The Metrics On The Progress Bar During The Fit Process.

Thanks TimelyPenguin76 , the example works fine! I’ll debug further on my side!

4 years ago

I get the URL to the checkpoint/weights can I use this to download the weights?

4 years ago