GrievingTurkey78
Moderator
34 Questions, 125 Answers
  Active since 10 January 2023
  Last activity 9 months ago

0 Hello

It is failing exactly when the download finishes. Not sure if it is relevant, but in ~/.clearml/pip-download-cache only an empty cu120 folder appears. Should the torch wheel be saved there?

9 months ago
0 Hello

What additional context do you need?

9 months ago
0 Hi! I Am Getting The Following Error On An Agent:

Give me a couple of minutes 🙌

2 years ago
0 Hello

Sure! For torch I have:

torch==2.0.1
    # via
    #   monai
    #   pytorch-lightning
    #   torchio
    #   torchmetrics
9 months ago
0 Hi! I Am Having Some Problems With A Loss After A Good Amount Of Training, What Would Be The Best Way To Log A Value To Have A Better Idea Of What Is Happening?

AgitatedDove14 Well I have a loss function which is something like:
class MyLoss(...):
    def forward(...):
        weights = self.compute_weights(...)
        return (weights * (target - preds)).mean()

There seems to be a problem on a certain batch when computing the weights. What would be the best way to log the batch that causes the problem, along with the weights being computed?
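
One way to approach this, as a minimal sketch: check the computed weights for non-finite values and record the batch index together with the offending entries. The names `find_bad_weights` and `check_batch` are hypothetical, plain Python lists stand in for tensors, and the standard `logging` module stands in for whatever experiment logger is actually in use.

```python
import logging
import math

logger = logging.getLogger("loss_debug")


def find_bad_weights(weights):
    """Return (index, value) pairs for every NaN or infinite weight."""
    return [(i, w) for i, w in enumerate(weights) if not math.isfinite(w)]


def check_batch(batch_idx, weights):
    """Log the batch index and offending weights; return True if the batch is clean."""
    bad = find_bad_weights(weights)
    if bad:
        logger.warning("batch %d has non-finite weights: %s", batch_idx, bad)
        return False
    return True
```

Calling `check_batch` right after the weights are computed would narrow the problem down to a specific batch, which can then be saved or inspected separately.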

2 years ago
0 Hi

Thanks!

3 years ago
0 I Am Also Experiencing A Weird Behaviour When Running A Script Using The Module Flag. For Example I Run:

So should I set them all with a default value? The working dir is the project one, the one that contains the module package

3 years ago
0 Hello

Yes, I configured it that way 👌 Thanks! I'll use the flag!

9 months ago
0 Hey Everyone- I Have An Issue Started Today With Trains-Agent Which I’M Getting This Error On Startup:

  File "/home/ramon/.trains/venvs-builds/3.7/lib/python3.7/site-packages/trains/backend_api/session/token_manager.py", line 72, in _get_token_exp
    return jwt.decode(token, verify=False).get('exp', sys.maxsize)
  File "/home/ramon/.trains/venvs-builds/3.7/lib/python3.7/site-packages/jwt/api_jwt.py", line 113, in decode
    decoded = self.decode_complete(jwt, key, algorithms, options, **kwargs)
  File "/home/ramon/.trains/venvs-builds/3.7/lib/python3.7/site-packages/jwt/api_jwt.py", line 80, in decode_c...

3 years ago
0 Hi

SuccessfulKoala55 Is the update from 1.2.0 only updating the docker-compose file?

2 years ago
0 Hi! Is There Something Happening With The

Thanks AgitatedDove14 🙌

3 years ago
0 Hi! I Was Taking A Look At The

Nice catch AgitatedDove14! Sure, I'll open the issue right now.

2 years ago
0 Hi! I Was Taking A Look At The

Yes AgitatedDove14 , I am not sure what they use by default. Here is a simple working example:
from typing import Optional

import torch
from clearml import Task
from pytorch_lightning import LightningDataModule, LightningModule
from pytorch_lightning.utilities.cli import LightningCLI
from torch.utils.data import DataLoader, Dataset, Subset


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def ...
2 years ago
0 Hi! Is There Something Happening With The

Thanks AgitatedDove14! It seems to be the subclassed model + extension.

3 years ago
0 Hi! Is There Something Happening With The

Basically one points to an hdf5 file and the other one has no extension.

3 years ago
0 Hi! Is There Something Happening With The

AgitatedDove14 it's on the checkpoint

3 years ago
0 Hi! Is There Something Happening With The

This works:
filepath = self.log_dir + os.sep + "checkpoint"
self.callbacks.append(
    ModelCheckpoint(
        filepath,
        monitor="val_loss",
        mode="min",
        save_best_only=True,
        save_weights_only=True,
    )
)

And this doesn't:
filepath = self.log_dir + os.sep + "checkpoint.hdf5"
self.callbacks.append(
    ModelCheckpoint(
        filepath,
        ...

3 years ago
0 Hi! Is There Something Happening With The

Hey AgitatedDove14, after playing around it seems that if the callback filepath points to an hdf5 file, it is not uploaded.

3 years ago
0 Hi! Is There Something Happening With The

I changed it to point to a folder and it shows up
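
The workaround described in this thread (a checkpoint path without the `.hdf5` extension) can be sketched as a small helper. This is an illustration only: `checkpoint_path` is a hypothetical name, and it simply strips the extension with the standard library; it does not address why the upload is skipped for hdf5 paths.

```python
import os


def checkpoint_path(log_dir, name="checkpoint.hdf5"):
    """Build a checkpoint path under log_dir with any file extension removed."""
    stem, _ext = os.path.splitext(name)
    return os.path.join(log_dir, stem)
```

The result could then be passed as the `filepath` argument of the callback in place of the hard-coded string.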

3 years ago
0 Hi! I Have Some Clearml Agents On Gcp And Sometimes The Instance Seems To Reboot Making The Experiment Fail And All The Progress Is Lost. What Is The Best Way To Resume An Experiment?

Hey CostlyOstrich36 sorry to ping you! Let's say I enqueue multiple experiments on a couple of agents and one of them fails. Is it possible to restart the experiment from the UI using the latest checkpoint? What if the experiment gets assigned to the other agent? I am not sure how the continue_last_task flag would help in this case.
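
Whichever agent picks the task up again, the training script itself usually has to locate the newest checkpoint on startup. A minimal sketch of that step, assuming checkpoints are written to a directory reachable from every agent and use a `.ckpt` suffix (the layout and the name `latest_checkpoint` are assumptions for illustration, not ClearML behaviour):

```python
from pathlib import Path


def latest_checkpoint(ckpt_dir, pattern="*.ckpt"):
    """Return the most recently modified checkpoint in ckpt_dir, or None."""
    candidates = sorted(
        Path(ckpt_dir).glob(pattern),
        key=lambda p: p.stat().st_mtime,
    )
    return candidates[-1] if candidates else None
```

If the function returns `None`, training would start from scratch; otherwise the returned path can be handed to the framework's own resume mechanism.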

2 years ago
0 Hi! Is There Something Happening With The

AgitatedDove14 Thanks! I'm trying to figure out how to create a minimal working example! I am also working with Hydra, so that may be a factor. The extension is what's causing it to fail (haven't figured out why).

3 years ago
0 Hi! Is There Something Happening With The

Thanks Martin! I'll keep checking 👌

3 years ago
0 Hi! I Am Saving Some Intermediate

Hi CostlyOstrich36 ! The message is the following:
clearml.model - INFO - Selected model id: 27c1a1700b0b4e25a4344dc4ef9868fa
They are not models; those are intermediate tensors I am caching to make training faster. I don't need to log them.

2 years ago
0 Hi! Is There Something Happening With The

It works perfectly! AgitatedDove14 There is something weird on my side 😢

3 years ago