
OK, so if I understand correctly, you can only add metadata to items in the enterprise version.
Just to confirm, does the screenshot in the DataOps pages here refer to the enterprise version?
This is what I get when running the exact same training session without ClearML:
Epoch 1/150
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1739806371.262488 897794 service.cc:145] XLA service 0x7fc058066d20 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1739806371.262578 897794 service.cc:153] StreamExecutor device (0): NVIDIA GeForce RTX 2080 Ti, Compute Capability 7.5
2025-02-17 15:3...
This is what I get when running on ClearML. Notice the NaN in the loss:
Epoch 1/150
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1739804333.538008 890492 service.cc:145] XLA service 0x7f19b80029d0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1739804333.538068 890492 service.cc:153] StreamExecutor device (0): NVIDIA GeForce RTX 2080 Ti, Compute Capability 7.5
2025-02-17 14:58:54.0217...
John, regarding documentation: what is mainly missing is how the dataframe / numpy / dict is structured. Yes, an example would be helpful.
A bar graph takes a set of numbers and assigns a bar to each number where the height of the bar represents the number. This is what you see in the example in the link I attached.
A histogram takes a set of numbers, divides them into bins, and plots the number of samples in each bin.
I disagree that the difference is minute 🙂 They are fundamentally different plots. The term used by ClearML is misleading.
I will use the manual plotting feature with matplotlib
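To make the distinction concrete, here is a minimal sketch in plain Python. The helper names `bar_heights` and `histogram_counts` and the sample values are made up for illustration; they are not part of ClearML or matplotlib.

```python
def bar_heights(values):
    """Bar graph: one bar per value; the bar height is the value itself."""
    return list(values)


def histogram_counts(values, n_bins, lo, hi):
    """Histogram: split [lo, hi] into n_bins equal-width bins and count samples per bin."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # The right edge (v == hi) is folded into the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    return counts


samples = [1, 2, 2, 3, 3, 3, 9]       # illustrative data only
bars = bar_heights(samples)           # 7 bars, heights equal to the values
hist = histogram_counts(samples, n_bins=3, lo=0, hi=9)  # 3 bars, heights are counts
```

With matplotlib, the first result corresponds to what `plt.bar` draws and the second to what `plt.hist` computes from the raw samples.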
Thanks John. But does the metadata relate to the entire dataset or individual elements in the dataset?
For example, let's say I have a dataset of images, and I would like to attach metadata to each image, e.g. a "type" field which could have values 1,2,3,4,5...
How would the dataframe be constructed? I assume one column would contain an ID identifying the image, and the other column would be "type". If this is the case, what would the ID be?
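To make the question concrete, the layout I have in mind looks roughly like the sketch below. It assumes the image's relative path inside the dataset serves as the ID; that is only a guess, not something the ClearML documentation confirms, and the file names are invented.

```python
# Hypothetical per-image metadata: one row per image.
# Assumption: the relative file path inside the dataset acts as the ID.
image_types = {
    "images/cat_001.png": 1,
    "images/cat_002.png": 3,
    "images/dog_001.png": 5,
}

# One record per image, with an "id" column and a "type" column.
rows = [{"id": path, "type": t} for path, t in sorted(image_types.items())]
columns = list(rows[0].keys())
```

A list of records like `rows` can be turned into a two-column dataframe with `pandas.DataFrame(rows)`.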
The code that generates this is the fit method in TF:
model.fit(train_dataset, validation_data=val_dataset, epochs=cfg.fit.epochs, callbacks=callbacks, verbose=2)
ClearML is activated in the usual way:
task = Task.init(project_name=project_name, task_name=name, output_uri=True, auto_connect_frameworks={'tensorflow': False}, **kwargs)
The project is many thousands of lines long. It fails in the TF model.fit call. The only thing different from other versions that work is the loss function, which I share below. The relevant class is BoundaryWithCategoricalDiceLoss, which is called with boundary_loss_type = "GRAD". When I use the loss with boundary_loss_type = "MSE", all works fine. This class is a subclass of CategoricalDiceLoss, which is a subclass of keras.losses.Loss.
`from typing import Dict, Itera...
This is what I get when running w/o ClearML:
Epoch 1/150
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1739806371.262488 897794 service.cc:145] XLA service 0x7fc058066d20 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1739806371.262578 897794 service.cc:153] StreamExecutor device (0): NVIDIA GeForce RTX 2080 Ti, Compute Capability 7.5
2025-02-17 15:32:51.772357: I tensor...
This is what I get using ClearML. Notice the NaN in the loss:
Epoch 1/150
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1739804333.538008 890492 service.cc:145] XLA service 0x7f19b80029d0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1739804333.538068 890492 service.cc:153] StreamExecutor device (0): NVIDIA GeForce RTX 2080 ...
The only difference between the two runs is that in one run project_name is an empty string (in which case all is OK), and in the other run project_name has a value.