
A bar graph takes a set of numbers and assigns a bar to each number, where the height of the bar represents the number. This is what you see in the example in the link I attached.
A histogram takes a set of numbers, divides them into bins, and plots the number of samples in each bin.
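To make the distinction concrete, here is a minimal sketch in plain Python. The function names `bar_heights` and `histogram_counts` and the sample values are made up for illustration; they are not part of ClearML or any plotting library.

```python
def bar_heights(values):
    """Bar graph: one bar per input number, height equal to that number."""
    return list(values)


def histogram_counts(values, num_bins, low, high):
    """Histogram: split [low, high] into equal-width bins, count samples per bin."""
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        if low <= v <= high:
            # A sample on the top edge is counted in the last bin.
            index = min(int((v - low) / width), num_bins - 1)
            counts[index] += 1
    return counts


samples = [1, 2, 2, 3, 3, 3, 9]
print(bar_heights(samples))                # seven bars, one per sample
print(histogram_counts(samples, 3, 0, 9))  # three bars, one per bin
```

The same seven numbers give seven bars in the first case and three bars in the second, which is why the two plots are not interchangeable.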
John, regarding documentation: what is mainly missing is how the dataframe / numpy / dict is structured. Yes, an example would be helpful.
This is what I get when running on Clearml. Notice the nan in the loss
Epoch 1/150
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1739804333.538008 890492 service.cc:145] XLA service 0x7f19b80029d0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1739804333.538068 890492 service.cc:153] StreamExecutor device (0): NVIDIA GeForce RTX 2080 Ti, Compute Capability 7.5
2025-02-17 14:58:54.0217...
The project is many thousands of lines long. It fails in the TF `model.fit` call. The only difference from other versions that work is the loss function, which I share below. The relevant class is `BoundaryWithCategoricalDiceLoss`, which is called with `boundary_loss_type = "GRAD"`. When I use the loss with `boundary_loss_type = "MSE"`, all works fine. This class is a subclass of `CategoricalDiceLoss`, which is a subclass of `keras.losses.Loss`.
`from typing import Dict, Itera...
This is what I get when running w/o clearml
Epoch 1/150
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1739806371.262488 897794 service.cc:145] XLA service 0x7fc058066d20 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1739806371.262578 897794 service.cc:153] StreamExecutor device (0): NVIDIA GeForce RTX 2080 Ti, Compute Capability 7.5
`2025-02-17 15:32:51.772357: I tensor...
This is what I get when running the exact same training session without clearml
Epoch 1/150
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1739806371.262488 897794 service.cc:145] XLA service 0x7fc058066d20 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1739806371.262578 897794 service.cc:153] StreamExecutor device (0): NVIDIA GeForce RTX 2080 Ti, Compute Capability 7.5
2025-02-17 15:3...
The code that generates this is the fit method in TF:
`model.fit(train_dataset, validation_data=val_dataset, epochs=cfg.fit.epochs, callbacks=callbacks, verbose=2)`
Clearml is activated in the usual way:
`task = Task.init(project_name=project_name, task_name=name, output_uri=True, auto_connect_frameworks={'tensorflow': False}, **kwargs)`
The only difference between the two runs is that in one run `project_name` is an empty string (in which case all is OK), and in the other run `project_name` has a value.
I disagree that the difference is minute 🙂 They are fundamentally different plots. The term used by clearml is misleading.
I will use the manual plotting feature with matplotlib
Thanks John. But does the metadata relate to the entire dataset or individual elements in the dataset?
For example, let's say I have a dataset of images, and I would like to attach metadata to each image, e.g. a "type" field which could have values 1,2,3,4,5...
How would the dataframe be constructed? I assume one column would contain an ID identifying the image, and the other column would be "type". If this is the case, what would the ID be?
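To picture the layout I am asking about, here is a minimal sketch. Everything in it is an assumption for illustration only: the column names `image_id` and `type`, the example file paths, and above all the use of the image's relative path as the ID. None of this is confirmed as what ClearML expects.

```python
# Hypothetical table: one row per image, keyed by the image's path
# relative to the dataset root. Paths and type values are invented.
image_types = {
    "images/cat_001.png": 1,
    "images/cat_002.png": 3,
    "images/dog_001.png": 5,
}

# Column-oriented form; a dict of equal-length lists like this can be
# passed directly to pandas.DataFrame(...).
metadata_columns = {
    "image_id": list(image_types.keys()),
    "type": list(image_types.values()),
}

# Row-oriented view of the same table, one record per image.
metadata_rows = [
    {"image_id": image_id, "type": image_type}
    for image_id, image_type in image_types.items()
]

print(metadata_columns)
print(metadata_rows[0])
```

Whether the ID should be a relative path, a file hash, or something else is exactly the open question.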
OK so if I understand correctly, you can only add metadata to items in the enterprise version.
Just to confirm, does the screenshot in the dataops pages here refer to the enterprise version?