Hi @<1618418423996354560:profile|JealousMole49>! To disable previews, you need to set all of the values below to 0 in clearml.conf:
dataset.preview.media.max_file_size
dataset.preview.tabular.table_count
dataset.preview.tabular.row_count
dataset.preview.media.image_count
dataset.preview.media.video_count
dataset.preview.media.audio_count
dataset.preview.media.html_count
dataset.preview.media.json_count
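For example, a small script like this prints overrides you could append to the end of clearml.conf. It's a minimal sketch under one assumption: that these keys sit under the sdk section like the other SDK options. It writes them as HOCON dotted paths, which HOCON merges into the existing sdk { } block. preview_overrides is a hypothetical helper, not part of ClearML.

```python
# Hypothetical helper: builds clearml.conf lines that set every dataset
# preview limit listed above to 0.
# ASSUMPTION: the keys live under the "sdk" section of clearml.conf.
PREVIEW_KEYS = [
    "dataset.preview.media.max_file_size",
    "dataset.preview.tabular.table_count",
    "dataset.preview.tabular.row_count",
    "dataset.preview.media.image_count",
    "dataset.preview.media.video_count",
    "dataset.preview.media.audio_count",
    "dataset.preview.media.html_count",
    "dataset.preview.media.json_count",
]


def preview_overrides(section: str = "sdk") -> str:
    """Return HOCON lines (dotted-path syntax) that zero every preview limit."""
    return "\n".join(f"{section}.{key}: 0" for key in PREVIEW_KEYS)


print(preview_overrides())
```

It prints eight lines such as sdk.dataset.preview.media.image_count: 0. Because they come last in the file, they override any earlier values for the same keys.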
Also, as a start, I believe you could go through each dataset and remove the "Dataset Content" configuration object. Unfortunately, that one can't be disabled at the moment.
Hi @<1618418423996354560:profile|JealousMole49> , it seems most of the reported metrics (which includes metadata) is taken up by the dataset metadata (including previews and reported files metadata)
@<1618418423996354560:profile|JealousMole49>, if you added datasets, then the data is not stored locally but was probably uploaded to the fileserver?
Hi @<1523701087100473344:profile|SuccessfulKoala55>, the data is definitely stored locally: we have more data than even the 100GB artifacts storage would allow. Plus, I doubt the actual dataset would count against the metrics storage quota, right?
Your second remark is pretty much what email support told me before sending me here. The problem is that I still don't know how to:
a) prevent my datasets taking up so much metrics space (can I disable previews?)
b) find and remove any unneeded data so I can continue using clearML
At the same time I'm also currently looking into self-hosting the open source version to get rid of these limits, but there seems to be no path whatsoever for migrating the existing data on SaaS to a self-hosted instance, which is very frustrating.
Any idea what I could do? I'm starting to wonder how anyone could use the dataset functionality productively if it just fills up the metric quota like this...
The config values are not yet documented, but they all default to 10 (except for max_file_size) and represent the number of images/tables/videos etc. that are reported as previews to the dataset. Setting them to 0 disables previewing.
To clear the configurations, you should use something like Dataset.list_datasets to get all the dataset IDs, then something like:
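```python
from clearml import Task

id_ = "229f14fe0cb942708c9c5feb412a7ffe"  # ID of one dataset (i.e. its backing task)
task = Task.get_task(task_id=id_)
original_status = task.status
print(original_status)

# Finished tasks are locked for editing, so temporarily mark them as started.
# Note: the backend status for tasks shown as "Aborted" in the UI is "stopped".
if original_status in ["completed", "failed", "stopped"]:
    task.mark_started(force=True)

# Overwrite the "Dataset Content" configuration object with empty text
task._set_configuration(name="Dataset Content", config_text="")

# Restore the original status
if original_status == "completed":
    task.mark_completed(force=True)
elif original_status == "failed":
    task.mark_failed(force=True)
elif original_status == "stopped":
    task.mark_stopped(force=True)
```
to clear the configuration.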
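To run this over every dataset, a rough sketch combining Dataset.list_datasets with the per-dataset steps could look like the following. Assumptions: the dicts returned by list_datasets carry the task ID under "id", only_completed=False is needed to also pick up failed/aborted datasets, and the private _set_configuration call keeps working in your SDK version. Published or still-running datasets are skipped rather than forced. The helper names are hypothetical.

```python
# Sketch: clear the "Dataset Content" configuration object on every dataset
# visible to the current clearml.conf credentials.
# ASSUMPTIONS: list_datasets() dicts contain "id"; Task._set_configuration
# (a private method) behaves as in the per-dataset snippet.

# Backend status -> name of the Task method that restores it.
# ("stopped" is what the UI displays as "Aborted".)
RESTORE_METHOD = {
    "completed": "mark_completed",
    "failed": "mark_failed",
    "stopped": "mark_stopped",
}


def clear_dataset_content(task_id: str) -> None:
    from clearml import Task  # imported lazily so the helpers load without clearml

    task = Task.get_task(task_id=task_id)
    status = task.status
    if status not in RESTORE_METHOD:
        # e.g. published or in_progress: leave these alone
        print(f"skipping {task_id}: status {status!r}")
        return
    task.mark_started(force=True)
    task._set_configuration(name="Dataset Content", config_text="")
    getattr(task, RESTORE_METHOD[status])(force=True)


def clear_all_dataset_contents(dry_run: bool = True) -> list:
    from clearml import Dataset

    ids = [d["id"] for d in Dataset.list_datasets(only_completed=False)]
    for dataset_id in ids:
        if dry_run:
            print(f"would clear {dataset_id}")
        else:
            clear_dataset_content(dataset_id)
    return ids
```

Calling clear_all_dataset_contents() first only lists the IDs it would touch; clear_all_dataset_contents(dry_run=False) does the actual clearing, so make sure clearml.conf points at the right workspace before running it.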
Hi Eugen, thanks for the pointers! Is there any documentation about those config values I could read so I better understand what I'm doing?
As for your suggestion for reducing the current usage, is that something I would do in the web dashboard (app.clear.ml) or through the API? I'm not sure what this Dataset Content configuration object is exactly and where I'd have to remove it.
@<1523701070390366208:profile|CostlyOstrich36> Thanks for your reply. Unfortunately, this is exactly the problem: I simply cannot explain why the storage filled up all of a sudden, so I have no idea what to delete. At least 2/3 of the available space was filled during a period when we didn't run any experiments whatsoever; we only added a bunch of datasets. AFAIK there are no logs/metrics/plots involved, and the data itself is always stored locally.
I tried to contact support, but they were not willing to help me since I don't pay. They did mention, however, that the dataset previews share that storage as well. So now I'm wondering: how can I remove those previews? Obviously I don't want to remove the whole Task associated with the dataset, since I still want to be able to use the datasets.
@<1618418423996354560:profile|JealousMole49> you can export tasks and import them (see None ), however this will not include the metrics
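A minimal sketch of that export/import round trip, using Task.export_task and Task.import_task with a JSON file in between. Assumptions: you switch between the SaaS and self-hosted servers by running each step with a different clearml.conf (for example by setting the CLEARML_CONFIG_FILE environment variable before starting Python), and default=str is used so values like timestamps survive JSON encoding. As noted, metrics are not carried over. The helper names are hypothetical.

```python
# Sketch: move a task definition between ClearML servers via a JSON file.
# Metrics (scalars, plots, debug samples, logs) are NOT included.
# ASSUMPTION: each step runs with the clearml.conf of the relevant server.
import json
from pathlib import Path


def save_export(task_data: dict, path: str) -> None:
    # default=str turns non-JSON values (e.g. datetimes) into strings
    Path(path).write_text(json.dumps(task_data, indent=2, default=str))


def load_export(path: str) -> dict:
    return json.loads(Path(path).read_text())


def export_to_file(task_id: str, path: str) -> None:
    # Run with the *source* server's config (e.g. app.clear.ml)
    from clearml import Task
    save_export(Task.export_task(task_id), path)


def import_from_file(path: str) -> str:
    # Run with the *target* (self-hosted) server's config
    from clearml import Task
    new_task = Task.import_task(load_export(path))
    return new_task.id
```

Keeping the file step in between means the two halves never need both servers' credentials in the same process.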
@<1523701070390366208:profile|CostlyOstrich36> @<1523701087100473344:profile|SuccessfulKoala55> I'm still urgently looking for a solution for this, as we depend on ClearML to manage our datasets. Any suggestions on how I could find out where my storage space went and free it up?
Thank you so much @<1523701435869433856:profile|SmugDolphin23> , that should help a lot!
Just to be sure I understand correctly: Removing this Content configuration only removes the previews and associated data, but leaves the dataset itself fully accessible, correct?
And since you seem quite knowledgeable on the subject: do you know if there is a way to transfer these tasks from one ClearML server instance to another (specifically from SaaS to a self-hosted instance)?
Hi @<1618418423996354560:profile|JealousMole49> , I'm afraid there is no such capability at the moment. Basically metrics mean any metadata that was saved (scalars, logs, plots etc). You can delete some log/metric heavy experiments/tasks/datasets to free up some space. Makes sense?
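As a rough way to find which datasets carry the most metadata, you could compare the size of their Dataset Content configuration objects. This is only a proxy and a sketch: it assumes Task.get_configuration_object returns that object's text (or None), and it ignores previews reported as plots or debug samples, as well as logs and scalars, which also count toward metrics. The helper names are hypothetical.

```python
# Sketch: rank datasets by the size of their "Dataset Content" configuration.
# ASSUMPTION: this text size is only a proxy for metrics usage.


def rank_by_size(sizes: dict, top: int = 10) -> list:
    """Return (task_id, size) pairs sorted from largest to smallest."""
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:top]


def dataset_content_sizes() -> dict:
    from clearml import Dataset, Task  # imported lazily

    sizes = {}
    for info in Dataset.list_datasets(only_completed=False):
        task = Task.get_task(task_id=info["id"])
        text = task.get_configuration_object("Dataset Content") or ""
        sizes[info["id"]] = len(text)
    return sizes
```

For example, print(rank_by_size(dataset_content_sizes())) lists the ten largest candidates to clean up first.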
@<1618418423996354560:profile|JealousMole49> how did you hit the limit? Do you see any error message? Are you using the self hosted version or the SaaS version in app.clear.ml ?
@<1523701087100473344:profile|SuccessfulKoala55> I get a warning in the ClearML Dashboard (app.clear.ml) that my metrics storage is almost full. I have no idea what happens when I reach the limit, but since we are dependent on being able to use the datasets stored in ClearML, I don't really want to find out... (Using the SaaS version)