Hi everyone!
I'm currently using the free hosted version (open source) of ClearML.
I'm mainly using clearml-data to manage our datasets at the moment, and I've already hit the limit for the free metrics storage.
Since we didn't store a lot of metrics (from experiments), I was wondering:

  • Is there a way to see what is taking up space in the metrics storage?
  • How can I remove unneeded files/metrics from there?
    I didn't see an obvious way to do this in the UI, so I'd be happy to find a command-line or API solution.
  
  
Posted 2 months ago

Answers 13


@CostlyOstrich36 Thanks for your reply. Unfortunately, this is exactly the problem: I simply cannot explain why the storage filled up all of a sudden, so I have no idea what to delete. At least 2/3 of the available space was filled during a period when we didn't run any experiments whatsoever; we only added a bunch of datasets. AFAIK there are no logs/metrics/plots involved, and the data itself is always stored locally.

I tried to contact support, but they were not willing to help me since I don't pay. They did mention, however, that the dataset previews share that storage as well. So now I'm wondering: how can I remove those previews? Obviously I don't want to remove the whole Task associated with the dataset, since I still want to be able to use the datasets.

  
  
Posted 2 months ago

@SuccessfulKoala55 I get a warning in the ClearML Dashboard (app.clear.ml) that my metrics storage is almost full. I have no idea what happens when I reach the limit, but since we depend on being able to use the datasets stored in ClearML, I don't really want to find out... (I'm using the SaaS version.)

  
  
Posted 2 months ago

Hi @JealousMole49, I'm afraid there is no such capability at the moment. Basically, "metrics" means any metadata that was saved (scalars, logs, plots, etc.). You can delete some log/metric-heavy experiments/tasks/datasets to free up some space. Makes sense?

  
  
Posted 2 months ago

@JealousMole49 How did you hit the limit? Do you see any error message? Are you using the self-hosted version or the SaaS version at app.clear.ml?

  
  
Posted 2 months ago

@JealousMole49, if you added datasets, then the data is not stored locally; it was probably uploaded to the fileserver?

  
  
Posted 2 months ago

@CostlyOstrich36 @SuccessfulKoala55 I'm still urgently looking for a solution to this, as we depend on ClearML to manage our datasets. Any suggestions on how I could find out where my storage space went and free it up?

  
  
Posted 2 months ago

Hi @SuccessfulKoala55, the data is definitely stored locally: we have more data than even the 100GB artifacts storage would allow. Plus, I doubt that the actual dataset would be counted against the metrics storage quota, right?

Your second remark is pretty much what email support told me before sending me here. The problem is that I still don't know how I can:
a) prevent my datasets from taking up so much metrics space (can I disable previews?)
b) find and remove any unneeded data so I can continue using ClearML

At the same time I'm also currently looking into self-hosting the open source version to get rid of these limits, but there seems to be no path whatsoever for migrating the existing data on SaaS to a self-hosted instance, which is very frustrating.

Any idea what I could do? I'm starting to wonder how anyone could use the dataset functionality productively if it just fills up the metric quota like this...

  
  
Posted 2 months ago

Hi @JealousMole49! To disable previews, you need to set all of the values below to 0 in clearml.conf:

dataset.preview.media.max_file_size
dataset.preview.tabular.table_count
dataset.preview.tabular.row_count
dataset.preview.media.image_count
dataset.preview.media.video_count
dataset.preview.media.audio_count
dataset.preview.media.html_count
dataset.preview.media.json_count
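The keys above are dotted paths, so in clearml.conf they become a nested block. The helper below is a hypothetical sketch (not part of ClearML) that renders that block with every value set to 0. It assumes the keys belong under the `sdk` section of clearml.conf, which the thread does not state explicitly.

```python
# Hypothetical helper (not a ClearML API): render the preview keys listed in
# the answer as a nested clearml.conf block with every value set to 0.
# Assumption: the keys live under the "sdk" section of clearml.conf.
PREVIEW_KEYS = [
    "dataset.preview.media.max_file_size",
    "dataset.preview.tabular.table_count",
    "dataset.preview.tabular.row_count",
    "dataset.preview.media.image_count",
    "dataset.preview.media.video_count",
    "dataset.preview.media.audio_count",
    "dataset.preview.media.html_count",
    "dataset.preview.media.json_count",
]


def build_tree(keys, value=0):
    """Turn dotted keys into nested dicts, e.g. 'a.b.c' -> {'a': {'b': {'c': 0}}}."""
    tree = {}
    for key in keys:
        node = tree
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


def render(tree, indent=0):
    """Render nested dicts as HOCON-style lines with four-space indentation."""
    pad = "    " * indent
    lines = []
    for name, val in tree.items():
        if isinstance(val, dict):
            lines.append(f"{pad}{name} {{")
            lines.extend(render(val, indent + 1))
            lines.append(f"{pad}}}")
        else:
            lines.append(f"{pad}{name}: {val}")
    return lines


def preview_config_block(keys=PREVIEW_KEYS, section="sdk"):
    """Return the full block, wrapped in the given top-level section."""
    return "\n".join(render({section: build_tree(keys)}))


if __name__ == "__main__":
    print(preview_config_block())
```

If clearml.conf already contains an `sdk { ... }` section, the `dataset { ... }` part can go inside it rather than adding a second `sdk` block.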

Also, I believe you could go through each dataset and remove the "Dataset Content" configuration object as a start. Unfortunately, this one can't be disabled at the moment.

  
  
Posted 2 months ago

Hi @JealousMole49, it seems most of the reported metrics usage (which includes metadata) is taken up by dataset metadata (including previews and the metadata of reported files).
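Since there is no built-in breakdown, one rough way to see which datasets carry the most metadata is to measure each dataset's "Dataset Content" configuration object. This is a sketch under assumptions: it relies on `Task.get_configuration_object(name)` returning that object as text, it ignores previews and other reported metadata (so the sizes are only a proxy for actual metrics usage), and `content_size` / `rank_datasets` are hypothetical helper names.

```python
# Rough sketch: rank dataset tasks by the size of their "Dataset Content"
# configuration object as a proxy for how much metadata each one stores.
# Assumptions: clearml is installed and configured for the real run; previews
# and other reported metadata are NOT counted here.


def content_size(task, name="Dataset Content"):
    """Size in bytes of one configuration object (0 if it is missing or empty)."""
    text = task.get_configuration_object(name) or ""
    return len(text.encode("utf-8"))


def rank_datasets(dataset_ids, get_task=None):
    """Return (dataset_id, size_in_bytes) pairs, largest first."""
    if get_task is None:
        # Lazy import so the ranking logic can be tried without clearml installed.
        from clearml import Task
        get_task = Task.get_task
    sizes = [(ds_id, content_size(get_task(task_id=ds_id))) for ds_id in dataset_ids]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)
```

The IDs could come from `Dataset.list_datasets()`, e.g. `[d["id"] for d in Dataset.list_datasets()]`, since a dataset's ID is also its task ID.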

  
  
Posted 2 months ago

@JealousMole49 You can export tasks and import them (see None); however, this will not include the metrics.
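A minimal sketch of that export/import route, assuming `Task.export_task()` returns the task definition as a dict and `Task.import_task(task_data)` recreates a task from such a dict on whichever server the current clearml.conf points to. You would run the export against the SaaS credentials, switch clearml.conf to the self-hosted server, then run the import. As the answer says, metrics are not carried over. The helper names are hypothetical.

```python
# Sketch: move task definitions between ClearML servers through a JSON file.
# Assumptions: Task.export_task() returns a dict and Task.import_task(task_data)
# recreates a task from it; clearml.conf is switched between the two steps.
# Reported metrics are NOT included, per the answer.
import json


def export_tasks(task_ids, path, get_task=None):
    """Export each task's definition and write the list to a JSON file."""
    if get_task is None:
        from clearml import Task  # lazy import keeps the file handling testable offline
        get_task = Task.get_task
    exported = [get_task(task_id=task_id).export_task() for task_id in task_ids]
    with open(path, "w", encoding="utf-8") as fh:
        # default=str converts values JSON can't encode (e.g. datetimes) to strings
        json.dump(exported, fh, default=str, indent=2)
    return len(exported)


def import_tasks(path, import_task=None):
    """Recreate tasks on the currently configured server from an export file."""
    if import_task is None:
        from clearml import Task
        import_task = Task.import_task
    with open(path, encoding="utf-8") as fh:
        exported = json.load(fh)
    return [import_task(task_data) for task_data in exported]
```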

  
  
Posted 2 months ago

The config values are not yet documented, but they all default to 10 (except for max_file_size) and represent the number of images/tables/videos etc. that are reported as previews for the dataset. Setting them to 0 disables previewing.

To clear the configurations, you should use something like Dataset.list_datasets to get all the dataset IDs, then something like:

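from clearml import Task


# ID of one dataset task (repeat for each ID returned by Dataset.list_datasets)
id_ = "229f14fe0cb942708c9c5feb412a7ffe"
task = Task.get_task(id_)
original_status = task.status
print(original_status)
# Finished tasks can't be edited, so temporarily mark the task as started
if original_status in ["completed", "failed", "aborted"]:
    task.mark_started(force=True)
# Replace the "Dataset Content" configuration object with an empty one
task._set_configuration(name="Dataset Content", config_text="")
# Restore the task's original status
if original_status == "completed":
    task.mark_completed(force=True)
elif original_status == "failed":
    task.mark_failed(force=True)
elif original_status == "aborted":
    task.mark_stopped(force=True)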

to clear the configuration
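To apply the single-task snippet to every dataset at once, the status handling can be wrapped in a function and driven by `Dataset.list_datasets`. This is a sketch, not an official tool. It assumes `Dataset.list_datasets(only_completed=False)` is available (recent clearml versions) and returns dicts with an `"id"` key, and it reuses the private `Task._set_configuration` method from the answer, which may change between clearml versions. `clear_dataset_content` and `clear_all_datasets` are hypothetical names.

```python
# Sketch: clear the "Dataset Content" configuration on every dataset task,
# generalizing the single-task snippet. Task._set_configuration is private
# (taken from the answer), so it may change between clearml versions.

# Maps each finished status to the Task method that restores it.
RESTORE_METHODS = {
    "completed": "mark_completed",
    "failed": "mark_failed",
    "aborted": "mark_stopped",
}


def clear_dataset_content(task):
    """Empty one task's "Dataset Content" configuration and restore its status."""
    original_status = task.status
    if original_status in RESTORE_METHODS:
        # Reopen the finished task so its configuration can be edited.
        task.mark_started(force=True)
    task._set_configuration(name="Dataset Content", config_text="")
    if original_status in RESTORE_METHODS:
        getattr(task, RESTORE_METHODS[original_status])(force=True)
    return original_status


def clear_all_datasets(dry_run=True):
    """Clear every dataset's configuration; with dry_run=True only list them."""
    from clearml import Dataset, Task  # lazy import: the helper above needs no clearml

    for info in Dataset.list_datasets(only_completed=False):
        if dry_run:
            print("would clear", info["id"], info.get("name"))
            continue
        status = clear_dataset_content(Task.get_task(task_id=info["id"]))
        print(f"cleared {info['id']} (status restored to {status})")
```

Calling `clear_all_datasets()` first only prints the affected datasets; `clear_all_datasets(dry_run=False)` performs the change.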

  
  
Posted 2 months ago

Thank you so much @SmugDolphin23, that should help a lot!
Just to be sure I understand correctly: removing this Dataset Content configuration only removes the previews and associated data, but leaves the dataset itself fully accessible, correct?
And since you seem quite knowledgeable on the subject: do you know if there is a way to transfer these tasks from one ClearML server instance to another (specifically from SaaS to a self-hosted instance)?

  
  
Posted 2 months ago

Hi Eugen, thanks for the pointers! Is there any documentation about those config values I could read so I better understand what I'm doing?

As for your suggestion for reducing the current usage, is that something I would do in the web dashboard (app.clear.ml) or through the API? I'm not sure what exactly this "Dataset Content" configuration object is, or where I'd have to remove it.

  
  
Posted 2 months ago