Hi everyone! I'm currently using the free hosted version (open source) of ClearML. I'm mainly using clearml-data to manage our datasets at the moment, and I've already hit the limit for the free metrics storage. Since we didn't store a lot of metrics (

Answered

Hi everyone!
I'm currently using the free hosted version (open source) of ClearML.
I'm mainly using clearml-data to manage our datasets at the moment, and I've already hit the limit for the free metrics storage.
Since we didn't store a lot of metrics (from experiments), I was wondering:

Is there a way to see what is taking up space in the metrics storage?
How can I remove unneeded files/metrics from there?
I didn't see any obvious way to do this in the UI, so I'd be happy with a command line / API solution.

  				
Posted one year ago by JealousMole49


Answers 13

Hi JealousMole49, I'm afraid there is no such capability at the moment. Basically, metrics means any metadata that was saved (scalars, logs, plots, etc.). You can delete some log- or metric-heavy experiments/tasks/datasets to free up some space. Makes sense?

  				
Posted one year ago by CostlyOstrich36

CostlyOstrich36 SuccessfulKoala55 I'm still urgently looking for a solution for this, as we depend on ClearML to manage our datasets. Any suggestions on how I could find out where my storage space went and free it up?

  				
Posted one year ago by JealousMole49

Hi Eugen, thanks for the pointers! Is there any documentation about those config values I could read so I better understand what I'm doing?

As for your suggestion for reducing the current usage, is that something I would do in the web dashboard (app.clear.ml) or through the API? I'm not sure what this Dataset Content configuration object is exactly and where I'd have to remove it.

  				
Posted one year ago by JealousMole49

Hi SuccessfulKoala55, the data is stored locally for sure: we have more data than even the 100GB artifacts storage would allow. Plus, I doubt that the actual dataset would be counted against the metrics storage quota, right?

Your second remark is pretty much what the email support told me before sending me here. The problem is that I still don't know how I can
a) prevent my datasets taking up so much metrics space (can I disable previews?)
b) find and remove any unneeded data so I can continue using clearML

At the same time I'm also currently looking into self-hosting the open source version to get rid of these limits, but there seems to be no path whatsoever for migrating the existing data on SaaS to a self-hosted instance, which is very frustrating.

Any idea what I could do? I'm starting to wonder how anyone could use the dataset functionality productively if it just fills up the metric quota like this...

  				
Posted one year ago by JealousMole49

Hi JealousMole49, it seems most of the metrics storage (which includes metadata) is taken up by the dataset metadata (including previews and reported file metadata).

  				
Posted one year ago by SuccessfulKoala55

SuccessfulKoala55 I get a warning in the ClearML Dashboard (app.clear.ml) that my metrics storage is almost full. I have no idea what happens when I reach the limit, but since we are dependent on being able to use the datasets stored in ClearML, I don't really want to find out... (Using the SaaS version)

  				
Posted one year ago by JealousMole49

CostlyOstrich36 Thanks for your reply. Unfortunately, this is exactly the problem: I simply cannot explain why the storage filled up all of a sudden, so I have no idea what to delete. At least 2/3 of the available space was filled during a time when we didn't run any experiments whatsoever; we only added a bunch of datasets. AFAIK there are no logs/metrics/plots involved, and the data itself is always stored locally.

I tried to contact support, but they were not willing to help me since I don't pay. They did mention, however, that the dataset previews share that storage as well. So now I'm wondering how I can remove those previews. Obviously I don't want to remove the whole Task associated with the dataset, since I still want to be able to use the datasets.

  				
Posted one year ago by JealousMole49

Hi JealousMole49! To disable previews, you need to set all of the values below to 0 in clearml.conf:

dataset.preview.media.max_file_size
dataset.preview.tabular.table_count
dataset.preview.tabular.row_count
dataset.preview.media.image_count
dataset.preview.media.video_count
dataset.preview.media.audio_count
dataset.preview.media.html_count
dataset.preview.media.json_count
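As a rough illustration, the dotted keys above can be expanded into a nested block of the kind clearml.conf uses. The helper names `nest` and `render` are made up for this sketch, and the thread does not say which top-level section of clearml.conf the `dataset` block belongs in, so check that against your own config file before pasting the output.

```python
# Sketch: expand the dotted preview keys from the post into a nested config block.
# nest() and render() are hypothetical helpers, not part of ClearML.
PREVIEW_KEYS = [
    "dataset.preview.media.max_file_size",
    "dataset.preview.tabular.table_count",
    "dataset.preview.tabular.row_count",
    "dataset.preview.media.image_count",
    "dataset.preview.media.video_count",
    "dataset.preview.media.audio_count",
    "dataset.preview.media.html_count",
    "dataset.preview.media.json_count",
]


def nest(keys, value=0):
    """Turn dotted keys into a nested dict, setting every leaf to `value`."""
    tree = {}
    for key in keys:
        *parents, leaf = key.split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


def render(tree, indent=0):
    """Render the nested dict as brace-delimited config lines."""
    pad = "    " * indent
    lines = []
    for name, child in tree.items():
        if isinstance(child, dict):
            lines.append(f"{pad}{name} {{")
            lines.extend(render(child, indent + 1))
            lines.append(f"{pad}}}")
        else:
            lines.append(f"{pad}{name}: {child}")
    return lines


conf_text = "\n".join(render(nest(PREVIEW_KEYS)))
print(conf_text)
```

The printed block groups the keys under `dataset { preview { media { ... } tabular { ... } } }` with every value set to 0.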

Also, I believe you could go through each dataset and remove the Dataset Content configuration object as a start. Unfortunately, unlike the previews, this one can't be disabled at the moment.
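To decide which datasets to start with, one option is to compare the size of each dataset's Dataset Content text. This is only a sketch: the thread does not explain how the metrics quota is computed, so the byte count here is a rough proxy at best. `rank_by_size` and `fetch_dataset_content` are hypothetical helper names, and the IDs in the example are made up.

```python
# Sketch: rank datasets by the size of their "Dataset Content" configuration text.
# The size is only a rough proxy; it is not how the server measures quota usage.


def rank_by_size(config_texts):
    """Return (dataset_id, size_in_bytes) pairs, largest first."""
    sizes = {
        dataset_id: len((text or "").encode("utf-8"))
        for dataset_id, text in config_texts.items()
    }
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


def fetch_dataset_content(dataset_ids):
    """Read the "Dataset Content" configuration object of each dataset task."""
    from clearml import Task  # imported lazily; needs configured credentials

    return {
        dataset_id: Task.get_task(task_id=dataset_id).get_configuration_object(
            "Dataset Content"
        )
        for dataset_id in dataset_ids
    }


# Offline example with made-up IDs and contents
example = {"aaa": "x" * 2000, "bbb": None, "ccc": "short"}
for dataset_id, size in rank_by_size(example):
    print(dataset_id, size)
```

With a live server, `rank_by_size(fetch_dataset_content(ids))` would give the same kind of listing for real dataset IDs.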

  				
Posted one year ago by SmugDolphin23

JealousMole49, if you added datasets, then the data is not stored locally and was probably uploaded to the fileserver?

  				
Posted one year ago by SuccessfulKoala55

The config values are not yet documented, but they all default to 10 (except for max_file_size) and represent the number of images/tables/videos etc. that are reported as previews for the dataset. Setting them to 0 disables previewing.

To clear the configurations, you should use something like Dataset.list_datasets to get all the dataset IDs, then something like:

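from clearml import Task


# ID of the task behind the dataset (example value)
id_ = "229f14fe0cb942708c9c5feb412a7ffe"
task = Task.get_task(task_id=id_)
original_status = task.status
print(original_status)
# An aborted task may be reported as "stopped", so both spellings are checked
if original_status in ["completed", "failed", "aborted", "stopped"]:
    # Reopen the finished task so its configuration can be edited
    task.mark_started(force=True)
# Blank the "Dataset Content" configuration object (private method)
task._set_configuration(name="Dataset Content", config_text="")
# Put the task back into its original status
if original_status == "completed":
    task.mark_completed(force=True)
elif original_status == "failed":
    task.mark_failed(force=True)
elif original_status in ["aborted", "stopped"]:
    task.mark_stopped(force=True)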

Run this for each dataset ID to clear its configuration.
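Putting the two steps together might look like the sketch below. It assumes that each entry returned by Dataset.list_datasets is a dict carrying the dataset ID under "id", and that an aborted task may report its status as "stopped"; the function names are made up for illustration. Try it on a single throwaway dataset first.

```python
# Sketch only: applies the single-task steps from the post to every dataset.
# "aborted" is kept from the post; "stopped" is added as an assumption.
RESTORABLE = ("completed", "failed", "stopped", "aborted")


def clear_dataset_content(task):
    """Blank the "Dataset Content" configuration object, then restore the status."""
    original_status = str(task.status)
    if original_status in RESTORABLE:
        # The post reopens finished tasks before editing them.
        task.mark_started(force=True)
    # _set_configuration is a private method, used here as in the post.
    task._set_configuration(name="Dataset Content", config_text="")
    if original_status == "completed":
        task.mark_completed(force=True)
    elif original_status == "failed":
        task.mark_failed(force=True)
    elif original_status in ("stopped", "aborted"):
        task.mark_stopped(force=True)
    return original_status


def clear_all_datasets(dataset_project=None):
    """Run clear_dataset_content on every dataset from Dataset.list_datasets."""
    from clearml import Dataset, Task  # imported lazily; needs configured credentials

    for entry in Dataset.list_datasets(dataset_project=dataset_project):
        # Assumption: each entry is a dict with the dataset ID under "id".
        status = clear_dataset_content(Task.get_task(task_id=entry["id"]))
        print(entry["id"], "restored to", status)


# Usage (hypothetical project name):
# clear_all_datasets(dataset_project="my_project")
```

Keeping the per-task logic in its own function makes it easy to test against one dataset before looping over all of them.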

  				
Posted one year ago by SmugDolphin23

JealousMole49, you can export tasks and import them (see None); however, this will not include the metrics.
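One way to do this from the SDK is sketched below, assuming Task.export_task and Task.import_task are the mechanism meant here (the thread does not confirm it). The file-based hand-off and all helper names are assumptions for illustration. Whether an imported dataset task stays usable as a dataset on the new server is not covered in the thread.

```python
import json


def save_export(task_data, path):
    """Write an exported task dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        # default=str is a precaution in case a value is not JSON-serializable.
        json.dump(task_data, handle, default=str)


def load_export(path):
    """Read an exported task dictionary back from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def export_task_to_file(task_id, path):
    """Run this while clearml.conf points at the source server."""
    from clearml import Task  # imported lazily; needs configured credentials

    save_export(Task.get_task(task_id=task_id).export_task(), path)


def import_task_from_file(path):
    """Run this while clearml.conf points at the target server."""
    from clearml import Task  # imported lazily; needs configured credentials

    return Task.import_task(load_export(path))
```

The two halves are meant to run separately, once against each server, with the JSON file carried across in between.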

  				
Posted one year ago by SuccessfulKoala55

Thank you so much SmugDolphin23, that should help a lot!
Just to be sure I understand correctly: removing this Content configuration only removes the previews and associated data, but leaves the dataset itself fully accessible, correct?
And since you seem quite knowledgeable on the subject: do you know if there is a way to transfer these tasks from one ClearML server instance to another (specifically from SaaS to a self-hosted instance)?

  				
Posted one year ago by JealousMole49

JealousMole49, how did you hit the limit? Do you see any error message? Are you using the self-hosted version or the SaaS version at app.clear.ml?

  				
Posted one year ago by SuccessfulKoala55
