You mean re-creating the datasets from scratch? Since I'm using local storage, that would also mean unnecessarily copying data back and forth only to end up with the same thing on the local disks... Is there really no other way?
I'm very surprised ClearML offers no export / import functionality of any kind...
Hi Allen,
I've run into this exact problem myself, and simply added a function to dataset.py in the clearml package (clearml/datasets/dataset.py) that takes a list of files instead of a single file.
It looks like this (I use clearml 1.13.1):
@<1523701070390366208:profile|CostlyOstrich36> @<1523701087100473344:profile|SuccessfulKoala55> I'm still urgently looking for a solution for this as we depend on ClearML to manage our datasets. Any suggestions on how I could find out where my storage space went and free it up?
Hi @<1523701087100473344:profile|SuccessfulKoala55> , the data is stored locally for sure - we have more data than even the 100GB artifacts storage would allow. Plus I doubt that the actual dataset would be counted against the metrics storage quota, right?
Your second remark is pretty much what the email support told me before sending me here. The problem is that I still don't know how I can
a) prevent my datasets taking up so much metrics space (can I disable previews?)
b) find and remove a...
Hi Eugen, thanks for the pointers! Is there any documentation about those config values I could read so I better understand what I'm doing?
As for your suggestion for reducing the current usage, is that something I would do in the web dashboard (app.clear.ml) or through the API? I'm not sure what this Dataset Content configuration object is exactly and where I'd have to remove it.
Thank you so much @<1523701435869433856:profile|SmugDolphin23> , that should help a lot!
Just to be sure I understand correctly: Removing this Content configuration only removes the previews and associated data, but leaves the dataset itself fully accessible, correct?
And since you seem quite knowledgeable on the subject: Do you know if there is a way to transfer these tasks from one ClearML server instance to another (specifically from SaaS to a self-hosted instance)?
Yes, ideally I'd like to ensure that they are always in sync. They will be updated from time to time, adding new versions and having two separate datasets sounds like I'd always have to do this twice...
The way I wrote it is a bit of a quick fix with a lot of code duplication, I'm sure it could be implemented in a cleaner way (e.g. having only one remove_files method that can either take a single path or a list of paths).
It's one of those things that I intended to do at some point, but never had the time to clean it up (I did a similar modification for adding lists of files, since this has exactly the same issue if you don't want to add something you can define with a wildcard but only ...
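For illustration, here is a minimal sketch of the "single path or list of paths" idea mentioned above. It is not the patch described in this post: it is a standalone helper that lives outside the clearml package, and the names normalize_paths and remove_paths are hypothetical. It assumes only that the dataset object exposes remove_files(dataset_path=..., verbose=...), and it simply loops over the paths, so it may not match whatever batching the original modification did.

```python
from pathlib import Path
from typing import Iterable, List, Union

PathLike = Union[str, Path]


def normalize_paths(paths: Union[PathLike, Iterable[PathLike]]) -> List[str]:
    """Accept one path or an iterable of paths; return unique strings in order."""
    if isinstance(paths, (str, Path)):
        paths = [paths]
    seen = set()
    result = []
    for p in paths:
        # Use forward slashes regardless of the local OS.
        s = p.as_posix() if isinstance(p, Path) else str(p)
        if s not in seen:
            seen.add(s)
            result.append(s)
    return result


def remove_paths(dataset, paths: Union[PathLike, Iterable[PathLike]],
                 verbose: bool = False) -> int:
    """Hypothetical helper: call dataset.remove_files() once per path.

    Assumes remove_files returns the number of removed files (or None).
    """
    removed = 0
    for p in normalize_paths(paths):
        count = dataset.remove_files(dataset_path=p, verbose=verbose)
        removed += count or 0
    return removed
```

With a real, not yet finalized dataset this would be called as remove_paths(ds, ["images/a.png", "images/b.png"]) or remove_paths(ds, "images/a.png"); the file names here are placeholders.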
@<1523701087100473344:profile|SuccessfulKoala55> I get a warning in the ClearML Dashboard (app.clear.ml) that my metrics storage is almost full. I have no idea what happens when I reach the limit, but since we are dependent on being able to use the datasets stored in ClearML, I don't really want to find out... (Using the SaaS version)
@<1523701070390366208:profile|CostlyOstrich36> Thanks for your reply, unfortunately this is exactly the problem: I simply cannot explain why the storage filled up all of a sudden, so I have no idea what to delete. At least 2/3 of the available space was filled during a time when we didn't run any experiments whatsoever, we only added a bunch of datasets. AFAIK there are no logs/metrics/plots involved, and the data itself is always stored locally.
I tried to contact support but they were ...