Alright, thank you for your insight @<1523701205467926528:profile|AgitatedDove14> ! I will check this link.
Regarding S3, that's a very good point, but the team I work with currently doesn't want to leverage an external cloud storage provider.
Hi @<1663354518726774784:profile|CrookedSeal85>
I am trying to optimize storage on my ClearML file server when doing a lot of experiments.
This is not straightforward: you will need to get a list of all the events via the API,
filter on image events,
and then delete the URLs you are getting via the StorageManager.
But to be honest, why not just direct it to S3 or something like that ?
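A minimal sketch of the "filter on image events" step, assuming each event is a plain dict as returned by the events API. The event type string `training_debug_image` and the `url` field are assumptions about the payload, and `extract_image_urls` is a hypothetical helper name, so check them against a real response first.

```python
from typing import Any, Dict, Iterable, List

# Assumption: debug-sample events use this type string and carry a "url" field.
IMAGE_EVENT_TYPE = "training_debug_image"


def extract_image_urls(events: Iterable[Dict[str, Any]]) -> List[str]:
    """Return the de-duplicated URLs of image events, preserving order."""
    seen = set()
    urls: List[str] = []
    for event in events:
        # Skip scalars, plots, logs and any other non-image event.
        if event.get("type") != IMAGE_EVENT_TYPE:
            continue
        url = event.get("url")
        # Ignore image events without a URL and URLs already collected.
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

The resulting list can then be handed to whatever deletion routine is used for the storage backend.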
Hello @<1523701205467926528:profile|AgitatedDove14> ,
Good news! It seems that using the list of URLs retrieved via "POST /events.get_task_events" and then deleting the corresponding images using the StorageHelper class effectively does the trick! FYI, here is the little function I wrote for deleting the files:
from clearml.storage.helper import StorageHelper

def delete_image_from_clearml_server(image_url: str) -> None:
    # Resolve the storage backend that serves this URL.
    storage_helper = StorageHelper.get(url=image_url)
    try:
        storage_helper.delete(path=image_url)
    except Exception as e:
        raise ValueError(f"Could not remove image with URL '{image_url}': {e}")
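When deleting many debug samples, it may be handy to keep going after a failure and report it at the end. Here is a small sketch of that idea; `delete_images` is a hypothetical helper, and `delete_fn` stands for any callable that deletes one URL, for example the `delete_image_from_clearml_server` function above.

```python
from typing import Callable, Iterable, List, Tuple


def delete_images(
    urls: Iterable[str],
    delete_fn: Callable[[str], None],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Delete every URL with delete_fn, collecting failures instead of stopping."""
    deleted: List[str] = []
    failed: List[Tuple[str, str]] = []
    for url in urls:
        try:
            delete_fn(url)
        except Exception as exc:
            # Record the URL and the reason, then continue with the next one.
            failed.append((url, str(exc)))
        else:
            deleted.append(url)
    return deleted, failed
```

Calling `delete_images(urls, delete_image_from_clearml_server)` would return the URLs that were removed and a list of `(url, error message)` pairs for the ones that were not.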
Now, even though this function with StorageHelper works fine, I observed that some images are not referenced in the "/events.get_task_events" event list (even if I wait some time). In fact, they physically exist on the server, but no URL points to them. This implies that those few images are not detected and hence not deleted.
Are you aware of a limitation of "/events.get_task_events" that prevents it from fetching some of the images stored on the server?
Here is a picture and a video illustrating the fact that 18 images were effectively deleted, but 2 have not been listed in the events of "/events.get_task_events" and were hence not deleted...
Thank you very much for your insight!
Have a nice weekend!
However, regarding your recommendation of using the StorageManager class to delete the URL, it seems that this class only contains methods for checking existence of files, downloading files and uploading files, but no method for actually deleting files based on their URL (see the docs).
Yes, you are correct, you should use a "deeper" class:
helper = StorageHelper.get(remote_url)
helper.delete(remote_url)
Hi @<1523701205467926528:profile|AgitatedDove14> ,
Thanks a lot for your recommendation, that's exactly it!
I was able to use the scroll_id of the current "page" to access the events of the next "page"!
This works fine and I can now delete almost all debug samples.
I say "almost" because, apparently, the scroll_id technique systematically fails to reach the events of the very last "page"...
In fact, as you can see in the picture below, I have a total of 2014 events. I can access the events of the first, second and third "pages" without trouble (with 500, 506 and 507 events respectively), but unfortunately, when providing the scroll_id value of the third "page", I cannot access the remaining 2014-(500+506+507) = 501 events of the very last "page".
As you can see, I get following error:
Traceback (most recent call last):
  File "/opt/clearml/apiserver/service_repo/service_repo.py", line 288, in handle_call
    ret = endpoint.func(call, company, call.data_model)
  File "/opt/clearml/apiserver/services/events.py", line 382, in get_task_events
    res = event_bll.events_iterator.get_task_events(
  File "/opt/clearml/apiserver/bll/event/events_iterator.py", line 51, in get_task_events
    res.events, res.total_events = self._get_events(
  File "/opt/clearml/apiserver/bll/event/events_iterator.py", line 132, in _get_events
    "must": must + [{"term": {key.field: events[-1][key.field]}}]
KeyError: 'iter'
This error suggests that there's an issue with accessing a key named iter within the code handling the pagination. It seems to be related to the ClearML API server code itself.
Have you ever encountered such a KeyError issue? Would you also expect that using scroll_id up to the very last "page" fetches the very last remaining data?
Again, thank you very much for your recommendation and help!
Hi @<1523701205467926528:profile|AgitatedDove14> ,
Thanks again for your insight.
I see how to retrieve the URLs via "POST /events.get_task_events".
However, regarding your recommendation of using the StorageManager class to delete the URL, it seems that this class only contains methods for checking existence of files, downloading files and uploading files, but no method for actually deleting files based on their URL (see doc here and here).
What do you have in mind when saying:
delete the URL you are getting via the StorageManager
Are you sure this feature exists?
Thank you very much again for your support!
Notice that you need to pass the returned scroll_id to the next call
scroll_id = response["scroll_id"]
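To make the "pass the returned scroll_id to the next call" idea concrete, here is a minimal sketch of the paging loop. `iter_task_events` and `fetch_page` are hypothetical names: `fetch_page` is any callable that performs one events.get_task_events request for the given scroll_id (None on the first call) and returns the response payload as a dict with "events" and "scroll_id" keys, which is an assumption about the payload shape.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_task_events(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
) -> Iterator[Dict[str, Any]]:
    """Yield events page by page until the server returns an empty page."""
    scroll_id: Optional[str] = None
    while True:
        page = fetch_page(scroll_id)
        events = page.get("events") or []
        if not events:
            # An empty page is treated as the end of the scroll.
            break
        yield from events
        next_scroll_id = page.get("scroll_id")
        if not next_scroll_id:
            # Without a scroll_id there is nothing to continue from.
            break
        scroll_id = next_scroll_id
```

Keeping the HTTP call outside the loop means the same code works with `requests`, the ClearML API client, or a stub in a test.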
Okay, thank you for your snippet @<1523701205467926528:profile|AgitatedDove14>, I will investigate this class!
You're right, yes, and this is precisely what I do. But when trying to access the fourth "page" with the scroll_id returned on the third "page", I get the above error and I am not able to access the data on that fourth "page". This seems to be systematic: using the scroll_id of the penultimate "page" doesn't allow access to the very last "page".
I debugged using my browser and the following URLs (based on the scheme "api_server" + "/events.get_task_events" + "?task=" + "<my-task_id>" + "&scroll_id=" + "<scroll_id-of-the-previous-page>") to see if I can access the events:
- First page (works): duck.erx:8008/events.get_task_events?task=41d606f6bd274d7e8c1297b50507b8a9
- Second page (works)
- Third page (works)
- Fourth (and final) page (fails with KeyError, remaining events cannot be accessed)
In my Python code, I use the requests package, passing the following params (containing the scroll_id I retrieve iteratively) to the requests.get() function:
params = {"scroll_id": scroll_id}
Same observation here as well: it works fine until reaching the very last "page", where the scroll_id of the penultimate "page" doesn't allow access to the data on this very last page.
I am not sure, but I suppose there is an issue in the ClearML API file "clearml/apiserver/bll/event/events_iterator.py". What do you think?
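For reference, the URL scheme described above ("api_server" + "/events.get_task_events" + "?task=" + task id + "&scroll_id=" + previous scroll_id) can be sketched with the standard library only. `build_events_url` is a hypothetical helper, and the host used in the usage note is a placeholder.

```python
from typing import Optional
from urllib.parse import urlencode


def build_events_url(
    api_server: str, task_id: str, scroll_id: Optional[str] = None
) -> str:
    """Build the events.get_task_events URL for one page of a task's events."""
    params = {"task": task_id}
    if scroll_id:
        # Only pages after the first one carry the previous page's scroll_id.
        params["scroll_id"] = scroll_id
    return f"{api_server.rstrip('/')}/events.get_task_events?{urlencode(params)}"
```

For example, `build_events_url("http://localhost:8008", "<my-task_id>")` gives the first-page URL, and passing the scroll_id returned by that page gives the URL of the next one. `urlencode` also escapes any special characters in the scroll_id.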
Nice!!!
Are you aware of a limitation of "/events.get_task_events" preventing from fetching some of the images stored on the server
Are you saying you see them in the UI, but cannot access them via the API ?
(this would be strange as the UI is firing the same API requests to the back end)
Notice there is a scroll_id in the response, so you might need to call the API multiple times until you scroll over all the events.
Could that be it?
Thank you!
Are you saying you see them in the UI, but cannot access them via the API ?
Yes, that's it! As you can see in the video above, I can see the remaining images (i.e., the images that haven't been deleted) both in the UI and physically on my disk storage, but cannot access them via the API (no URL leading to them exists).
(this would be strange as the UI is firing the same API requests to the back end)
And yes, this is strange, but it's what I think! A few remaining images cannot be accessed via the API.
I can't prove it easily, but while debugging my code snippet I listed all images accessible via the API and the remaining images are precisely those that do not appear in the API event list (I was not able to find them on the API).
In other words, as you can see in the picture below, some of the events contain one JPEG image URL (and that's fine: I could retrieve each of those image URLs and delete the corresponding image from the server), but unfortunately no event contains a URL that could have led to the few remaining images.
Consequently, since the API doesn't seem to be aware of the existence of those images, those images cannot be accessed and hence cannot be deleted using the API. They simply remain on the server and are still visible in the UI after running my code.
This is why I wanted to ask if you have ever encountered such a limitation of the ClearML API with the "events.get_task_events" service, or what we could do to avoid leaving those few remaining images on the server.
Thank you again for your precious support!
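One way to pin down which images the API never lists is to compare the files found on the file server with the URLs collected from the events. A rough sketch, assuming file names are unique enough to match on the last path component (an assumption, not something ClearML guarantees); `find_unreferenced_files` is a hypothetical helper.

```python
import posixpath
from typing import Iterable, List
from urllib.parse import urlparse


def find_unreferenced_files(
    stored_files: Iterable[str], event_urls: Iterable[str]
) -> List[str]:
    """Return stored file paths whose file name appears in no event URL."""
    # Assumption: matching on the final path component is precise enough.
    referenced = {posixpath.basename(urlparse(url).path) for url in event_urls}
    return sorted(
        path for path in stored_files
        if posixpath.basename(path) not in referenced
    )
```

Running this on the list of files under the task's folder and the URLs fetched from the API would list exactly the leftover images, which makes the gap easier to report.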