I have a rate limit of 600 requests per minute and I was running into it even with a single worker. The 503 is only part of the issue, and probably related, but the bigger issue (which could be caused by whatever causes the 503) is that the request is evidently rejected by the host when it checks whether the file exists, so it throws that log message about not being able to list/find a file. However, the file is actually there; it just seems that the server refuses to respond (likely due...
To be clear, this mostly occurred because of probably slightly unintended use of ClearML, but if you remember, previously I had trouble adding external files because of how many requests the .exists() and .get_metadata() calls were sending to the server. Now, there's a way to list files with a common prefix in a bucket in batches, so fewer requests, more files. Sending these requests also returns the metadata, so essentially I skipped .add_external_files entirely and created ...
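For reference, a rough sketch of the kind of prefix listing I mean, using boto3's list_objects_v2 paginator (the bucket name and prefix are made up; each listed object already carries its size and modification time, so no per-file exists/metadata call is needed):

import boto3

# List every object under a common prefix in pages of up to 1000 keys;
# each page already includes Size/ETag/LastModified for its objects.
s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

files = []
for page in paginator.paginate(Bucket="my-bucket", Prefix="datasets/v1/"):
    for obj in page.get("Contents", []):
        files.append({
            "uri": f"s3://my-bucket/{obj['Key']}",
            "size": obj["Size"],
            "last_modified": obj["LastModified"],
        })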
Maybe there's a way to pass some additional config options to the boto3 client? Perhaps change the retry mode to the adaptive one?
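Something like this is what I have in mind, assuming the S3 client could be built (or rebuilt) with a botocore Config; the numbers are only examples:

import boto3
from botocore.config import Config

# Ask botocore to use the adaptive retry mode with a higher attempt cap,
# so throttling responses (e.g. 503 SlowDown) get retried with
# client-side rate limiting instead of failing outright.
retry_config = Config(retries={"max_attempts": 32, "mode": "adaptive"})
s3 = boto3.client("s3", config=retry_config)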
Also, this was not happening when adding fewer files. At the time of constantly running into this issue, I was trying to add 1.7M files in a single call to add_external_files, then I tried in batches of 100k, but it still failed to list some of the files (that were actually there). Now I'm running in batches of 10k, which seems to work fine (at least for now); however, it is rather slow, taking about 20 minutes to upload those 10k, and I have about 170 batches.
Also, even using AWS_MAX_ATTEMPTS and AWS_RETRY_MODE did not help. I had set AWS_MAX_ATTEMPTS to 1024 and it still failed, so I would assume that this boto3 configuration unfortunately doesn't really help at all? Maybe because the adaptive mode I was using is still technically experimental it wasn't actually doing anything; I don't know, I just know that it fails.
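For context, the environment-variable route is just the same retry settings expressed differently; roughly this (values are examples), set before the S3 client is created:

import os

# botocore reads these at client creation time.
os.environ["AWS_RETRY_MODE"] = "adaptive"
os.environ["AWS_MAX_ATTEMPTS"] = "32"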
So, I monkey patched this fix into my code; however, that still did not help, so frankly I have just made the _add_external_files method that I'm patching retry, checking and listing the files again if it fails. I think that is also something you could add: retries inside the _add_external_files method itself, so that it retries calling StorageManager.exists_file, because that appears to be the main point of failure in this case. I mean, not a failure c...
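Roughly what I mean by retrying the existence check; just a sketch with a made-up helper name, wrapping StorageManager.exists_file in an exponential backoff:

import time
from clearml import StorageManager

def exists_with_retry(remote_url, attempts=5, base_delay=1.0):
    # Hypothetical helper: retry the existence check a few times before
    # concluding the file is missing, treating transient errors (e.g. 503)
    # the same as a failed check.
    for attempt in range(attempts):
        try:
            if StorageManager.exists_file(remote_url):
                return True
        except Exception:
            pass
        time.sleep(base_delay * (2 ** attempt))
    return False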
@<1523701205467926528:profile|AgitatedDove14> my point is that it's not documented anywhere that I can find, so when requesting 1000 entries and not getting them, I assumed there were no more entries to request, whereas in reality it was just capping out at 500 entries.
@<1523701435869433856:profile|SmugDolphin23> Thanks for the response! Configuring those env vars seems to help, but even with the adaptive mode and 32 or 64 max attempts it still fails at some point. Granted, I was using 8 workers and uploading all 1.7M files in a single call to add_external_files, but I would have expected the adaptive mode to, well, adapt to that, especially with that many attempts. Currently I'm back to sending them in batches, this time in batches of 50k files, s...
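For completeness, the batching I fall back to is just chunking the list of URIs and registering each chunk separately; a sketch under the assumption that add_external_files accepts a list of URLs (the dataset handle, URI list, and batch size are placeholders):

from clearml import Dataset

def add_in_batches(dataset: Dataset, uris, batch_size=50_000):
    # Register the external files chunk by chunk so one throttled listing
    # doesn't take down the whole 1.7M-file registration.
    for start in range(0, len(uris), batch_size):
        dataset.add_external_files(source_url=uris[start:start + batch_size])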
It appears to be a Twitter card embed; it will show up like this wherever that is supported. It's definitely something that can be fixed. It's probably not worthwhile, but it might help down the road; at the least it would be more obvious what the link points to without having to read the whole link.
Hi @<1523701070390366208:profile|CostlyOstrich36> , mainly I'm looking for the user's name. I just want to get the Created By data from a task. I can get the user's ID from a task; I'm just unsure how to proceed with getting their name.
I'm afraid I don't really know. You could check the user settings in the UI (top right corner, on the user icon); maybe there is something there, but I didn't find anything to that extent myself.
Thanks again 🙌, I managed to get the user's name like so:
from clearml.backend_api.session.client import APIClient
client = APIClient()
task = client.tasks.get...
user_response = client.session.send_request(
"users", "get_by_id", json={"user": task.user}
).json()
user_name = user_response["data"]["user"]["name"]
For my use case I poll tasks.get_all, which returns an object that contains the user ID. I will then assume that there is a users.get_by_id as well? Fingers crossed! Thanks!
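A sketch of that polling flow, assuming tasks.get_all returns task objects carrying a user field and that the users.get_by_id endpoint behaves as in the snippet above (the filter arguments are illustrative):

from clearml.backend_api.session.client import APIClient

client = APIClient()

# Poll recent tasks, then resolve each creator's user ID to a display name.
tasks = client.tasks.get_all(status=["completed"], page=0, page_size=500)
for task in tasks:
    user_response = client.session.send_request(
        "users", "get_by_id", json={"user": task.user}
    ).json()
    print(task.id, user_response["data"]["user"]["name"])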
That said, it would be great if they could add it to the documentation.
Why are you using query parameters? The documentation shows that you should be using the request body for all of that.
Are you using the page_size and page keys? page should be incremented by 1 regardless of page size; then just check whether the response contains fewer than those 500 entries, and if so you can break out.
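Something like this, a minimal sketch of that pagination loop against tasks.get_all (the page size and field names are illustrative):

from clearml.backend_api.session.client import APIClient

client = APIClient()

page, page_size = 0, 500
all_tasks = []
while True:
    # The page index goes up by 1 each iteration, regardless of page size.
    batch = client.tasks.get_all(page=page, page_size=page_size)
    all_tasks.extend(batch)
    if len(batch) < page_size:
        # Fewer results than requested means this was the last page.
        break
    page += 1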