Hi PanickyMoth78, an RC with a fix is out, let me know if it works (notice you can now set max_workers from the CLI or the Dataset functions): pip install clearml==1.8.1rc1
I can't find version 1.8.1rc1, but I believe I see a relevant change in the code of Dataset.upload in 1.8.1rc0
PanickyMoth78 quick update: the fix is already being tested, I'm hoping for an RC tomorrow 🙂
Hi PanickyMoth78
Yes, I think you are correct, this looks like GS throttling your connection. You can control the number of concurrent uploads with max_workers=1
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/datasets/dataset.py#L604
Let me know if it works
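For reference, a minimal sketch of passing it (assuming your clearml version's Dataset.upload() exposes max_workers; the project, dataset, and bucket names are placeholders):
```
from clearml import Dataset

# Placeholder project / dataset / bucket names; the point is max_workers,
# which limits the number of concurrent upload threads (1 == sequential).
ds = Dataset.create(dataset_project="my_project", dataset_name="my_dataset")
ds.add_files("./data")
ds.upload(output_url="gs://my-bucket/datasets", max_workers=1)
ds.finalize()
```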
maybe this line should take a timeout argument?
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/clearml/storage/helper.py#L1834
I have google-cloud-storage==2.6.0 installed
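For what it's worth, in that version of the client library the upload call itself can take one; a standalone sketch of the google-cloud-storage API (not clearml internals; bucket and object names are placeholders):
```
from google.cloud import storage
from google.cloud.storage.retry import DEFAULT_RETRY

# Standalone google-cloud-storage illustration (not clearml code):
# upload_from_filename() accepts a per-request timeout and a retry policy.
client = storage.Client()
blob = client.bucket("my-bucket").blob("datasets/example.bin")  # placeholders
blob.upload_from_filename(
    "example.bin",
    timeout=300,                             # seconds per request
    retry=DEFAULT_RETRY.with_deadline(600),  # retry transient errors for up to 10 minutes
)
```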
setting max_workers to 1 prevents the error (but, I assume, it may come at the cost of slower sequential uploads).
This seems like a question for GS storage; maybe we should open an issue there, since their backend does the rate limiting
My main concern now is that this may happen within a pipeline leading to unreliable data handling.
I'm assuming the pipeline code will have max_workers, but maybe we could have a configuration value so that we can set it across all workers, wdyt?
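One hypothetical stop-gap until such a setting exists: agree on an environment variable and have every pipeline step pass it through explicitly (DATASET_UPLOAD_MAX_WORKERS is a made-up name for this sketch, and it assumes Dataset.upload() exposes max_workers):
```
import os
from clearml import Dataset

# DATASET_UPLOAD_MAX_WORKERS is NOT a clearml setting, just a convention for this sketch.
MAX_WORKERS = int(os.environ.get("DATASET_UPLOAD_MAX_WORKERS", "1"))

def upload_dataset_step(local_dir, project, name):
    # Create, populate and upload a dataset with a bounded number of upload threads.
    ds = Dataset.create(dataset_project=project, dataset_name=name)
    ds.add_files(local_dir)
    ds.upload(max_workers=MAX_WORKERS)  # assumes max_workers is accepted here
    ds.finalize()
    return ds.id
```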
If Dataset.upload() does not crash or return a success value that I can check and
Are you saying that with this error showing, the data upload does not crash?
Unfortunately that is correct. It continues as if nothing happened!
To replicate this in Linux (even with max_workers=1):
Install wondershaper to throttle your connection (https://averagelinuxuser.com/limit-bandwidth-linux/): sudo apt-get install wondershaper
Throttle your connection to 1Mb/s with something like sudo wondershaper wlo1 1024 1024 (where wlo1 is my network interface)
Run the attached script: python dataset_fail.py
(to stop throttling: sudo wondershaper clear wlo1)
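(The attached dataset_fail.py isn't included here; purely as a rough idea, a hypothetical reproduction along these lines - many small files uploaded to a gs:// bucket while the connection is throttled - with placeholder project, dataset, and bucket names:)
```
import os
from pathlib import Path
from clearml import Dataset

# Hypothetical reproduction sketch, NOT the attached dataset_fail.py:
# generate a pile of small files, then upload them to GS while throttled.
src = Path("./repro_files")
src.mkdir(exist_ok=True)
for i in range(200):
    (src / "file_{:03d}.bin".format(i)).write_bytes(os.urandom(512 * 1024))  # 512 KB each

ds = Dataset.create(dataset_project="debug", dataset_name="upload_fail_repro")
ds.add_files(str(src))
ds.upload(output_url="gs://my-bucket/datasets", max_workers=1)  # placeholder bucket
ds.finalize()
```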
Thanks AgitatedDove14
setting max_workers to 1 prevents the error (but, I assume, it may come at the cost of slower sequential uploads).
My main concern now is that this may happen within a pipeline leading to unreliable data handling.
If Dataset.upload() does not crash or return a success value that I can check, and if Dataset.get_local_copy() also does not complain as it retrieves partial data - how will I ever know that I lost part of my dataset?
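(For what it's worth, a possible after-the-fact sanity check - a sketch that assumes Dataset.list_files() returns the relative paths registered in the dataset and compares them against what get_local_copy() actually delivers; project/dataset names are placeholders:)
```
from pathlib import Path
from clearml import Dataset

# Compare what the dataset claims to contain against what a local copy delivers.
ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")  # placeholders
expected = set(ds.list_files())     # relative paths registered in the dataset
root = Path(ds.get_local_copy())
actual = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}

missing = expected - actual
if missing:
    raise RuntimeError("Incomplete dataset copy, missing {} file(s)".format(len(missing)))
```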
Unfortunately that is correct. It continues as if nothing happened!
oh dear, let me make sure this is taken care of
And thank you for the reproduce code!!!
Hi PanickyMoth78 ,
Can you try with pip install clearml==1.8.1rc0? It should include a fix for this issue