Hi, we just got the AWS autoscaler to create a new instance when you enqueue a task to the relevant queue. However, for some reason the task itself is never run; it stays in the pending state. When looking at the worker details, it says "No Queues Curren
So I've gotten a little further by tweaking my VPC setup. I see that the autoscaler spins up new instances without a public IP, so that means specifying a private subnet and having a NAT gateway for that subnet, right? I tried running the scikit-learn joblib sample on a CPU instance, and the experiment definitely ran, but I see another error. Could some permissions be missing somewhere?
2023-04-03 15:01:00
2023-04-03 13:00:51,087 - clearml.storage - ERROR - Exception encountered while uploading Failed uploading object /_ApplicationInstances/aws-autoscaler/ClearML Autoscaler.f302dff7cab7446f9bfeaf923622a2d0/artifacts/i-017eb35a20bd622b4/i-017eb35a20bd622b4.txt (429):
2023-04-03 13:00:51,087 - clearml.metrics - WARNING - Failed uploading to
(Failed uploading object /_ApplicationInstances/aws-autoscaler/ClearML Autoscaler.f302dff7cab7446f9bfeaf923622a2d0/artifacts/i-017eb35a20bd622b4/i-017eb35a20bd622b4.txt (429): )
2023-04-03 13:00:51,087 - clearml.metrics - ERROR - Not uploading 1/1 events because the data upload failed
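On the VPC question: one way to confirm that a private subnet actually has outbound internet access is to check that its route table sends the default route (0.0.0.0/0) to a NAT gateway. The sketch below assumes boto3 is installed and AWS credentials are configured; the helper names `nat_route_target` and `subnet_route_tables` are hypothetical, not part of ClearML or boto3. It only uses the standard EC2 `describe_route_tables` call and its documented filters.

```python
from typing import Any, Dict, List, Optional


def nat_route_target(route_tables: List[Dict[str, Any]]) -> Optional[str]:
    """Return the NAT gateway ID serving the default route (0.0.0.0/0), or None.

    `route_tables` is assumed to have the shape of the "RouteTables" list
    returned by boto3's EC2 `describe_route_tables`.
    """
    for table in route_tables:
        for route in table.get("Routes", []):
            is_default = route.get("DestinationCidrBlock") == "0.0.0.0/0"
            is_active = route.get("State", "active") == "active"
            if is_default and is_active and route.get("NatGatewayId"):
                return route["NatGatewayId"]
    return None


def subnet_route_tables(subnet_id: str, vpc_id: str, region: str) -> List[Dict[str, Any]]:
    """Fetch the route table(s) for a subnet (hypothetical helper).

    Falls back to the VPC's main route table when the subnet has no
    explicit association.
    """
    import boto3  # imported lazily so nat_route_target works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    explicit = ec2.describe_route_tables(
        Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}]
    )["RouteTables"]
    if explicit:
        return explicit
    # Subnets without an explicit association use the VPC's main route table.
    return ec2.describe_route_tables(
        Filters=[
            {"Name": "vpc-id", "Values": [vpc_id]},
            {"Name": "association.main", "Values": ["true"]},
        ]
    )["RouteTables"]
```

Usage would be something like `nat_route_target(subnet_route_tables(subnet_id, vpc_id, region))` with the subnet configured in the autoscaler; `None` suggests instances in that subnet have no NAT path out. Note that the 429 in the log above is an HTTP status code, which suggests the upload request did reach a server, so routing is probably not what failed for that particular error.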