SuccessfulKoala55, how do I set the agent version when creating the autoscaler?
Hi SkinnyPanda43
This issue was fixed with clearml-agent 1.5.1, can you verify?
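One way to verify which clearml-agent version is actually installed on the instance is sketched below. This is only an illustration: the helper names (`version_tuple`, `has_fix`, `installed_agent_version`) are made up here and are not part of ClearML, and it assumes the check runs with the same Python interpreter the agent was installed into. Pre-release suffixes are ignored, so `1.5.1rc0` is treated like `1.5.1`.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Return the leading numeric components, e.g. '1.5.1rc0' -> (1, 5, 1).

    Any pre-release suffix (rc0, b1, ...) is ignored in this simplified check.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def has_fix(installed, minimum="1.5.1"):
    """True if the installed version is at least the version named in the thread."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_agent_version():
    """Return the installed clearml-agent version, or None if it is not installed."""
    try:
        return metadata.version("clearml-agent")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_agent_version()
    if current is None:
        print("clearml-agent is not installed in this environment")
    else:
        print(f"clearml-agent {current}, includes fix: {has_fix(current)}")
```

Running the script on the EC2 instance (or inside the container the agent runs in) should show whether 1.5.0 or 1.5.1 was picked up.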
AgitatedDove14 , here follows the full log:
Hi SkinnyPanda43, sorry for the delay, we're just in the process of upgrading this in app.clear.ml
Actually, this error happens when I launch the autoscaler from the Web UI. When I enqueue a task, it launches an EC2 instance whose "Status Check" stays in "Pending" for over 15 minutes; the instance is then terminated by the scaler, which launches another one, in a loop.
I am launching through the UI, "XXX workspace / https://app.clear.ml/applications / AWS Autoscaler".
I think not, I have not set any ENV variable. Just went to the web UI, added an autoscaler, filled the data in the UI and launched the autoscaler.
By inspecting the scaler task, it is running the following docker image: allegroai/clearml-agent-services-app:app-1.1.1-47
Based on what I see, when the EC2 instance starts it installs the latest version. Could it be that this instance is still running?
Hi SkinnyPanda43
Can you attach the full log?
The ClearML agent is installed before your requirements.txt, so at least in theory they should not collide
Hi SkinnyPanda43, it should be OK now - please try it out and let me know
Hi, AgitatedDove14
How do I set the version to 1.5.1? When I launch the autoscaler, version 1.5.0 is picked by default.
Hmm, let me check first when it is going to be upgraded and whether there is a workaround
Hmm, how do you launch the autoscaler? From code?
SkinnyPanda43 I've identified the reason for the forced RC version - we'll remove this constraint and I'll ping you here when you can try again
SkinnyPanda43 can you please try agent v1.5.1rc0 and let me know if it changes anything?
Basically, I am following the steps in this video:
https://www.youtube.com/watch?v=j4XVMAaUt3E
So it could be launched by the clearml CLI? I can also try that.
Any chance there is an env variable you set to get 1.5.0rc0? Because this is the version that is being used