By the way, you can monkey patch it pretty easily by adding your own main.py
to the autoscaler, with something like:
```python
import aws_autoscaler

class MyAwsAutoScaler(aws_autoscaler.AwsAutoScaler):
    startup_bash_script = []

aws_autoscaler.AwsAutoScaler = MyAwsAutoScaler

if __name__ == '__main__':
    aws_autoscaler.main()
```
And then simply run your own file
Obviously, you can put whatever you want in the startup_bash_script
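As a generic illustration of that subclass-and-rebind pattern (stand-in classes only, no clearml dependency; all names here are hypothetical):

```python
# Stand-in for the real autoscaler class, just to demonstrate the pattern.
class AwsAutoScaler:
    # default startup script lines the scaler would run on new instances
    startup_bash_script = ["apt-get update", "pip install clearml-agent"]

    def spin_up(self):
        # in the real class this would launch an instance; here we just
        # return the script lines that would be executed
        return list(self.startup_bash_script)

class MyAwsAutoScaler(AwsAutoScaler):
    # override the class attribute: skip the default startup steps entirely
    startup_bash_script = []

# rebind the name the rest of the code looks up, so any later
# AwsAutoScaler(...) call constructs the patched subclass instead
AwsAutoScaler = MyAwsAutoScaler

print(AwsAutoScaler().spin_up())  # -> []
```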
I'd prefer to run it on the Web UI
Do you mean as an app?
Also, we seem to have problems when it's executed remotely
What sort of problems?
remote execution is working now. Internal worker nodes had not spun up the agent correctly
So no issues now?
Do you mean the reason is that you already have all the dependencies set up in the images you build?
Yes, it's the dependencies. At the moment I'm doing this as a workaround:
```python
autoscaler = AwsAutoScaler(hyper_params, configurations)
startup_bash_script = [
    '...',
]
autoscaler.startup_bash_script = startup_bash_script
```
I'd prefer to run it on the Web UI. Also, we seem to have problems when it's executed remotely
Yes, on the apps page. Is it possible to trigger it programmatically?
I assume you're using http://app.community.clear.ml ?
Hi RobustRat47, well, I believe the main reasoning was that there are some steps that must be performed in order for the agent to be able to run, and that it's much too easy to mess them up - what is your specific need (or rather, what's in your way right now)?
In short, we clone the repo, build the docker container, and run the agent inside the container. The reason we do it this way, rather than provide a docker image to the clearml-agent, is twofold:
1. We actively develop our custom networks and architectures within a containerised env to make it easy for engineers to have a quick dev cycle for new models (the same repo is cloned and we build the docker container to work inside). 2. We use the same repo to serve models on our backend (in a slightly different container).
I guess we could build and push the container to a docker registry and reference that image in the clearml-agent. What do you think about this workflow? Those are the main reasons for not using the startup_bash_script as provided.
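If it helps, that build-push-reference workflow could look roughly like this (image and registry names are placeholders; the agent's docker mode takes a default image via `--docker`):

```shell
# build and push the repo's container image (names are placeholders)
docker build -t myregistry/trainer:latest .
docker push myregistry/trainer:latest

# run the clearml-agent in docker mode, executing queued tasks
# inside that image by default
clearml-agent daemon --queue default --docker myregistry/trainer:latest
```

Tasks can still override the image per-experiment, so the pushed image only acts as the default environment.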
What would be the use case for triggering the auto scaler programmatically? I mean, I'd imagine you'd only have one (or a few) of those running at any given time, right?