In short we clone the repo, build the docker container, and run agent in the container. The reason we do it this, rather than provide a docker image to the clearml-agent is two fold:
We actively develop our custom networks and architectures within a containerised env to make it easy for engineers to have a quick dev cycle for new models. (same repo is cloned and we build the docker container to work inside) We use the same repo to serve models on our backend (in a slightly different container)
I guess we could build and push the container to docker and reference that image in the clearml-agent. What do you think about this workflow? They are the main reasons for not using the startup_bash_script as provided
What would be the use case for triggering the auto scaler programmatically? I mean, I'd imagine you'd only have one (or a few) of those running at any given time, right?
Do you mean the reason is that you already have all the dependencies already set up in the images you build?
By the way, you can monkey patch it pretty easily by adding your own main.py
to the autoscaler, with something like:
` import aws_autoscaler
class MyAwsAutoScaler(aws_autoscaler.AwsAutoScaler):
startup_bash_script = []
aws_autoscaler.AwsAutoScaler = MyAwsAutoScaler
if name == 'main':
aws_autoscaler.main() `And than simply run your own file ๐
remote execution is working now. Internal worker nodes had not spun up the agent correctlyย
So no issues now? ๐
I'd prefer to run it on the Web UI
Do you mean as an app?
Also, we seem to have problems when it's executed remotely
What sort of problems?
Hi RobustRat47 , well, I believe the main reasoning was that there are some steps that must be performed in order for the agent to be able to run, and that its much too easy to mess them up ๐ - what is your specific need (or rather, what's in your way right now)?
Yes, it's the dependencies. At the moment I'm doing this as a work around.
` autoscaler = AwsAutoScaler(hyper_params, configurations)
startup_bash_script = [
'...',
]
autoscaler.startup_bash_script = startup_bash_script ` I'd prefer to run it on the Web UI. Also, we seem to have problems when it's executed remotely
Yes on the apps page is the possible to tigger programatically?
I assume you're using http://app.community.clear.ml ?
Obviously, you can put whatever you want in the startup_bash_script
Yes on the apps page is the possible to tigger programatically?
remote execution is working now. Internal worker nodes had not spun up the agent correctly ๐