Hi RobustGoldfish9,
I'd much rather have trains-agent automatically build the image defined there than have to build the image separately and make it available for all the agents to pull.
Do you mean there is no docker image in the artifactory built from your Dockerfile?
I might be a little confused. I'm assuming that when I set a base docker image for a task and run it, trains-agent runs a container from that image, clones the git repository into it, applies all the changes/packages trains detected, and runs the script. What I'd really like is for trains-agent to pull a repository that includes a Dockerfile, build an image from that Dockerfile, run the resulting container, and then execute the script within it.
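Roughly, per task, something like this (just a sketch; the repo URL, image tag, and script name are made up):

    git clone https://example.com/our-repo.git /tmp/our-repo
    docker build -t task-image /tmp/our-repo              # build from the repo's Dockerfile
    docker run --rm -v /tmp/our-repo:/workspace task-image \
        python /workspace/train.py                        # run the experiment inside it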
I'd like the base_docker_image to not only be defined at runtime, but also be built at runtime.
trains-agent runs a container from that image, then clones ...
That is correct
I'd like the base_docker_image to not only be defined at runtime
I see. May I ask why not just build it once, push it into the artifactory, and then have trains-agent use it? (It will be much faster.)
I had thought of that as a solution for when our code stabilizes, but during development I'd rather not have to build/maintain an image and keep updating it as the code/environment diverges from it. It would be nicer if everything just got built from the Dockerfile checked out from git.
I see, would having this feature solve it (i.e. base docker + bash init script)?
https://github.com/allegroai/trains/issues/236
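The idea is that you could hand the agent a bash script that runs inside the base docker before the task starts, e.g. (a hypothetical sketch of the proposed init script; the packages are just examples):

    # init script executed inside the base docker before the task starts
    apt-get update && apt-get install -y libsm6
    pip install -r /workspace/requirements.txt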
It's very close. The only difference is that my team does have the authority to decide our MLOps, and we've already structured our project around docker: our environment is already defined in a Dockerfile and a requirements.txt that is acted upon inside that Dockerfile. We don't really need trains to completely manage our dependencies.
Prior to trains, our workflow was to build an image and spin up a container on one of our GPU machines. I can do the same thing and still take advantage of trains' excellent reporting, but then I lose out on the queues and the ability to clone experiments from the webui.
RobustGoldfish9 I see.
So in theory, spinning up an experiment on an agent would be: clone code -> build docker -> mount code -> execute code inside docker?
(no need for requirements etc.?)
Yeah. We still have a requirements.txt file, but we do pip install -r inside the Dockerfile along with all our calls to apt.
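Something like this, trimmed down (the base image and packages here are placeholders):

    FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
    RUN apt-get update && apt-get install -y python3 python3-pip git
    COPY requirements.txt /tmp/requirements.txt
    RUN pip3 install -r /tmp/requirements.txt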
I think for the time being, I'm going to follow your suggestion and just put the extra effort into distributing a pre-built image.
I'm going to follow your suggestion and just put the extra effort into distributing a pre-built image.
That sounds good 🙂
If you feel the need is important, I do have a hack in mind; it will be doable once we have support for entrypoint "-c python_code_here". But since this is still not available, I believe you are right and building an image would be the easiest.
A note on the docker image: remember that when running inside the docker we inherit the system packages installed in the image, so if you only change python packages there will be no need to build a new image :)
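i.e. inside the container the agent effectively does something like (simplified sketch):

    # reuse the image's system site-packages, install only what changed
    pip install -r requirements_diff.txt    # hypothetical file standing in for the detected diff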
I'll keep an eye out for the new entrypoint. Thanks again for the support.