Hi RobustGoldfish9,
I'd much rather have trains-agent automatically build the image defined there than have to build the image separately and make it available for all the agents to pull.
Do you mean there is no docker image in the artifactory built from your Dockerfile?
I might be a little confused. I'm assuming that when I set a base docker image for a task and run it, trains-agent runs a container from that image, then clones the git repository into that container, then applies all the changes/packages trains detected and runs the script. But what I'd really like is for trains-agent to pull a repository that includes a Dockerfile, build that Dockerfile, run the resulting container, and then execute the script within it.
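To make the requested flow concrete, here is a minimal sketch of the three steps (clone, build the repo's Dockerfile, run the script in the resulting container) as plain `git`/`docker` commands. This is not something trains-agent does; `build_and_run_commands`, the repo URL and the image tag are hypothetical, and it assumes the Dockerfile copies the code into the image and sets a `WORKDIR` where the script lives.

```python
import shlex


def build_and_run_commands(repo_url: str, script: str, tag: str,
                           workdir: str = "repo") -> list[list[str]]:
    """Hypothetical helper: argv lists for 'clone, build Dockerfile, run script'."""
    return [
        # 1. Check out the repository that contains the Dockerfile.
        ["git", "clone", repo_url, workdir],
        # 2. Build the image defined by the checked-out Dockerfile.
        ["docker", "build", "-t", tag, workdir],
        # 3. Run the script inside a throwaway container from that image
        #    (assumes the image's WORKDIR contains the script).
        ["docker", "run", "--rm", tag, "python", script],
    ]


if __name__ == "__main__":
    cmds = build_and_run_commands(
        "https://example.com/team/project.git", "train.py", "project:dev")
    for argv in cmds:
        # Print only; to execute, call subprocess.run(argv, check=True).
        print(shlex.join(argv))
```

The catch is step 2: every run rebuilds the image, which is slow unless the Docker layer cache on that machine is warm.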
It's very close. The only difference is that my team does get to decide our MLOps setup, and we've already structured our project around docker: our environment is defined in a Dockerfile and a requirements.txt that is installed inside that Dockerfile. We don't really need trains to completely manage our dependencies.
trains-agent runs a container from that image, then clones ...
That is correct
I'd like the base_docker_image to not only be defined at runtime
I see. May I ask why not just build it once, push it into artifactory, and then have trains-agent use it? (It will be much faster.)
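A minimal sketch of the "build once, push, reuse" suggestion, assuming you are already logged in to the registry (`docker login`). The registry path `artifactory.example.com/docker-local`, the image name and `prebuilt_image_commands` are hypothetical placeholders.

```python
import shlex


def prebuilt_image_commands(registry: str, image: str, version: str,
                            context: str = ".") -> tuple[str, list[list[str]]]:
    """Hypothetical helper: image reference plus build/push argv lists."""
    # Fully qualified reference that every agent will pull.
    ref = f"{registry}/{image}:{version}"
    cmds = [
        # Build once from the Dockerfile in the given build context.
        ["docker", "build", "-t", ref, context],
        # Push so all agents can pull the same image.
        ["docker", "push", ref],
    ]
    return ref, cmds


if __name__ == "__main__":
    ref, cmds = prebuilt_image_commands(
        "artifactory.example.com/docker-local", "project", "1.0")
    for argv in cmds:
        print(shlex.join(argv))
    print("base docker image:", ref)
```

The printed reference is what you would then set as the task's base docker image (or, if I recall the CLI correctly, pass as the default image via `trains-agent daemon --queue default --docker <ref>`). Bump `version` only when system-level dependencies change.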
I'm going to follow your suggestion and just put the extra effort into distributing a pre-built image.
That sounds good 🙂
If you feel this is important, I do have a hack in mind; it will be doable once we support the entrypoint "-c python_code_here". But since that is not available yet, I believe you are right and building an image would be the easiest.
A note on the docker image: remember that when running inside the docker we inherit the system packages installed in the image, so if you only change Python packages there is no need to build a new image :)
Prior to trains our workflow was to build an image and spin up a container on one of our GPU machines. I can do the same thing and still take advantage of trains' excellent reporting, but then I lose out on the queues and ability to clone experiments from the webui.
I had thought of that as a solution for when our code stabilizes, but during development I'd rather not have to build and maintain an image and keep updating it as the code/environment diverges from it. It would be nicer if everything just got built from the Dockerfile checked out from git.