Eureka! Just having TRAINS_AGENT_DOCKER_HOST_MOUNT defined for the agent's Docker container wasn't enough, but after making sure the source directory was mounted from a location outside of Docker, everything works.
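Why the host-side mount matters, as I understand it: if the agent container launches experiment containers through the host's Docker daemon (for example via a mounted Docker socket), any bind-mount source it passes is resolved on the host, not inside the agent container. The sketch below illustrates that path translation. It is a conceptual illustration, not trains-agent's actual implementation. It assumes the variable uses the `host_path:container_path` form shown in this thread; the helper names and the `/srv/trains` host path are hypothetical.

```python
from pathlib import PurePosixPath


def parse_host_mount(value: str) -> tuple[PurePosixPath, PurePosixPath]:
    """Split a hypothetical "host_path:container_path" string into its halves."""
    host, sep, container = value.partition(":")
    if not sep or not host or not container:
        raise ValueError(f"expected 'host:container', got {value!r}")
    return PurePosixPath(host), PurePosixPath(container)


def to_host_path(container_path: str, mount_value: str) -> PurePosixPath:
    """Translate a path inside the agent container to the matching host path.

    The host's Docker daemon resolves bind-mount sources on the host, so a
    sibling container can only see a folder if it is named by its host path.
    """
    host_root, container_root = parse_host_mount(mount_value)
    path = PurePosixPath(container_path)
    try:
        relative = path.relative_to(container_root)
    except ValueError:
        # The path lives only inside the agent container, so the host
        # daemon has nothing it could mount for it.
        raise ValueError(
            f"{path} is not under {container_root}; no host path exists"
        ) from None
    return host_root / relative
```

With the identical mapping used in this thread (`/root/.trains:/root/.trains`), the translated path equals the original, which only works because the agent container really was started with that folder bind-mounted from the host.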
Thank you very much for the assistance! I don't think I would have known about that environment variable without asking here on the Slack channel.
I added TRAINS_AGENT_DOCKER_HOST_MOUNT="/root/.trains:/root/.trains" to the trains-agent Docker container. I also started that container with a bind mount, so that /root/.trains inside the agent container points to a folder on the host, outside of any container.
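A minimal sketch of that setup, expressed as a `docker run` argument list so the two halves (the bind mount and the environment variable carrying the same mapping) sit side by side. The image name, the Docker socket mount, and the `trains-agent daemon --queue ... --docker` invocation are assumptions for illustration; check them against your own agent image and trains-agent version.

```python
import shlex


def agent_docker_run_args(
    host_dir: str,
    container_dir: str = "/root/.trains",
    image: str = "my-trains-agent:latest",  # hypothetical image name
    queue: str = "default",                 # assumed queue name
) -> list[str]:
    """Build a `docker run` command for a trains-agent container (sketch)."""
    mount = f"{host_dir}:{container_dir}"
    return [
        "docker", "run", "-d",
        # Assumed setup: let the agent talk to the host's Docker daemon.
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        # Bind-mount the agent's working folder from the host ...
        "-v", mount,
        # ... and pass the same host:container mapping to the agent.
        "-e", f"TRAINS_AGENT_DOCKER_HOST_MOUNT={mount}",
        image,
        "trains-agent", "daemon", "--queue", queue, "--docker",
    ]


if __name__ == "__main__":
    # Print the command as a shell-ready string instead of running it.
    print(shlex.join(agent_docker_run_args("/root/.trains")))
```

Keeping the mount string in one variable is the point of the design: the folder the container actually sees and the mapping the agent is told about cannot drift apart.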
I can see the trains-agent listed as a machine in the UI. I can also send experiments to the queue and the agent picks them up.
Yeah. We still have a requirements.txt file, but we run pip install -r on it inside the Dockerfile, along with all our apt calls.
It's very close. The only difference is that my team does get to decide on MLOps, and we've already structured our project around Docker: our environment is defined in a Dockerfile and a requirements.txt, which is installed from inside that Dockerfile. We don't really need trains to manage our dependencies completely.
I'll keep an eye out for the new entrypoint. Thanks again for the support.
Prior to trains, our workflow was to build an image and spin up a container on one of our GPU machines. I can do the same thing and still take advantage of trains' excellent reporting, but then I lose the queues and the ability to clone experiments from the web UI.
I'd like the base_docker_image not only to be defined at runtime but also to be built at runtime.
I think for the time being, I'm going to follow your suggestion and just put the extra effort into distributing a pre-built image.
I might be a little confused. I'm assuming that when I set a base Docker image for a task and run it, trains-agent starts a container from that image, clones the git repository into that container, applies all the changes/packages trains detected, and runs the script. What I'd really like is for trains-agent to pull a repository that includes a Dockerfile, build that Dockerfile, run the resulting container, and then execute the script inside it.
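A minimal sketch of the requested workflow, written as a plain command plan rather than anything trains-agent does today: clone the repository, build the Dockerfile it contains, then run the script inside the freshly built image. `ExperimentSource`, both function names, the checkout directory, the `/workspace` mount point, and the image tag are hypothetical. The sketch assumes the Dockerfile sits at the repository root, that the image provides `python`, and that the code is bind-mounted into the container at run time.

```python
import shlex
import subprocess
from dataclasses import dataclass


@dataclass
class ExperimentSource:
    repo_url: str                # git repository that contains a Dockerfile
    commit: str                  # commit to check out
    script: str                  # entry point, relative to the repository root
    workdir: str = "/workspace"  # hypothetical mount point inside the container


def build_then_run_commands(
    src: ExperimentSource, tag: str, checkout_dir: str
) -> list[list[str]]:
    """Commands for: clone repo -> build its Dockerfile -> run the script."""
    # Docker bind mounts need an absolute host path, so checkout_dir
    # should be absolute.
    mount = f"{checkout_dir}:{src.workdir}"
    return [
        ["git", "clone", src.repo_url, checkout_dir],
        ["git", "-C", checkout_dir, "checkout", src.commit],
        # Build the image from the Dockerfile at the repository root.
        ["docker", "build", "-t", tag, checkout_dir],
        # Run the built image with GPUs, mount the checkout, run the script.
        ["docker", "run", "--rm", "--gpus", "all", "-v", mount,
         "-w", src.workdir, tag, "python", src.script],
    ]


def run_plan(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure
```

Usage: `run_plan(build_then_run_commands(ExperimentSource("https://example.com/team/project.git", "abc123", "train.py"), "project:abc123", "/tmp/exp"))` prints the four commands without executing them. Tagging the image with the commit keeps each build traceable to the code it was built from.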
I had thought of that as a solution for once our code stabilizes, but during development I'd rather not have to build and maintain an image and keep updating it as the code and environment diverge from it. It would be nicer if everything were simply built from the Dockerfile checked out from git.