Question About Out-Of-Python Dependencies. For Example Nltk Requires

Answered

Question about out-of-Python dependencies. For example, NLTK requires apt-get install sqlite-devel. How can we manage dependencies like that on the trains agents?

  				
Posted 4 years ago by WackyRabbit7


Answers 14

Hi WackyRabbit7

If the trains-agent is running in docker mode, you can add it to agent.docker_init_bash_script in the ~/trains.conf file.

  				
Posted 4 years ago by TimelyPenguin76 (Administrator)
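
The suggestion above could be sketched as follows. This is only an illustration, not the agent's documented behavior: it assumes agent.docker_init_bash_script accepts a list of shell command strings, and the package name libsqlite3-dev and both helper functions are hypothetical. It builds the apt commands and prints a fragment that could be pasted into ~/trains.conf.

```python
import json


def apt_init_commands(packages):
    """Return shell commands that install the given apt packages."""
    if not packages:
        return []
    return [
        "apt-get update",
        "apt-get install -y " + " ".join(packages),
    ]


def render_conf_fragment(commands):
    """Render the commands as a trains.conf fragment (layout is an assumption)."""
    lines = ["agent {", "    docker_init_bash_script = ["]
    # json.dumps yields a double-quoted, escaped string for each command
    lines += ["        " + json.dumps(cmd) + "," for cmd in commands]
    lines += ["    ]", "}"]
    return "\n".join(lines)


# Hypothetical package name, used only for illustration
fragment = render_conf_fragment(apt_init_commands(["libsqlite3-dev"]))
print(fragment)
```

Setting this key may replace the agent's default init script rather than extend it, so it is worth checking the commented defaults in the configuration file before overriding it.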

Is there a way to do so without touching the config, directly through the Task object?

  				
Posted 4 years ago by WackyRabbit7

I'm asking because our DSes work on multiple projects and have only one trains.conf file. I wouldn't want them to edit it each time they switch projects.

  				
Posted 4 years ago by WackyRabbit7

One solution I can think of is having a different image per Task, with the apt-get packages installed. You can build a new image based on the one you have and add the apt-get packages (or switch to an image that already includes them).

Another is running more than one agent, each with a different trains.conf file, one for each project.

Currently, the Task object doesn't have a parameter for installing packages when running with trains-agent.

  				
Posted 4 years ago by TimelyPenguin76 (Administrator)
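
The custom-image suggestion above could look roughly like this. It is a minimal sketch, assuming a Debian-based base image; the base image python:3.8-slim, the package names, and the helper render_dockerfile are illustrative choices, not something from this thread. The helper produces Dockerfile text that layers the apt packages on top of an existing image.

```python
def render_dockerfile(base_image, apt_packages):
    """Return Dockerfile text that layers apt packages on top of base_image."""
    if not apt_packages:
        raise ValueError("need at least one apt package")
    install = " ".join(sorted(apt_packages))
    return "\n".join([
        f"FROM {base_image}",
        "RUN apt-get update \\",
        f"    && apt-get install -y --no-install-recommends {install} \\",
        # Drop the apt cache so the extra layer stays small
        "    && rm -rf /var/lib/apt/lists/*",
        "",
    ])


# Hypothetical base image and packages, used only for illustration
dockerfile = render_dockerfile("python:3.8-slim", ["python3-tk", "libsqlite3-dev"])
print(dockerfile)
```

The resulting text can be saved as a Dockerfile and built with docker build on the machine that runs the agent.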

Hi WackyRabbit7,

directly through the Task object?

How will the Task object be relevant if you'd like to affect the agent? When the task is running, the agent has already started executing it...

  				
Posted 4 years ago by SuccessfulKoala55

Am I correct in my understanding that what you'd like is for agent.docker_init_bash_script to be "aware" of the Task the agent is going to run?

  				
Posted 4 years ago by SuccessfulKoala55

TimelyPenguin76, if I build a custom image, do I have to host it on Docker Hub for it to run on the agent? If not, how do I make the agent aware of my custom image?

SuccessfulKoala55, the simplest thing I can think of is for Task.execute_remotely to be able to append to the docker_init_bash_script.

  				
Posted 4 years ago by WackyRabbit7

Maybe even a dedicated argument specifically for apt-get packages, since it is very common to need stuff like that

  				
Posted 4 years ago by WackyRabbit7

I believe that is why MetaFlow chose conda as their package manager: it can take care of these kinds of dependencies (even though I hate conda 😄).

  				
Posted 4 years ago by WackyRabbit7

if I build a custom image, do I have to host it on dockerhub for it to run on the agent?

You don't need to host it, but in that case the machine running the agent must already have the image (you can verify this on the machine with docker images).

If not how do I make the agent aware of my custom image?

Once the image is set as the base docker image for this task, and you have verified that it exists on the agent's machine, the agent should be able to use it.

  				
Posted 4 years ago by TimelyPenguin76 (Administrator)
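
The local-image check described above could be scripted along these lines. This is a sketch under assumptions: it relies on docker image inspect returning a non-zero exit code when the image is not present, the helper name image_available_locally is hypothetical, and the image name in the usage line is a placeholder.

```python
import subprocess


def image_available_locally(image, run=subprocess.run):
    """Return True if `docker image inspect` finds the image on this machine."""
    try:
        result = run(
            ["docker", "image", "inspect", image],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # The docker CLI itself is not installed on this machine
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Placeholder image name, used only for illustration
    print(image_available_locally("my-team/nltk-base:1.0"))
```

The run parameter exists only so the check can be exercised without a Docker daemon, for example by passing a stub in a unit test.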

Okay, so that is a bit complicated.

In our setup, the DSes don't really care about agents; the agents are managed by our MLOps team.
So essentially, the use case looks like this:
A data scientist wants to execute some CPU-heavy task. The MLOps team has given him a queue name, and the data scientist knows that when he needs something heavy he pushes it there. The DS doesn't know anything about where it is executed; the execution environment is fully managed by the MLOps team. Now, if the data scientist needs an apt package, he has no way to access that machine, because it is not in his domain. So as it is now, he will either have to change his trains.conf, which is not ideal because he might need that package only for a specific task, or he will have to contact an MLOps member to prepare a docker image for him on the remote agents.

I think it would be very useful to let DSes control that at the task level, so that a DS could specify a task-specific apt dependency on his own, without the help of an MLOps member.

I will open an issue about it, because I predict this use case will be very common for us; there are always these annoying apt dependencies (like tkinter and other *-dev packages).

  				
Posted 4 years ago by WackyRabbit7

In our team there is a similar requirement: some scripts require external dependencies. We have built several Docker images, and these can be selected within the script itself by using:
task.set_base_docker("<docker-image>:<tag>")

  				
Posted 4 years ago by UptightCoyote42
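
A slightly fuller version of the set_base_docker approach above might look like this. It is a sketch only: the image names, the IMAGES_BY_NEED table, and the helper select_base_image are hypothetical, and the only library call assumed is task.set_base_docker with an image reference string, as shown in the post.

```python
# Hypothetical mapping from a script's needs to a prebuilt image reference
IMAGES_BY_NEED = {
    "nlp": "my-team/nltk-base:1.0",
    "plots": "my-team/tk-base:1.0",
}


def select_base_image(task, need):
    """Look up the image for this need and set it as the task's base docker image."""
    try:
        reference = IMAGES_BY_NEED[need]
    except KeyError:
        raise ValueError(f"no image registered for {need!r}") from None
    task.set_base_docker(reference)
    return reference


# Intended use inside a training script (requires the trains package):
#   from trains import Task
#   task = Task.init(project_name="examples", task_name="nltk job")
#   select_base_image(task, "nlp")
```

Keeping the table in one shared module means a data scientist picks an image by name while the team that builds the images controls the actual references.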

UptightCoyote42 - How are these images available to all agents? Do you host them on Docker Hub?

  				
Posted 4 years ago by WackyRabbit7

Docker Hub is probably not a bad idea. In my case there were only two workstations, so I copied the Dockerfile and rebuilt the image.

  				
Posted 4 years ago by UptightCoyote42
