Continuing On


Continuing on from https://allegroai-trains.slack.com/archives/CTK20V944/p1607012505242500:
We'd like to minimize startup time for agent-started experiments, since the experiment itself can be shorter than the startup time. Ideally we would skip setting up the venv, installing packages, and uploading data artifacts.
The agent is running alongside the server with data.
What's the optimal agent configuration in this case?

  				
Posted 4 years ago by MelancholyBeetle72


Answers 11

Am I right that docker_cmd should look like "docker run --mount <...> image"?

  				
Posted 4 years ago by MelancholyBeetle72

Ah, Task.set_base_docker(docker_cmd), I reckon.

  				
Posted 4 years ago by MelancholyBeetle72

Yes, exactly

  				
Posted 4 years ago by MelancholyBeetle72

Thanks AgitatedDove14!
Is there a way to programmatically set the base docker image and extra docker arguments for enqueued tasks? I'm afraid I have no access to trains.conf, and manually editing enqueued experiments in the web UI is not an option.

  				
Posted 4 years ago by MelancholyBeetle72

The docker cmd is basically the docker image name, but you can add parameters as well.
For example "nvidia/cuda" or "nvidia/cuda -v /mnt/data:/mnt/data" (docker image names must be lowercase).
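
To make the format above concrete, here is a minimal sketch of composing such a string and handing it to Task.set_base_docker(docker_cmd), which is mentioned elsewhere in this thread. The helper names build_docker_cmd and apply_base_docker are hypothetical and not part of trains, and the exact signature of set_base_docker may differ between trains/clearml versions, so treat this as an illustration rather than the library's API.

```python
def build_docker_cmd(image, mounts=None):
    """Compose '<image> [-v host:container ...]' as a single string.

    `mounts` maps host paths to container paths (hypothetical helper,
    mirroring the "nvidia/cuda -v /mnt/data:/mnt/data" example above).
    """
    parts = [image]
    for host_path, container_path in (mounts or {}).items():
        parts += ["-v", f"{host_path}:{container_path}"]
    return " ".join(parts)


def apply_base_docker(task, image, mounts=None):
    """Set the composed docker cmd on a task object and return it.

    Assumes `task` exposes set_base_docker(docker_cmd), as discussed
    in this thread.
    """
    docker_cmd = build_docker_cmd(image, mounts)
    task.set_base_docker(docker_cmd)
    return docker_cmd


# Possible usage with a real task (not executed here):
#   from trains import Task
#   task = Task.init(project_name="examples", task_name="fast start")
#   apply_base_docker(task, "nvidia/cuda", {"/mnt/data": "/mnt/data"})
```

Doing this from the experiment code avoids touching trains.conf or editing the task in the web UI, which seems to be the constraint in this thread.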

  				
Posted 4 years ago by AgitatedDove14

Hi MelancholyBeetle72
Do you mean the venv creation takes the bulk of the time, or is it something else?

  				
Posted 4 years ago by AgitatedDove14

This is as far as I could get.

  				
Posted 4 years ago by MelancholyBeetle72

If I follow the pre-built docker image option, what are the correct configurations?
Also, can the image not be pulled from dockerhub but used from the local build instead?

  				
Posted 4 years ago by MelancholyBeetle72

Hmm, venv caching is the next thing on the agent's to-do list (post clearml release).
Currently the easiest option is to build a new docker image that already contains everything in the experiment's "Installed packages" section, and use that as the base docker image.
(The installed packages format is requirements.txt compatible, so you can use it as is when building the dockerfile.)
The second option is to wait for the next clearml-agent release (probably in a couple of weeks).
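
As a rough sketch of the first option, the snippet below writes a requirements.txt (pasted from the "Installed packages" section) and a matching Dockerfile into a build directory. The function name write_build_context, the template, and the base image are assumptions for illustration; the base image must already provide python3 and pip, which a plain CUDA image may not.

```python
from pathlib import Path

# Hypothetical template: installs the pinned packages at image build time,
# so the agent does not have to install them on every run.
DOCKERFILE_TEMPLATE = """\
FROM {base_image}
COPY requirements.txt /tmp/requirements.txt
RUN python3 -m pip install --no-cache-dir -r /tmp/requirements.txt
"""


def write_build_context(build_dir, installed_packages, base_image):
    """Write requirements.txt and a Dockerfile into build_dir.

    `installed_packages` is the text copied from the experiment's
    "Installed packages" section. Returns the Dockerfile text.
    """
    build_dir = Path(build_dir)
    build_dir.mkdir(parents=True, exist_ok=True)
    (build_dir / "requirements.txt").write_text(installed_packages.strip() + "\n")
    dockerfile = DOCKERFILE_TEMPLATE.format(base_image=base_image)
    (build_dir / "Dockerfile").write_text(dockerfile)
    return dockerfile
```

After that, something like `docker build -t my-experiment-base <build_dir>` on the agent machine would produce a local image (the tag my-experiment-base is made up here), which can then be used as the base docker image for the task.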

  				
Posted 4 years ago by AgitatedDove14

Re: "Also, can the image not be pulled from dockerhub but used from the local build instead?"

If your docker is configured to pull from a local artifactory, the agent will do the same 🙂 (it calls the docker command just like you do).

Regarding your example:
agent.default_docker.arguments: "--mount type=bind,source=$DATA_DIR,target=/data"

Notice that you are using the default docker arguments here.
If you want the mount to always be there, use extra_docker_arguments:
https://github.com/allegroai/trains-agent/blob/9a3f950ac689c50ba3415c42749a4bd8059e89a7/docs/trains.conf#L121

  				
Posted 4 years ago by AgitatedDove14

Maybe I could use a pre-built docker image with a mounted volume instead?

  				
Posted 4 years ago by MelancholyBeetle72
