Reducing Docker Container Spin-Up Time With ClearML Agent

I ran a training experiment with ClearML inside a Docker container. The machine running the agent now has the same image that was used to train the model, and I use that image as the base image for the ClearML agent. My assumption was that no packages would be installed, since it is the same image that was used for the initial training.
I am very concerned about Docker container spin-up time, and the thing I was most afraid of happened: when the agent launched a container, it started installing all the packages, which takes about an hour.
Is there any guideline on how to set up ClearML/the agent to reduce Docker spin-up time?
Which user does the agent run the Docker container as? The problem above could be that I trained the model as a non-root user, so all packages are installed for that non-root user, while the ClearML agent runs the container as root.

  
  
Posted 2 years ago

Answers 9


Hi GentleSwallow91

I am very much concerned with docker container spin up time.

To speed up spin-up (mostly the pip install step), use venv caching: the agent stores a cache of the entire installed venv, so it does not need to reinstall it.
Uncomment ("unmark") this line:
https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L116
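For readers who do not want to follow the link, here is a rough sketch of what the relevant section of clearml.conf looks like once venv caching is enabled. It assumes the linked line is the commented-out `path` entry under `agent.venvs_cache`. The other values are the defaults as I recall them from the sample config, so check them against your own file rather than copying them blindly.

```
agent {
    venvs_cache: {
        # maximum number of cached venvs (assumed default)
        max_entries: 10
        # minimum free disk space required to add a cache entry (assumed default)
        free_space_threshold_gb: 2.0
        # uncommenting this line is what enables virtual environment caching
        path: ~/.clearml/venvs-cache
    }
}
```

With the path set, the agent log should show a line like "Adding venv into cache" the first time a given environment is built.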

The problem above could be that I used a non-root user to train a model and all packages are installed for non-root user but clearML agent runs container as a root user.

You can specify a folder the non-root user can access, instead of the "/root/" home folder, here:
https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L241
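As a hedged illustration, I believe the linked line sits in the `docker_internal_mounts` section of clearml.conf, which controls where the agent mounts its caches inside the container. The sketch below only shows the idea of pointing those mounts at a non-root home directory. The `/home/testuser` prefix is a hypothetical example, and the key list is a subset from memory, so keep whatever other keys your own file already lists.

```
agent {
    docker_internal_mounts {
        # hypothetical non-root home; replace with the user your image actually runs as
        ssh_folder: "/home/testuser/.ssh"
        pip_cache: "/home/testuser/.cache/pip"
        vcs_cache: "/home/testuser/.clearml/vcs-cache"
        venv_build: "/home/testuser/.clearml/venvs-builds"
        pip_download: "/home/testuser/.clearml/pip-download-cache"
    }
}
```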

  
  
Posted 2 years ago

Hi AgitatedDove14
Thanks for the update.
Well, it's a pain... I specifically use a PyTorch Docker image, and the agent will still download it?
My image is built with FROM pytorch/pytorch:1.11.0-cuda11.3-cudnn8-devel
Here is a portion of the agent log when running on top of that image:
Package(s) not found: torch
Torch CUDA 113 download page found
Found PyTorch version torch==1.11.0 matching CUDA version 113
Package(s) not found: torchvision
Found PyTorch version torchvision==0.12.0 matching CUDA version 113
Collecting torch==1.11.0+cu113
Downloading (1637.0 MB)
So there is no way to change that?

  
  
Posted 2 years ago

GentleSwallow91, how come it does not find the correct PyTorch version that is already inside the Docker image? Which clearml-agent version are you using?

  
  
Posted 2 years ago

clearml-agent --version
CLEARML-AGENT version 1.2.3

  
  
Posted 2 years ago

And this is a check inside the container that the package is installed:
docker run -it --rm torch2022 pip show torch
Name: torch
Version: 1.11.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page:
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /opt/conda/lib/python3.8/site-packages
Requires: typing_extensions
Required-by: torchmetrics, pytorch-lightning, torchvision, torchtext, torchelastic
I built my own image on top of pytorch/pytorch:1.11.0-cuda11.3-cudnn8-devel

  
  
Posted 2 years ago

Hmm, that is odd; it should have detected it. Can you verify whether the issue still exists with the latest RC?
pip3 install clearml-agent==1.2.4rc3

  
  
Posted 2 years ago

no prob - will run now

  
  
Posted 2 years ago

This time it runs smoothly - here's the output:
` Local file not found [torch @ file:///home/testuser/.clearml/pip-download-cache/cu113/torch-1.11.0%2Bcu113-cp39-cp39-linux_x86_64.whl], references removed
Local file not found [torchvision @ file:///home/testuser/.clearml/pip-download-cache/cu113/torchvision-0.12.0%2Bcu113-cp39-cp39-linux_x86_64.whl], references removed
Adding venv into cache: /home/nino/.clearml/venvs-builds/3.9
Running task id [b15553c045ab4c3283bbdb040ec19f1f]:
[src/models]$ /home/testuser/.clearml/venvs-builds/3.9/bin/python -u train.py
Summary - installed python packages:
pip:
...

Environment setup completed successfully

Starting Task Execution: `

  
  
Posted 2 years ago

Woot woot!
Awesome! This RC is stable, so feel free to use it. The official release is probably due out next week :)

  
  
Posted 2 years ago