Reducing Docker Container Spin-Up Time With Clearml Agent

I ran a training experiment with ClearML in a docker container. I now have the same image that was used to train the model on a machine running the agent, and I use that image as the base image for the ClearML agent. My assumption was that no packages would be installed, since it's the same image that was used for the initial model training.
I am very much concerned with docker container spin up time. And the thing I feared most happened: when the agent launched a container, it started installing all the packages, which takes about an hour...
Are there any guidelines on how to set up ClearML/agent to reduce docker spin-up time?
As which user does the agent run a docker container? The problem above could be that I used a non-root user to train a model and all packages are installed for non-root user but clearML agent runs container as a root user.

  
  
Posted one year ago

Answers 9


hmm that is odd, it should have detected it, can you verify the issue still exists with the latest RC?
pip3 install clearml-agent==1.2.4rc3

  
  
Posted one year ago

no prob - will run now

  
  
Posted one year ago

Hi GentleSwallow91

I am very much concerned with docker container spin up time.

To accelerate spin-up time (mostly pip install), use venv caching: the agent stores a cache of the entire installed venv, so it does not need to reinstall it.
Uncomment ("unmark") this line:
https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L116
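
For reference, a minimal sketch of what the relevant section of clearml.conf might look like with caching enabled. The key names and default values here are assumed to match the default config file, so check them against the linked line before copying:

```
agent {
    venvs_cache: {
        # maximum number of cached venvs
        max_entries: 10
        # minimum free space (in GB) required to allow a new cache entry
        free_space_threshold_gb: 2.0
        # this line is commented out by default; uncommented, it enables venv caching
        path: ~/.clearml/venvs-cache
    }
}
```

With this in place the first run still builds the venv; later runs with the same requirements should be able to reuse the cached copy instead of reinstalling.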

The problem above could be that I used a non-root user to train a model and all packages are installed for non-root user but clearML agent runs container as a root user.

You can specify a user access folder instead of the "/root/" home folder here:
https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L241
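
A sketch of what such an override might look like for a non-root user. The `docker_internal_mounts` key names are assumed to match the commented-out block in the default clearml.conf, and `/home/testuser` is only an illustrative home folder (it is the one that shows up in the agent log further down this thread):

```
agent {
    # override the default in-container mount points, which live under /root/
    docker_internal_mounts {
        sdk_cache: "/clearml_agent_cache"
        apt_cache: "/var/cache/apt/archives"
        ssh_folder: "/home/testuser/.ssh"
        pip_cache: "/home/testuser/.cache/pip"
        vcs_cache: "/home/testuser/.clearml/vcs-cache"
        venv_build: "/home/testuser/.clearml/venvs-builds"
        pip_download: "/home/testuser/.clearml/pip-download-cache"
    }
}
```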

  
  
Posted one year ago

Hi AgitatedDove14
Thanks for the update.
Well, it's a pain... I specifically use a PyTorch docker image, and the agent will still download it?
My image is built with FROM pytorch/pytorch:1.11.0-cuda11.3-cudnn8-devel
And here is a portion of the agent log on top of that image:
Package(s) not found: torch
Torch CUDA 113 download page found
Found PyTorch version torch==1.11.0 matching CUDA version 113
Package(s) not found: torchvision
Found PyTorch version torchvision==0.12.0 matching CUDA version 113
Collecting torch==1.11.0+cu113
Downloading (1637.0 MB)
So there is no way to change that?

  
  
Posted one year ago

and this is inside a container to check that package is installed:
docker run -it --rm torch2022 pip show torch
Name: torch
Version: 1.11.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page:
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /opt/conda/lib/python3.8/site-packages
Requires: typing_extensions
Required-by: torchmetrics, pytorch-lightning, torchvision, torchtext, torchelastic
I build my own image on top of pytorch/pytorch:1.11.0-cuda11.3-cudnn8-devel

  
  
Posted one year ago

GentleSwallow91 how come it does not already find the correct PyTorch version inside the docker? What clearml-agent version are you using?

  
  
Posted one year ago

This time it runs smoothly - here's the output:
` Local file not found [torch @ file:///home/testuser/.clearml/pip-download-cache/cu113/torch-1.11.0%2Bcu113-cp39-cp39-linux_x86_64.whl], references removed
Local file not found [torchvision @ file:///home/testuser/.clearml/pip-download-cache/cu113/torchvision-0.12.0%2Bcu113-cp39-cp39-linux_x86_64.whl], references removed
Adding venv into cache: /home/nino/.clearml/venvs-builds/3.9
Running task id [b15553c045ab4c3283bbdb040ec19f1f]:
[src/models]$ /home/testuser/.clearml/venvs-builds/3.9/bin/python -u train.py
Summary - installed python packages:
pip:
...

Environment setup completed successfully

Starting Task Execution: `

  
  
Posted one year ago

clearml-agent --version
CLEARML-AGENT version 1.2.3

  
  
Posted one year ago

Woot woot!
Awesome, this RC is stable, so feel free to use it. The official release is probably due out next week :)

  
  
Posted one year ago