Answered
I Have Set

I have set

export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=true
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=true

in my entrypoint.sh (which runs clearml-agent daemon --queue $QUEUES --create-queue --cpu-only --foreground )

but it appears that tasks still take a long time to set up environments. I expected the whole process to be skipped and for the preinstalled python deps in the docker image (which is running this entrypoint script) to be used.

From task pickup to task "run python file" can be several minutes... which is greater than some of the tasks take themselves.

  
  
Posted one year ago

Answers 54


yeah... still seeing variances from 1m to 10m for the same task. been testing parallel execution for hours.

  
  
Posted one year ago

from the logs, it feels like after git clone, it spends minutes without outputting anything. @<1523701205467926528:profile|AgitatedDove14> Do you know what the agent is supposed to do after git clone?
I guess a check that all packages are installed? But then with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1, what is the agent doing?

  
  
Posted one year ago

are you on clearml agent 1.8.0?

(im noticing sometimes im just missing logs such as "Running task id.." entirely)

  
  
Posted one year ago

Please refer to here None
The docs need to be a bit clearer: one of the two variables requires a path and not just true/false
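A minimal sketch of how that could look in an entrypoint script, assuming the interpreter with the preinstalled dependencies is whichever python3 (or python) comes first on PATH. PYTHON_BIN is a name made up for this example; the two CLEARML_AGENT_* variables are the ones from the question.

```shell
# Assumption: the preinstalled dependencies live in the first python3/python on PATH.
PYTHON_BIN="$(command -v python3 || command -v python)"

# Give this variable a path to an interpreter, not true/false.
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL="$PYTHON_BIN"

# Plain flag. Per this thread it only has an effect in docker mode.
export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1

echo "agent will reuse interpreter: $CLEARML_AGENT_SKIP_PIP_VENV_INSTALL"
```

If the echo prints an empty path, no interpreter was found on PATH and the variable should be set by hand.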

  
  
Posted one year ago

normally when a new package needs to be installed, it shows up in the Console tab

  
  
Posted one year ago

what if the preexisting venv is just the system python? my base image is python:3.10.10 and i just pip install all requirements in that image. Does that not avoid venv still?

it will basically create a new venv inside the container, forking the existing preinstalled stuff (i.e. the new venv already has everything the system python has preinstalled)
then it will call "pip install" on all the "installed packages" of the Task.
Which should just check everything is there and install nothing

If you set " CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1" it will do checks and just use the existing system python environment as is.

is it reasonable to expect that with sufficient workers, I can get 50 tasks to run in the same time it takes to run a single one? i cant imagine the apiserver being a noticeable bottleneck.

50 containers on a single machine would be fine if you have enough RAM/CPU, and yes they would run concurrently.
regarding the time itself, again the spin-up time of a Task should be negligible.
Pipeline tasks are not meant to be "threads"; they are meant as different functions you want to run on different machines.
This means that if your pipeline is just a set of simple functions that require no cpu/gpu or IO, I'm not sure pipeline steps are the right way to go

Does that make sense?

  
  
Posted one year ago

okay that's a similar setup to mine... that's interesting.
much more in line with my expectation.

  
  
Posted one year ago

thank you!
i'll take that design into consideration.

re: CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL in "docker venv mode" im still not quite sure I understand correctly - since the agent is running in a container, as far as it is concerned it may as well be on bare-metal.

is it just that there's no way for that worker to avoid venv? (i.e. the only way to bypass venv is to use docker-mode?)

  
  
Posted one year ago

in my case using self-hosted and agent inside a docker container:
47:45 : task foo pulled
[ git clone, pip install, check that all requirements satisfied, and nothing is downloaded]
48:16 : start training

  
  
Posted one year ago

i would love some advice on that though - should I be using services mode + docker and some max # of instances to be spinning up multiple tasks instead?

my thinking was to avoid some of the docker overhead. but i did try this approach previously and found that the container limit wasn't exactly respected.

  
  
Posted one year ago

im not running in docker mode though

hmmm that might be the first issue. it cannot skip venv creation, it can however use a pre-existing venv (but it will change it every time it installs a missing package)
so setting CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 in non-docker mode has no effect

  
  
Posted one year ago

but pretty reliably some proportion of tasks still just take a much longer time. 1m - 10m is a variance i'd really like to understand.

  
  
Posted one year ago

clearml==1.12.2
clearml_agent v1.8.1rc2

  
  
Posted one year ago

what if the preexisting venv is just the system python? my base image is python:3.10.10 and i just pip install all requirements in that image. Does that not avoid venv still?

it's good to know that in theory there's a path forward with almost zero overhead. that's what I want.

is it reasonable to expect that with sufficient workers, I can get 50 tasks to run in the same time it takes to run a single one? i cant imagine the apiserver being a noticeable bottleneck.

  
  
Posted one year ago

this bug: None

  
  
Posted one year ago

ah I see. thank you very much!

trying export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=$(which python)
but I still see Environment setup completed successfully
(it is printed after Running task id )

it still takes a full 3 minutes between task pulled by worker until Running task id
is this normal? What is happening in these few minutes (besides a git pull / switch)?
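One way to see where those minutes go is to timestamp every line the agent prints and look for the gap. This is only a sketch: stamp_lines is a helper name invented here, and it assumes the agent runs with --foreground so its output reaches the terminal.

```shell
# Prefix every line read from stdin with the current wall-clock time.
stamp_lines() {
  while IFS= read -r line; do
    printf '%s %s\n' "$(date '+%H:%M:%S')" "$line"
  done
}

# Small demo with fake log lines.
printf 'task pulled\nRunning task id\n' | stamp_lines

# Intended usage (not executed here):
# clearml-agent daemon --queue "$QUEUES" --foreground 2>&1 | stamp_lines
```

The difference between two consecutive timestamps shows which step is silent for minutes.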

  
  
Posted one year ago

from task pick-up to "git clone" is now ~30s, much better.

This is "spent" calling apt update && update install && pip install clearml-agent
if you have those preinstalled it should be quick

though as far as I understand, the recommendation is still to not run workers-in-docker like this:

if you do not want it to install anything and just use the existing venv (leaving the venv as is), and if something is missing then so be it, then yes, sure, that's the way to go

  
  
Posted one year ago

@<1523701205467926528:profile|AgitatedDove14> About why we stay on 1.12.2 : None

  
  
Posted one year ago

ha! yup. that was it exactly. I posted about it too None lol

  
  
Posted one year ago

oh yes. Using env until the next message is 2 minutes.

  
  
Posted one year ago

would those containers best be started from something in services mode?

Yes as long as the machine has enough cpu/ram
Notice that the services mode will start a second parallel Task after the first one is done setting up the env. If running with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL, with containers that have git/python/clearml-agent preinstalled, it should be minimal.
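A quick way to check whether an image really has those tools preinstalled is a small preflight check run inside the container. This is a rough sketch; missing_tools is a helper name made up for this example, and python3 is assumed to be the interpreter name in the image.

```shell
# Print the name of every given tool that is NOT found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# The three tools named above.
missing="$(missing_tools git python3 clearml-agent)"
if [ -n "$missing" ]; then
  echo "not preinstalled in this image:"
  echo "$missing"
else
  echo "git, python3 and clearml-agent are all preinstalled"
fi
```

Anything listed as missing would presumably have to be installed at task start, which is where the extra setup time would come from.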

or is it possible to get no-overhead with my approach of worker-inside-docker?

No, do not do that; see the explanation above on why CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL does not work in docker venv mode

i designed my tasks as different functions, based mostly on what metrics to report and artifacts that are best cached (and how to best leverage comparisons of tasks). they do require cpu, but not a ton.

just report a single Task as multiple "titles"; then each title is its own step, and inside each "title" they have different series

is there a way for me to toggle CLEARML's log level?

Try setting the base logging level of the Python root logger

  
  
Posted one year ago

fwiw - i'm starting to wonder if there's a difference between me "resetting the task" vs cloning it.

  
  
Posted one year ago

oooh thank you, i was hoping for some sort of debugging tips like that. will do.

from a speed-of-clearing-a-queue perspective, is a services-mode queue better or worse than having many workers "always up"?

  
  
Posted one year ago

  • try with the latest RC 1.8.1rc2

from the logs, it feels like after git clone, it spends minutes without outputting anything

yeah that is odd, can you run the agent with --debug (add before the daemon command), and then at the end of the command add --foreground
Now launch the same task on that queue, you will have a verbose log in the console.
Let us know what you see
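Put together with the command from the question, the debug invocation might look like the sketch below. The queue name default is only a placeholder for whatever $QUEUES holds, and AGENT_CMD is a variable invented for this example; the command is printed rather than started.

```shell
# Placeholder queue name if QUEUES is not already set (assumption).
QUEUES="${QUEUES:-default}"

# --debug goes before the daemon command, --foreground at the very end.
AGENT_CMD="clearml-agent --debug daemon --queue $QUEUES --create-queue --cpu-only --foreground"

echo "would run: $AGENT_CMD"

# To actually start the agent, replace the echo with:
# exec $AGENT_CMD
```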

  
  
Posted one year ago

i just ran a pipeline that took about 2h (more than half this time was just the DAG), with about a hundred tasks. i'm taking a look at them now to see what the logs show for runtimes.

  
  
Posted one year ago

hard to see with your cropping here and there ...

  
  
Posted one year ago

Hi guys, just curious here, what was the final issue?
Also out of curiosity, what does that mean? "1.12.2 because some bug that make fastai lag 2x"?

  
  
Posted one year ago

I know that git clone and pip verifying everything is installed are normal. But for some reason in Michael's screenshot, I don't see those steps ...

  
  
Posted one year ago

i was having a ton of git clone issues - disabled caching entirely... wonder if that may help too.

tysm for your help! will report back soon.

  
  
Posted one year ago

So "Using env ..." takes minutes without any output?

  
  
Posted one year ago