Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Answered
I Have Set

I have set

export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=true
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=true

in my entrypoint.sh (which runs clearml-agent daemon --queue $QUEUES --create-queue --cpu-only --foreground )

but it appears that tasks still take a long time to set up environments. I expected the whole process to be skipped and for the preinstalled python deps in the docker image (which is running this entrypoint script) to be used.

From task pickup to task "run python file" can be several minutes... which is greater than some of the tasks take themselves.

  
  
Posted 4 months ago
Votes Newest

Answers 54


of what task? i'm running lots of them and benchmarking execution times. would you like to see a best case or worst case scenario? (ive kept some experiments for each).

and yeah, in those docs you just linked, "boolean" vars like CLEARML_AGENT_GIT_CLONE_VERBOSE explicitly say true so I ended up trying that pattern. but originally i did try 1. let me go back to that now. thank you.

overall I've seen some improvements in execution time using the suggestions in this thread (tysm!) - the preinstalled libs seem to be helping, though some things are still just unbearably slow (one of my larger pipelines took > 1 h to generate a DAG before even starting...).

  
  
Posted 4 months ago

of what task? i'm running lots of them and benchmarking

If you are skipping every installation it should be the same
because if you set CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 it will not install Anything at all
This is why it's odd to me...
wdyt?

  
  
Posted 4 months ago

but pretty reliably some proportion of tasks still just take a much longer time. 1m - 10m is a variance i'd really like to understand.

  
  
Posted 4 months ago

what if the preexisting venv is just the system python? my base image is python:3.10.10 and i just pip install all requirements in that image. Does that not avoid venv still?

it will basically create a new venv inside the container forking the existing preinistalled stuff (i.e. the new venv already has everything the python system has preinstalled)
then it will call "pip install" on all the "installed packages of the Task.
Which should just check everything is there and install nothing

If you set " CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1" it will do checks and just use the existing system python environment as is.

, I can get 50 tasks to run in the same time it takes to run a single one? i cant imagine the apiserver being a noticeable bottleneck.

50 containers on a single machine would be fine if you have enough RAM/CPU, and yes they would run concurrently.
regrading the time itself, again the spinup time of a Task should be negligible.
Pipeline tasks are not meant to be "threads" they are meant as different functions you want to run on different machines,
This means that if your pipeline is just a set of simple functions that require no cpu/gpu or IO, I'm not sure pipeline steps is the right way to go

Does that make sense?

  
  
Posted 4 months ago

i would love some advice on that though - should I be using services mode + docker and some max # of instances to be spinning up multiple tasks instead?

my thinking was to avoid some of the docker overhead. but i did try this approach previously and found that the container limit wasn't exactly respected.

  
  
Posted 4 months ago

I think a proper screenshot of the full log with some information redacted is the way to go. Otherwise we are just guessing in the dark

  
  
Posted 4 months ago

I know that git clone and pip verify all installed is normal. But for some reason in Michael screenshot, I don't see those steps ...

  
  
Posted 4 months ago

i really dont see how this provides any additional context that the timestamps + crops dont but okay.

  
  
Posted 4 months ago

from the logs, it feels like after git clone, it spend minutes without outputting anything. @<1523701205467926528:profile|AgitatedDove14> Do you know what is the agent suppose to do after git clone ?
I guess a check that all packages is installed ? But then with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1, what is the agent doing ??

  
  
Posted 4 months ago

im not running in docker mode though - im running a clearml worker in a docker container (and then multiplying the container)

  
  
Posted 4 months ago

i just ran a pipeline that took about 2h (more than half this time was just the DAG), with about a hundred tasks. i'm taking a look at them now to see what the logs show for runtimes.

  
  
Posted 4 months ago

okay that's a similar setup to mine... that's interesting.
much more in line with my expectation.

  
  
Posted 4 months ago

are you on clearml agent 1.8.0?

(im noticing sometimes im just missing logs such as "Running task id.." entirely)

  
  
Posted 4 months ago

i was having a ton of git clone issues - disabled caching entirely... wonder if that may help too.

tysm for your help! will report back soon.

  
  
Posted 4 months ago

thank you!
i'll take that design into consideration.

re: CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL in "docker venv mode" im still not quite sure I understand correctly - since the agent is running in a container, as far as it is concerned it may as well be on bare-metal.

is it just that there's no way for that worker to avoid venv? (i.e. the only way to bypass venv is to use docker-mode?)

  
  
Posted 4 months ago

1.12.2 because some bug that make fastai lag 2x
1.8.1rc2 because it fix an annoying git clone bug

  
  
Posted 4 months ago

i just need to understand what I should be expecting. I thought from putting it into queue in UI to "running my code remotely" (esp with packages preloaded) should be fairly fast turnaround - certainly not three minutes... i'll have to change my whole pipeline design if this is the case)

  
  
Posted 4 months ago

Please refer to here None
The doc need to be a bit clearer: one require a path and not just true/false

  
  
Posted 4 months ago

ha! yup. that was it exactly. I posted about it too None lol

  
  
Posted 4 months ago

BTW: you can also just add -e " CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1" to the docker args (under the Execution tab) to override the setting of the docker.
you can also add " export; " to the docker startup bash script section (do not add "#/bin/bash" , just the actual script) to get a list of all the environment variables inside the docker, just in case

  
  
Posted 4 months ago

oooh thank you, i was hoping for some sort of debugging tips like that. will do.

from a speed-of-clearing-a-queue perspective, is a services-mode queue better or worse than having many workers "always up"?

  
  
Posted 4 months ago

I'm just working on speeding up the time from "queue experiment" to "my code actually runs remotely" - as of yesterday things would sit for many minutes at a time. trying to see if venv is the culprit .

  
  
Posted 4 months ago

oh yes. Using env until the next message is 2 minutes.

  
  
Posted 4 months ago

in my case using self-hosted and agent inside a docker container:
47:45 : taks foo pulled
[ git clone, pip install, check that all requirements satisfied, and nothing is downloaded]
48:16 : start training

  
  
Posted 4 months ago
11K Views
54 Answers
4 months ago
4 months ago
Tags