thank you!
I'll take that design into consideration.
re: CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL in "docker venv mode" I'm still not quite sure I understand correctly - since the agent is running in a container, as far as it is concerned it may as well be on bare metal.
is it just that there's no way for that worker to avoid venv? (i.e. the only way to bypass venv is to use docker-mode?)
If you are skipping every installation it should be the same,
because if you set CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
it will not install anything at all.
This is why it's odd to me...
wdyt?
"regular" worker will run one job at a time, services worker will spin multiple tasks at the same time But their setup (i.e. before running the actual task) is one at a time..
So "Using env ..." take minutes without any output ?
what if the preexisting venv is just the system python? my base image is python:3.10.10 and I just pip install all requirements in that image. Does that not avoid venv still?
it's good to know that in theory there's a path forward with almost zero overhead. that's what I want.
is it reasonable to expect that with sufficient workers, I can get 50 tasks to run in the same time it takes to run a single one? I can't imagine the apiserver being a noticeable bottleneck.
I can see all the steps like git clone,
git clone has nothing to do with "env setup", this is bringing the code, and you cannot skip that one. That said, this is why the git repo itself is cached on the host machine, so it is fast
... There may be some odd package that needs to be installed because one of our DS is experimenting ... but with all that, we can see what is happening.
even if everything is preinstalled, it verifies the packages match, and this might take a long time. It's just pip being pip (if you want the extreme case, try to do the same with conda; that one is even slower)
the output of that verification stage is that no new packages are installed (otherwise, good thing we checked 🙂 )
bottom line: if you want to skip the pip verification/installation, pass CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
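for example, when launching the agent (a sketch; "default" is just a placeholder queue name):

# skip the pip verification/installation step entirely
export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
clearml-agent daemon --queue default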
btw: i'm checking regarding the GH issue
fwiw - i'm starting to wonder if there's a difference between me "resetting the task" vs cloning it.
in my case using self-hosted and agent inside a docker container:
47:45 : task foo pulled
[ git clone, pip install, check that all requirements satisfied, and nothing is downloaded]
48:16 : start training
def seeing some that took 7-8 mins whereas others 2-3...
hard to see with your crop-outs here and there ...
ah I see. thank you very much!
trying export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=$(which python)
but I still see Environment setup completed successfully
(it is printed after Running task id)
it still takes a full 3 minutes between task pulled by worker until Running task id
is this normal? What is happening in these few minutes (besides a git pull / switch)?
you should be able to see in the Console tab what is happening
a minute of silence between the first two msgs and then two more mins until a flood of logs. Basically 3 mins total before this task (which does almost nothing - just using it for testing) starts.
I just need to understand what I should be expecting. I thought going from putting it into the queue in the UI to "running my code remotely" (esp. with packages preloaded) should be a fairly fast turnaround - certainly not three minutes... I'll have to change my whole pipeline design if this is the case.
apologies - just trying to keep sensitive data out of screenshot
the timestamps were all that mattered in those.
ha! yup. that was it exactly. I posted about it too lol
of what task? I'm running lots of them and benchmarking execution times. would you like to see a best-case or worst-case scenario? (I've kept some experiments for each)
and yeah, in those docs you just linked, "boolean" vars like CLEARML_AGENT_GIT_CLONE_VERBOSE explicitly say true,
so I ended up trying that pattern. but originally I did try 1. let me go back to that now. thank you.
overall I've seen some improvements in execution time using the suggestions in this thread (tysm!) - the preinstalled libs seem to be helping, though some things are still just unbearably slow (one of my larger pipelines took > 1 h to generate a DAG before even starting...).
yeah... still seeing variances from 1m to 10m for the same task. been testing parallel execution for hours.
i would love some advice on that though - should I be using services mode + docker and some max # of instances to be spinning up multiple tasks instead?
my thinking was to avoid some of the docker overhead. but I did try this approach previously and found that the container limit wasn't exactly respected.
but pretty reliably some proportion of tasks still just take a much longer time. 1m - 10m is a variance i'd really like to understand.
I'm not running in docker mode though - I'm running a clearml worker in a docker container (and then multiplying the container)
oh it's there, before running the task.
from task pick-up to "git clone" is now ~30s, much better.
though as far as I understand, the recommendation is still to not run workers-in-docker like this:
export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=$(which python)
(and fwiw I have this in my entrypoint.sh)
cat <<EOF > ~/clearml.conf
agent {
    vcs_cache {
        enabled: true
    }
    package_manager {
        type: pip
        system_site_packages: true
    }
}
EOF
I think a proper screenshot of the full log with some information redacted is the way to go. Otherwise we are just guessing in the dark
I'm not running in docker mode though
hmmm that might be the first issue. it cannot skip venv creation; it can, however, use a pre-existing venv (but it will change it every time it installs a missing package)
so setting CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 in non-docker mode has no effect
I just ran a pipeline that took about 2h (more than half of this time was just the DAG), with about a hundred tasks. I'm taking a look at them now to see what the logs show for runtimes.
- try with the latest RC, 1.8.1rc2
it feels like after git clone, it spends minutes without outputting anything
yeah that is odd, can you run the agent with --debug (add it before the daemon command), and then at the end of the command add --foreground
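something like this (a sketch; the queue name is a placeholder):

clearml-agent --debug daemon --queue default --foreground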
Now launch the same task on that queue, you will have a verbose log in the console.
Let us know what you see
I really don't see how this provides any additional context that the timestamps + crops don't, but okay.
would those containers best be started from something in services mode?
Yes, as long as the machine has enough cpu/ram
Notice that services mode will start a second parallel Task only after the first one is done setting up its env. If running with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL, with containers that have git/python/clearml-agent preinstalled, that setup should be minimal.
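for reference, a services-mode launch could look something like this (a sketch; the image and queue name are placeholders, and it assumes the skip var propagates to the task env as discussed above):

# assumption: python/git/clearml-agent are preinstalled in the image
export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
clearml-agent daemon --services-mode --queue services --docker python:3.10.10 --cpu-only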
or is it possible to get no-overhead with my approach of worker-inside-docker?
No, do not do that - see the explanation above on why CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL does not work in docker venv mode
I designed my tasks as different functions, based mostly on what metrics to report and which artifacts are best cached (and how to best leverage comparisons of tasks). they do require cpu, but not a ton.
just report a single Task as multiple "titles", then each title is its own step, and inside the "title" they have different series
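roughly like this with the SDK (a sketch; the project/task names, titles, and values here are made up):

from clearml import Task

task = Task.init(project_name="examples", task_name="combined steps")
logger = task.get_logger()
# each "title" behaves like its own step; within a title, report different series
logger.report_scalar(title="preprocess", series="rows", value=1000, iteration=0)
logger.report_scalar(title="train", series="loss", value=0.42, iteration=0)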
is there a way for me to toggle CLEARML's log level?
Try setting the Python root logger's base logging level
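e.g. (a minimal sketch):

import logging

# set the base level on the Python root logger; loggers that propagate to it inherit this threshold
logging.getLogger().setLevel(logging.WARNING)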