I Have Set

Answered

I have set

export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=true
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=true

in my entrypoint.sh (which runs clearml-agent daemon --queue $QUEUES --create-queue --cpu-only --foreground )

but it appears that tasks still take a long time to set up environments. I expected the whole process to be skipped and for the preinstalled python deps in the docker image (which is running this entrypoint script) to be used.

From task pickup to task "run python file" can be several minutes... which is longer than some of the tasks themselves take.

  				
Posted one year ago
SmallTurkey79

Answers 54

sometimes I get "lucky" and see something more like what I expect... total experiment time < 1 min (and I have evidence of this happening. logs start-to-finish in sub-minute). But then other times the same task will take 5-10 minutes.

same worker, same queue, just one worker serving it... I am so utterly perplexed by the variation in how long things take. my clearml API server is running on a beefy 32 core machine and not much else is happening right now...

  				
Posted one year ago
SmallTurkey79

yeah... still seeing variances from 1m to 10m for the same task. been testing parallel execution for hours.

  				
Posted one year ago
SmallTurkey79

clearml==1.12.2
clearml_agent v1.8.1rc2

  				
Posted one year ago
ManiacalLizard2

BTW: you can also just add -e "CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1" to the docker args (under the Execution tab) to override the setting of the docker.
You can also add "export;" to the docker startup bash script section (do not add "#!/bin/bash", just the actual script) to get a list of all the environment variables inside the docker, just in case.
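
A minimal sketch of that check, assuming a POSIX shell inside the container. The function name list_clearml_env and the CLEARML_ prefix filter are only illustrative conveniences; a plain "export;" prints everything.

```shell
# Print only the ClearML-related variables visible in this shell or container.
# list_clearml_env is a hypothetical helper name, not part of clearml-agent.
list_clearml_env() {
    env | grep '^CLEARML_' | sort
}

vars="$(list_clearml_env)"
if [ -n "$vars" ]; then
    printf '%s\n' "$vars"
else
    echo "no CLEARML_* variables are set"
fi
```

If the skip variables do not show up in this list inside the container that actually runs the task, the agent never saw them.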

  				
Posted one year ago
AgitatedDove14

i would love some advice on that though - should I be using services mode + docker and some max # of instances to be spinning up multiple tasks instead?

my thinking was to avoid some of the docker overhead. but i did try this approach previously and found that the container limit wasn't exactly respected.

  				
Posted one year ago
SmallTurkey79

but pretty reliably some proportion of tasks still just take a much longer time. 1m - 10m is a variance i'd really like to understand.

  				
Posted one year ago
SmallTurkey79

im not running in docker mode though - im running a clearml worker in a docker container (and then multiplying the container)

  				
Posted one year ago
SmallTurkey79

oh it's there, before running task.

from task pick-up to "git clone" is now ~30s, much better.

though as far as I understand, the recommendation is still to not run workers-in-docker like this:

export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=$(which python)

(and fwiw I have this in my entrypoint.sh )

cat <<EOF > ~/clearml.conf
agent {
    vcs_cache {
        enabled: true
    }

    package_manager: {
        type: pip,
        system_site_packages: true,
    }

}
EOF

  				
Posted one year ago
SmallTurkey79

I think a proper screenshot of the full log with some information redacted is the way to go. Otherwise we are just guessing in the dark

  				
Posted one year ago
ManiacalLizard2

im not running in docker mode though

hmmm that might be the first issue. it cannot skip venv creation, it can however use a pre-existing venv (but it will change it every time it installs a missing package)
so setting CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 in non-docker mode has no effect

  				
Posted one year ago
AgitatedDove14

i just ran a pipeline that took about 2h (more than half this time was just the DAG), with about a hundred tasks. i'm taking a look at them now to see what the logs show for runtimes.

  				
Posted one year ago
SmallTurkey79

try with the latest RC 1.8.1rc2

it feels like after git clone, it spend minutes without outputting anything

yeah that is odd, can you run the agent with --debug (add it before the daemon command), and then add --foreground at the end of the command
Now launch the same task on that queue, you will have a verbose log in the console.
Let us know what you see
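
A sketch of what that invocation could look like, reusing the flags from the original entrypoint. The "default" fallback for QUEUES and the RUN_AGENT switch are assumptions added here so the snippet can be dry-run; they are not clearml-agent features.

```shell
# QUEUES comes from the original entrypoint; "default" is only a placeholder fallback.
QUEUES="${QUEUES:-default}"

# --debug goes before the "daemon" subcommand, --foreground at the very end.
agent_cmd="clearml-agent --debug daemon --queue $QUEUES --create-queue --cpu-only --foreground"
echo "$agent_cmd"

# RUN_AGENT is a hypothetical switch: the agent only starts when it is set to 1.
if [ "${RUN_AGENT:-0}" = "1" ]; then
    # Left unquoted on purpose so the string splits into separate arguments.
    exec $agent_cmd
fi
```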

  				
Posted one year ago
AgitatedDove14

from the logs, it feels like after git clone, it spends minutes without outputting anything. @AgitatedDove14 Do you know what the agent is supposed to do after git clone?
I guess a check that all packages are installed? But then with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1, what is the agent doing??

  				
Posted one year ago
ManiacalLizard2

oooh thank you, i was hoping for some sort of debugging tips like that. will do.

from a speed-of-clearing-a-queue perspective, is a services-mode queue better or worse than having many workers "always up"?

  				
Posted one year ago
SmallTurkey79

i really dont see how this provides any additional context that the timestamps + crops dont but okay.

  				
Posted one year ago
SmallTurkey79

We need to focus first on why it is taking minutes to reach "Using env".
In our case, we have a container that has all packages installed straight in the system, no venv in the container. Thus we don't use CLEARML_AGENT_SKIP_PIP_VENV_INSTALL.
But then when a task is pulled, I can see all the steps like git clone and a bunch of "Requirement already satisfied ....". There may be some odd package that needs to be installed because one of our DS is experimenting... But in all of that we can see what is happening.
In @SmallTurkey79's case, are you saying the log doesn't show anything at all? After it pulls the task, 5 minutes pass with no explanation of what those 5 minutes have been used for?
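
One way to see where the silent minutes go is to timestamp every line the agent prints when it runs in the foreground. A minimal sketch, assuming a POSIX shell; stamp_lines is a hypothetical helper, not something clearml-agent ships.

```shell
# Prefix every line read from stdin with the current wall-clock time.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$line"
    done
}

# Demo with two fake log lines. In practice the foreground agent's
# stdout and stderr (2>&1) would be piped into stamp_lines instead.
printf 'cloning repository\nUsing env\n' | stamp_lines
```

A large jump between two consecutive timestamps then points at the step that is stalling.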

  				
Posted one year ago
ManiacalLizard2

would those containers best be started from something in services mode?

Yes, as long as the machine has enough cpu/ram.
Notice that services mode will start a second parallel Task only after the first one is done setting up the env. If running with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL and containers that have git/python/clearml-agent preinstalled, that setup time should be minimal.

or is it possible to get no-overhead with my approach of worker-inside-docker?

No, do not do that; see the explanation above on why CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL does not work in docker venv mode

i designed my tasks as different functions, based mostly on what metrics to report and artifacts that are best cached (and how to best leverage comparisons of tasks). they do require cpu, but not a ton.

just report into a single Task under multiple "titles": each title is its own step, and inside each "title" you can have different series

is there a way for me to toggle CLEARML's log level?

Try to set the python master logger base logging level
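
For the services mode suggestion above, a rough sketch of how such a worker might be started. The queue name "services", the image tag my-image:latest and the limit of 4 parallel tasks are placeholders, and the numeric limit after --services-mode is written from memory of the agent CLI, so check clearml-agent daemon --help before relying on it. RUN_AGENT is a hypothetical dry-run switch.

```shell
# Placeholders: adjust the limit and use an image that already has
# git, python and clearml-agent preinstalled.
MAX_TASKS=4
IMAGE="my-image:latest"

services_cmd="clearml-agent daemon --services-mode $MAX_TASKS --queue services --create-queue --docker $IMAGE --cpu-only"
echo "$services_cmd"

# RUN_AGENT is a hypothetical switch: the agent only starts when it is set to 1.
if [ "${RUN_AGENT:-0}" = "1" ]; then
    # Left unquoted on purpose so the string splits into separate arguments.
    exec $services_cmd
fi
```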

  				
Posted one year ago
AgitatedDove14

starting to. thanks for your explanation.

would those containers best be started from something in services mode? or is it possible to get no-overhead with my approach of worker-inside-docker?

i designed my tasks as different functions, based mostly on what metrics to report and artifacts that are best cached (and how to best leverage comparisons of tasks). they do require cpu, but not a ton.

I'm now experimenting with lumping a lot of stuff into one big task and seeing how this goes instead. i have to be more selective in the reporting of metrics and plots though.

  				
Posted one year ago
SmallTurkey79

from task pick-up to "git clone" is now ~30s, much better.

This is "spent" calling apt update && apt install && pip install clearml-agent
if you have those preinstalled it should be quick

though as far as I understand, the recommendation is still to not run workers-in-docker like this:

if you do not want it to install anything and just use the existing venv (leaving the venv as is), and if something is missing then so be it, then yes, sure, that is the way to go

  				
Posted one year ago
AgitatedDove14

@AgitatedDove14 About why we stay on 1.12.2 : None

  				
Posted one year ago
ManiacalLizard2

are you on clearml agent 1.8.0?

(im noticing sometimes im just missing logs such as "Running task id.." entirely)

  				
Posted one year ago
SmallTurkey79

Please refer to here None
The doc needs to be a bit clearer: one requires a path and not just true/false
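
In entrypoint terms that could look like the sketch below, which follows the $(which python) form used elsewhere in this thread: CLEARML_AGENT_SKIP_PIP_VENV_INSTALL gets the path of the interpreter that already has the dependencies. The python3/python lookup order and the sanity check are assumptions, not agent behaviour.

```shell
# Resolve the interpreter that already has the dependencies installed.
python_bin="$(command -v python3 || command -v python || true)"

if [ -n "$python_bin" ] && [ -x "$python_bin" ]; then
    export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
    # This one is given a path to the python binary rather than true/false.
    export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL="$python_bin"
    echo "agent will reuse $python_bin"
else
    echo "no python interpreter found on PATH" >&2
fi
```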

  				
Posted one year ago
ManiacalLizard2

normally when a new package needs to be installed, it shows up in the Console tab

  				
Posted one year ago
ManiacalLizard2

I know that git clone and pip verifying that everything is installed are normal. But for some reason in Michael's screenshot, I don't see those steps ...

  				
Posted one year ago
ManiacalLizard2
