I Have Set

I have set

export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=true
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=true

in my entrypoint.sh (which runs clearml-agent daemon --queue $QUEUES --create-queue --cpu-only --foreground )

but it appears that tasks still take a long time to set up environments. I expected the whole process to be skipped and for the preinstalled python deps in the docker image (which is running this entrypoint script) to be used.

From task pickup to actually running the Python file can take several minutes, which is longer than some of the tasks themselves take.

  				
Posted 10 months ago by SmallTurkey79

Answers 54

fwiw - i'm starting to wonder if there's a difference between me "resetting the task" vs cloning it.

  				
Posted 10 months ago by SmallTurkey79

sometimes I get "lucky" and see something more like what I expect... total experiment time < 1 min (and I have evidence of this happening. logs start-to-finish in sub-minute). But then other times the same task will take 5-10 minutes.

same worker, same queue, just one worker serving it... I am so utterly perplexed by the variation in how long things take. my clearml API server is running on a beefy 32 core machine and not much else is happening right now...

  				
Posted 10 months ago by SmallTurkey79

yeah, still noticing that it can be multiple minutes before something starts...
like... what is happening in this time (besides a git clone), now that I set both

export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=true
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=$(which python)

update: it's now been six mins and the task still isn't done. this should have run through in like a minute total end-to-end

  				
Posted 10 months ago by SmallTurkey79

I'm just working on speeding up the time from "queue experiment" to "my code actually runs remotely" - as of yesterday things would sit for many minutes at a time. trying to see if venv is the culprit.

  				
Posted 10 months ago by SmallTurkey79

Hi guys, just curious here, what was the final issue?
Also out of curiosity, what does that mean: "1.12.2 because some bug that make fastai lag 2x"?

  				
Posted 10 months ago by AgitatedDove14

ha! yup. that was it exactly. I posted about it too None lol

  				
Posted 10 months ago by SmallTurkey79

this bug: None

  				
Posted 10 months ago by ManiacalLizard2

i was having a ton of git clone issues - disabled caching entirely... wonder if that may help too.

tysm for your help! will report back soon.

  				
Posted 10 months ago by SmallTurkey79

1.12.2 because some bug that make fastai lag 2x
1.8.1rc2 because it fixes an annoying git clone bug

  				
Posted 10 months ago by ManiacalLizard2

clearml==1.12.2
clearml_agent v1.8.1rc2

  				
Posted 10 months ago by ManiacalLizard2

are you on clearml agent 1.8.0?

(im noticing sometimes im just missing logs such as "Running task id.." entirely)

  				
Posted 10 months ago by SmallTurkey79

okay that's a similar setup to mine... that's interesting.
much more in line with my expectation.

  				
Posted 10 months ago by SmallTurkey79

in my case, using self-hosted with the agent inside a docker container:
47:45 : task foo pulled
[git clone, pip install, check that all requirements are satisfied; nothing is downloaded]
48:16 : start training

  				
Posted 10 months ago by ManiacalLizard2
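
For reference, the gap in the timeline above works out to about half a minute. A minimal shell sketch of that arithmetic, assuming timestamps in the mm:ss form used above and a gap of under one hour (the function names `to_seconds` and `elapsed_seconds` are illustrative and not part of ClearML):

```shell
# Convert an mm:ss timestamp to a number of seconds.
to_seconds() {
    minutes=${1%%:*}
    seconds=${1##*:}
    # Strip one leading zero so values like "08" are not parsed as octal.
    minutes=${minutes#0}
    seconds=${seconds#0}
    echo $(( ${minutes:-0} * 60 + ${seconds:-0} ))
}

# Seconds elapsed between two mm:ss timestamps.
elapsed_seconds() {
    start=$(to_seconds "$1")
    end=$(to_seconds "$2")
    diff=$(( end - start ))
    # Assumption: a negative result means the clock rolled over to the next hour.
    if [ "$diff" -lt 0 ]; then
        diff=$(( diff + 3600 ))
    fi
    echo "$diff"
}

elapsed_seconds 47:45 48:16   # prints 31
```

The same helper can be pointed at the timestamps of any two console lines to see which setup step accounts for the delay.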

i just need to understand what I should be expecting. I thought from putting it into the queue in the UI to "running my code remotely" (esp with packages preloaded) should be a fairly fast turnaround - certainly not three minutes... (i'll have to change my whole pipeline design if this is the case)

  				
Posted 10 months ago by SmallTurkey79

oh yes. From "Using env" until the next message is 2 minutes.

  				
Posted 10 months ago by SmallTurkey79

apologies - just trying to keep sensitive data out of screenshot

  				
Posted 10 months ago by SmallTurkey79

So "Using env ..." takes minutes without any output?

  				
Posted 10 months ago by ManiacalLizard2

the timestamps were all that mattered in those.

  				
Posted 10 months ago by SmallTurkey79

hard to see with your crop-outs here and there ...

  				
Posted 10 months ago by ManiacalLizard2

minute of silence between first two msgs and then two more mins until a flood of logs. Basically 3 mins total before this task (which does almost nothing - just using it for testing) starts.

  				
Posted 10 months ago by SmallTurkey79

normally when new packages need to be installed, it shows up in the Console tab

  				
Posted 10 months ago by ManiacalLizard2

you should be able to see what is happening in the Console tab

  				
Posted 10 months ago by ManiacalLizard2

ah I see. thank you very much!

trying export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=$(which python)
but I still see Environment setup completed successfully
(it is printed after Running task id )

it still takes a full 3 minutes between task pulled by worker until Running task id
is this normal? What is happening in these few minutes (besides a git pull / switch)?

  				
Posted 10 months ago by SmallTurkey79

Please refer to here None
The docs need to be a bit clearer: one of the two variables (CLEARML_AGENT_SKIP_PIP_VENV_INSTALL) requires a path to a python binary, not just true/false

  				
Posted 10 months ago by ManiacalLizard2
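
Putting the advice from this thread together, an entrypoint.sh might look roughly like the sketch below. It assumes the image's default python already has the task's dependencies installed and that `QUEUES` is set by the container environment, as in the original post. The `PYTHON_BIN` helper variable and the guard around the agent launch are additions for illustration only.

```shell
#!/bin/sh
# Sketch of entrypoint.sh; adjust to your own image.

# Boolean flag: skip installing the task's Python requirements.
export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=true

# Path, not a boolean: the python binary whose environment the agent reuses.
# PYTHON_BIN is a hypothetical helper name used only in this sketch.
PYTHON_BIN="$(command -v python || command -v python3 || true)"
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL="$PYTHON_BIN"

# Launch the agent with the same flags as the original post.
# $QUEUES is left unquoted on purpose so several queue names split into separate arguments.
if command -v clearml-agent >/dev/null 2>&1; then
    exec clearml-agent daemon --queue $QUEUES --create-queue --cpu-only --foreground
else
    echo "clearml-agent not found on PATH" >&2
fi
```

Other posts in this thread report a multi-minute delay even with both variables set, so this corrects the configuration but is not a guaranteed fix for slow task startup.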
