Hi again! I am doing batch inference from a parent task (that defines a base docker image). However, I've encountered an issue where the task takes several minutes (approximately 3-5 minutes) specifically when it reaches the stage of "Environment setup Co

Answered

Hi again!
I am doing batch inference from a parent task (that defines a base docker image). However, I've encountered an issue where the task takes several minutes (approximately 3-5 minutes) specifically when it reaches the stage of
"Environment setup completed successfully
Starting Task Execution:"
I'm uncertain about the cause of this delay. Could you advise if this is considered normal?
Thank you in advance.

  				
Posted one year ago by CostlyElephant1

Answers 11

Hi @AgitatedDove14 !
Thanks again for following up on this thread.

Perhaps the delay is not easy to spot in the log, but it comes just after "Starting Task Execution:"

Environment setup completed successfully
Starting Task Execution:

Here this new entry in the log is 2 min after env completed =>
1702378941039 box132 DEBUG 2023-12-12 11:02:16,112 - clearml.model - INFO - Selected model id: 9be79667ca644d7dbdf26732345f5415

So, the environment is created OK, but then it takes a consistent 2 min before actually starting the task.
No additional info is printed in the log during this wait time.

No worries about the missing clearml-agent & git: we managed to avoid the venv creation, as the docker container already has the environment set up. So I think our issue is not related to the venv setup itself, but to what happens right after it.
We are looking into the "clearml_agent execute" command, since it is called right after the environment setup...?

  				
Posted 10 months ago by CostlyElephant1

Hi @CostlyElephant1
What seems to be the issue? I could not locate anything in the log.

"Environment setup completed successfully
Starting Task Execution:"

Do you mean it takes a long time to set up the environment inside the container?

CLEARML_AGENT_SKIP_PIP_VENV_INSTALL and CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL,

It seems to be working: as you can see, no virtual environment is created. The only thing that is installed is the clearml-agent, which is missing from the base container.

  				
Posted 10 months ago by AgitatedDove14

Here this new entry in the log is 2 min after env completed =>

1702378941039 box132 DEBUG 2023-12-12 11:02:16,112 - clearml.model - INFO - Selected model id: 9be79667ca644d7dbdf26732345f5415

This seems to be something in your code. Just add print("starting") at the top of your entry Python file, before any imports (because they might actually do something).
From the agent's perspective, after printing "Starting Task Execution:" it literally calls the Python script, nothing else ...
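
For anyone following along, the suggestion above could be sketched roughly as follows. This is only an illustration, not part of ClearML: `timed_import` is a hypothetical helper, and `"json"` stands in for whatever heavy modules the real entry script imports.

```python
# Put this at the very top of the entry script, before any other imports.
import time

_T0 = time.perf_counter()
print("starting", flush=True)  # flush so the line reaches the task log right away

import importlib


def timed_import(name):
    """Import a module by name and print how long the import took."""
    start = time.perf_counter()
    module = importlib.import_module(name)
    elapsed = time.perf_counter() - start
    since_start = time.perf_counter() - _T0
    print(f"import {name}: {elapsed:.2f}s (t+{since_start:.2f}s)", flush=True)
    return module, elapsed


# Hypothetical usage: replace "json" with the modules your script actually imports.
json_module, json_seconds = timed_import("json")
```

If "starting" shows up right after "Starting Task Execution:" and one of the imports accounts for the gap, the delay is in the script rather than in the agent.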

  				
Posted 10 months ago by AgitatedDove14

However, there is still a delay of approximately 2 minutes between the completion of setup,

Where is that delay in the log?
(btw: it seems your container is missing clearml-agent & git, installing those might add some time)

  				
Posted 10 months ago by AgitatedDove14

Hi @AgitatedDove14 would you be so kind as to take a look at this issue?
We still have 2 minutes between the log line
"Starting Task Execution:"
and the actual start of our inference logic. The log gives us no extra information that would help us improve this slow task start time.
Thanks a lot for any feedback!

  				
Posted 10 months ago by CostlyElephant1

Hi @CostlyElephant1 , it looks like that's the environment setup. Can you share the full log?

  				
Posted one year ago by CostlyOstrich36

Here it is @CostlyOstrich36
Thanks for your feedback

  				
Posted one year ago by CostlyElephant1

Also, as can be seen in the docker args, I tried using CLEARML_AGENT_SKIP_PIP_VENV_INSTALL and CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL to avoid installing packages, as the container contains everything needed to run the task, but I am not sure that it had any effect.

  				
Posted one year ago by CostlyElephant1

Hi @CostlyOstrich36
Did you have the time to check the full log I sent you?
Most appreciated!

  				
Posted one year ago by CostlyElephant1

Hi @AgitatedDove14
Yes, it was indeed in our code! After looking in depth, we found that the loading of .cu and .cpp files was the root of the issue, slowing down the batch inference. Thanks a lot for your support!!
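
For readers who hit a similar silent gap, a profiler is one way to find which startup step is slow. Below is a minimal sketch using only the standard library. `profile_call` and `load_model` are hypothetical names, and `load_model` is just a stand-in for whatever step loads the slow files in a real script.

```python
import cProfile
import io
import pstats


def profile_call(func, top=10):
    """Run func under cProfile; return (result, report of the slowest calls)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the slowest call chains are listed first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def load_model():
    # Hypothetical stand-in for the slow startup step in the real script.
    return sum(range(1000))


value, report = profile_call(load_model)
print(report)
```

Running the entry script with `python -X importtime` is another option when the time is spent in imports.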

  				
Posted 9 months ago by CostlyElephant1

Hello @AgitatedDove14 , thank you for addressing my concern. It seems that skipping the venv is working correctly, and everything within the container is properly configured to start. However, there is still a delay of approximately 2 minutes between the completion of setup, that is, the appearance of the console log "Starting Task Execution", and the actual start of the inference logic. During this period, no additional logs are generated.

  				
Posted 10 months ago by CostlyElephant1
