Hi @<1523701205467926528:profile|AgitatedDove14>
Yes, it was indeed in our code! After looking in depth, we found that loading the .cu and .cpp files was the root of the issue, slowing down the batch inference. Thanks a lot for your support!!
Here this new entry in the log is 2 min after env completed =>
1702378941039 box132 DEBUG 2023-12-12 11:02:16,112 - clearml.model - INFO - Selected model id: 9be79667ca644d7dbdf26732345f5415
This seems to be something in your code. Just add print("starting") in your entry Python file, before any imports (because they might actually do something).
Because from the agent's perspective, after printing Starting Task Execution:
it literally calls the python script, nothing else ...
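A minimal sketch of that check, assuming the entry script is plain Python. The `timed_import` helper and the module names `json` and `decimal` are hypothetical stand-ins for whatever heavy imports the real script has; the idea is only to get a timestamped line into the console before and after each import.

```python
import importlib
import time

# First statement of the entry script, before any heavy import.
print("starting", flush=True)


def timed_import(name):
    """Import a module by name and print how long the import took."""
    start = time.perf_counter()
    module = importlib.import_module(name)
    elapsed = time.perf_counter() - start
    print(f"import {name}: {elapsed:.2f}s", flush=True)
    return module, elapsed


# Stand-ins for the real (possibly slow) imports of the entry script.
for name in ("json", "decimal"):
    timed_import(name)
```

If "starting" shows up right after "Starting Task Execution:" and the gap appears around one of the imports, the delay is in the user code and not in the agent. Running the script locally with `python -X importtime` gives a per-module breakdown as well.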
Hi @<1523701205467926528:profile|AgitatedDove14> !
Thanks again for following up on this thread.
Perhaps the delay is not easy to see in the log, but it is just after "Starting Task Execution:"
Environment setup completed successfully
Starting Task Execution:
Here this new entry in the log is 2 min after env completed =>1702378941039 box132 DEBUG 2023-12-12 11:02:16,112 - clearml.model - INFO - Selected model id: 9be79667ca644d7dbdf26732345f5415
So, the environment is created OK, but then it takes a consistent 2 min before actually starting the task.
No additional info is printed in the log during this wait time.
No worries about the missing clearml-agent & git; we managed to avoid the venv creation, as the docker container has the environment already set up. So I think our issue is not related to the venv setup itself, but to the start after it.
We are looking around the "clearml_agent execute" command, as it is called right away after env set up...?
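Since nothing is printed during the wait, one way to see what the process is doing is the standard-library `faulthandler` watchdog. This is only a sketch: the helper names `watch_silent_phase` and `stop_watching` and the 30 second interval are assumptions, not anything from clearml-agent.

```python
import faulthandler
import sys


def watch_silent_phase(interval_s=30.0, file=None):
    """Dump the traceback of every thread each interval_s seconds until stopped."""
    target = file if file is not None else sys.stderr
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=target)


def stop_watching():
    """Cancel the periodic traceback dump."""
    faulthandler.cancel_dump_traceback_later()


# Intended use at the very top of the entry script:
#   watch_silent_phase(30.0)
#   ... imports and model loading ...
#   stop_watching()
```

Each dump shows the line the script is currently executing, so a 2 minute silent phase should produce a few tracebacks pointing at the slow call.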
However, there is still a delay of approximately 2 minutes between the completion of setup,
Where is that delay in the log?
(btw: it seems your container is missing clearml-agent & git, installing those might add some time)
Hello @<1523701205467926528:profile|AgitatedDove14> , thank you for addressing my concern. It seems that the aspect of avoiding the venv is functioning correctly, and everything within the container is properly configured to initiate. However, there is still a delay of approximately 2 minutes between the completion of setup, that is, the appearance of the console log indicating "Starting Task Execution", and the actual commencement of the inference logic. During this period, no additional logs are generated.
Hi @<1529633468214939648:profile|CostlyElephant1>
what seems to be the issue? I could not locate anything in the log
"Environment setup completed successfully
Starting Task Execution:"
Do you mean it takes a long time to set up the environment inside the container?
CLEARML_AGENT_SKIP_PIP_VENV_INSTALL and CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL,
It seems to be working: as you can see, no virtual environment is created; the only thing that is installed is clearml-agent, which is missing from the base container
Hi @<1523701205467926528:profile|AgitatedDove14> would you be so kind to take a look at this issue?
we still have 2 minutes between the log of
"
Starting Task Execution:
"
and actually starting our inference logic. We have no extra info from the log to check to improve this slow task start time.
thanks a lot for any feedback!
Hi @<1523701070390366208:profile|CostlyOstrich36>
Did you have the time to check the full log I sent you?
Most appreciated!
Also, as can be seen in the docker args, I tried using CLEARML_AGENT_SKIP_PIP_VENV_INSTALL and CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL to avoid installing packages, as the container contains everything needed to run the task, but I am not sure that it had any effect.
Here it is @<1523701070390366208:profile|CostlyOstrich36>
Thanks for your feedback
Hi @<1529633468214939648:profile|CostlyElephant1> , it looks like that's the environment setup. Can you share the full log?