i really don't see how this provides any additional context that the timestamps + crops don't, but okay.
yeah, still noticing that it can be multiple minutes before something starts...
like... what is happening in this time (besides a git clone), now that I set both
export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=true
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=$(which python)
update: it's now been six mins and the task still isn't done. this should have run through in like a minute total end-to-end
a minute of silence between the first two msgs, then two more mins until a flood of logs. basically 3 mins total before this task (which does almost nothing - just using it for testing) starts.
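for reference, a minimal sketch of one way to pass these into the agent's container, assuming a plain docker run launch (the image name is a placeholder, and SKIP_PIP_VENV_INSTALL is pointed at the container's interpreter instead of relying on $(which python) from the host):
docker run -d \
  -e CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=true \
  -e CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/usr/bin/python3 \
  my-clearml-agent-image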


in my case, using self-hosted and the agent inside a docker container:
47:45: task foo pulled
[git clone, pip install, check that all requirements are satisfied, nothing downloaded]
48:16: start training
I think a proper screenshot of the full log with some information redacted is the way to go. Otherwise we are just guessing in the dark
i'm not running in docker mode though - i'm running a clearml worker in a docker container (and then running multiple copies of that container)
okay that's a similar setup to mine... that's interesting.
much more in line with my expectation.
i was having a ton of git clone issues - disabled caching entirely... wonder if that might help too.
tysm for your help! will report back soon.
from the logs, it feels like after the git clone, it spends minutes without outputting anything. @<1523701205467926528:profile|AgitatedDove14> Do you know what the agent is supposed to do after the git clone?
I guess a check that all packages are installed? But then with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1, what is the agent doing??
ah I see. thank you very much!
trying export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=$(which python)
but I still see Environment setup completed successfully
(it is printed after Running task id)
it still takes a full 3 minutes from the task being pulled by the worker until Running task id
is this normal? What is happening in these few minutes (besides a git pull / switch)?
from task pick-up to "git clone" is now ~30s, much better.
This is "spent" calling apt update && apt install && pip install clearml-agent
if you have those preinstalled in the image it should be quick
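for example, a minimal sketch of what to preinstall when building the task's base image so the docker-mode bootstrap finds everything already there (the package list is just an assumption, adjust to your image):
# run during the image build, e.g. as RUN steps
apt-get update && apt-get install -y git
python -m pip install clearml-agent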
though as far as I understand, the recommendation is still to not run workers-in-docker like this:
if you do not want it to install anything and just use the existing venv (leaving the venv as is), and if something is missing then so be it, then yes, sure, that's the way to go
yeah... still seeing runtimes vary from 1m to 10m for the same task. been testing parallel execution for hours.
Please refer to here  None
The doc needs to be a bit clearer: one requires a path and not just true/false
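for anyone else hitting this, my understanding of the difference between the two vars (worth double-checking against the docs; the interpreter path is just an example):
# boolean-style flag: skip the python environment setup
export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
# expects the path of an existing python interpreter, not true/false
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/usr/bin/python3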
starting to. thanks for your explanation.
would those containers best be started from something in services mode? or is it possible to get no-overhead with my approach of worker-inside-docker?
i designed my tasks as different functions, based mostly on what metrics to report and which artifacts are best cached (and on how to best leverage comparisons of tasks). they do require cpu, but not a ton.
I'm now experimenting with lumping a lot of stuff into one big task and seeing how this goes instead. i have to be more selective in the reporting of metrics and plots though.
normally when a new package needs to be installed, it shows up in the Console tab
i would love some advice on that though - should I be using services mode + docker and some max # of instances to spin up multiple tasks instead?
my thinking was to avoid some of the docker overhead. but i did try this approach previously and found that the container limit wasn't exactly respected.
but pretty reliably some proportion of tasks still just take a much longer time. 1m - 10m is a variance i'd really like to understand.
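for context, a rough sketch of the services-mode alternative i'm weighing (queue name and count are placeholders; i believe --services-mode optionally takes a max concurrent task count, but double-check that against your agent version):
clearml-agent daemon --queue services --docker --services-mode 4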
fwiw - i'm starting to wonder if there's a difference between me "resetting the task" vs cloning it.
oh yes. from the Using env line until the next message is 2 minutes.
oooh thank you, i was hoping for some debugging tips like that. will do.
from a speed-of-clearing-a-queue perspective, is a services-mode queue better or worse than having many workers "always up"?
thank you!
i'll take that design into consideration.
re: CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL in "docker venv mode" i'm still not quite sure I understand correctly - since the agent is running in a container, as far as it is concerned it may as well be on bare metal.
is it just that there's no way for that worker to avoid venv? (i.e. the only way to bypass venv is to use docker-mode?)
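to make sure i'm picturing the alternative right, this is roughly what i understand docker mode to look like (the image is a placeholder, and i'm assuming the skip-env var still applies so it just uses the container's python):
export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
clearml-agent daemon --queue default --docker python:3.10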
of what task? i'm running lots of them and benchmarking execution times. would you like to see a best-case or worst-case scenario? (i've kept some experiments for each).
and yeah, in those docs you just linked, "boolean" vars like CLEARML_AGENT_GIT_CLONE_VERBOSE explicitly say true, so I ended up trying that pattern. but originally i did try 1. let me go back to that now. thank you.
overall I've seen some improvements in execution time using the suggestions in this thread (tysm!) - the preinstalled libs seem to be helping, though some things are still just unbearably slow (one of my larger pipelines took > 1 h to generate a DAG before even starting...).
are you on clearml agent 1.8.0?
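(if you're not sure, something like this from inside the worker container should show it, assuming the agent was installed with pip:)
python -m pip show clearml-agent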
(i'm noticing sometimes i'm just missing logs such as "Running task id.." entirely)