are you on clearml agent 1.8.0?
(im noticing sometimes im just missing logs such as "Running task id.." entirely)
what if the preexisting venv is just the system python? my base image is python:3.10.10 and i just pip install all requirements in that image. Does that not avoid venv still?
it will basically create a new venv inside the container, forking the existing preinstalled stuff (i.e. the new venv already has everything the system python has preinstalled)
then it will call "pip install" on all the "installed packages" of the Task.
Which should just check everything is there and install nothing
If you set "CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1" it will skip those checks and just use the existing system python environment as is.
is it reasonable to expect that with sufficient workers, I can get 50 tasks to run in the same time it takes to run a single one? i cant imagine the apiserver being a noticeable bottleneck.
50 containers on a single machine would be fine if you have enough RAM/CPU, and yes they would run concurrently.
regarding the time itself, again the spin-up time of a Task should be negligible.
Pipeline tasks are not meant to be "threads" they are meant as different functions you want to run on different machines,
This means that if your pipeline is just a set of simple functions that require no cpu/gpu or IO, I'm not sure pipeline steps are the right way to go
Does that make sense?
thank you!
i'll take that design into consideration.
re: CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL in "docker venv mode" im still not quite sure I understand correctly - since the agent is running in a container, as far as it is concerned it may as well be on bare-metal.
is it just that there's no way for that worker to avoid venv? (i.e. the only way to bypass venv is to use docker-mode?)
what if the preexisting venv is just the system python? my base image is python:3.10.10 and i just pip install all requirements in that image. Does that not avoid venv still?
it's good to know that in theory there's a path forward with almost zero overhead. that's what I want.
is it reasonable to expect that with sufficient workers, I can get 50 tasks to run in the same time it takes to run a single one? i cant imagine the apiserver being a noticeable bottleneck.
@<1523701205467926528:profile|AgitatedDove14> About why we stay on 1.12.2 : None
ha! yup. that was it exactly. I posted about it too None lol
oh yes. "Using env" until the next message is 2 minutes.
would those containers best be started from something in services mode?
Yes as long as the machine has enough cpu/ram
Notice that services mode will start a second parallel Task only after the first one is done setting up the env. If running with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL, in containers that have git/python/clearml-agent preinstalled, that setup should be minimal.
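To make the serial-setup point concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch under loud assumptions: every task has the same setup and run time, there is no CPU/RAM contention, and queue polling and API latency are ignored. The function names and the numbers (50 tasks, 30s setup, 30s run) are hypothetical and are not measurements from this thread.

```python
import math


def regular_workers_drain_time(n_tasks, setup_s, run_s, n_workers=1):
    """Each regular worker handles one task at a time: setup, then run."""
    if n_tasks == 0:
        return 0.0
    rounds = math.ceil(n_tasks / n_workers)
    return rounds * (setup_s + run_s)


def services_worker_drain_time(n_tasks, setup_s, run_s):
    """One services-mode worker: setups happen one after another,
    but a task keeps running while the next one is being set up."""
    if n_tasks == 0:
        return 0.0
    # The last task finishes its setup at n_tasks * setup_s, then runs.
    return n_tasks * setup_s + run_s


# Hypothetical numbers, for illustration only.
n_tasks, setup_s, run_s = 50, 30.0, 30.0

single_regular = regular_workers_drain_time(n_tasks, setup_s, run_s)
fifty_regular = regular_workers_drain_time(n_tasks, setup_s, run_s, n_workers=50)
one_services = services_worker_drain_time(n_tasks, setup_s, run_s)

print(single_regular, fifty_regular, one_services)
```

Under these assumptions the setup time is what dominates a services-mode queue, so shrinking it (preinstalled containers, skipping the env install) matters more than the run time of each step.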
or is it possible to get no-overhead with my approach of worker-inside-docker?
No, do not do that; see the explanation above on why CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL does not work in docker venv mode
i designed my tasks as different functions, based mostly on what metrics to report and artifacts that are best cached (and how to best leverage comparisons of tasks). they do require cpu, but not a ton.
just report a single Task as multiple "titles", then each title is its own step, then inside the "title" they have different series
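A minimal sketch of that layout, assuming each logical step becomes a scalar "title" and each metric inside it becomes a "series". The helper `report_step_metrics`, the `RecordingLogger` stand-in, and the step names and values are all hypothetical; the stand-in is there only so the sketch runs without a ClearML server.

```python
def report_step_metrics(logger, step_name, metrics, iteration=0):
    """Report every metric of one logical step under a shared title."""
    for series, value in metrics.items():
        logger.report_scalar(
            title=step_name, series=series, value=value, iteration=iteration
        )


class RecordingLogger:
    """Stand-in logger that just records calls (not part of ClearML)."""

    def __init__(self):
        self.calls = []

    def report_scalar(self, title, series, value, iteration):
        self.calls.append((title, series, value, iteration))


# Hypothetical steps and metric values, for illustration only.
steps = {
    "preprocess": {"rows_kept": 980.0, "rows_dropped": 20.0},
    "train": {"loss": 0.42, "accuracy": 0.91},
}

logger = RecordingLogger()
for step_name, metrics in steps.items():
    report_step_metrics(logger, step_name, metrics)
```

In a real run the stand-in would be swapped for the logger of the current Task (for example `Task.current_task().get_logger()`), whose `report_scalar` takes the same `title`, `series`, `value` and `iteration` arguments.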
is there a way for me to toggle CLEARML's log level?
Try to set the python root logger's base logging level
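A minimal sketch of that suggestion using only the standard `logging` module. The logger name `"clearml"` is an assumption about how the library names its loggers; if it uses a different name, or sets its own level explicitly, adjust accordingly.

```python
import logging

# basicConfig attaches a handler to the root logger if none exists yet.
logging.basicConfig(level=logging.DEBUG)

# basicConfig is a no-op when handlers were already configured,
# so set the root level explicitly as well.
logging.getLogger().setLevel(logging.DEBUG)

# Assumption: the library logs under a logger named "clearml".
LIBRARY_LOGGER_NAME = "clearml"
library_logger = logging.getLogger(LIBRARY_LOGGER_NAME)
library_logger.setLevel(logging.DEBUG)

library_logger.debug("debug logging is enabled")
```

This has to run in the process that does the logging, so it would go at the top of the task script, before the code you want to trace.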
oooh thank you, i was hoping for some sort of debugging tips like that. will do.
from a speed-of-clearing-a-queue perspective, is a  services-mode  queue better or worse than having many workers "always up"?
- try with the latest RC, 1.8.1rc2
it feels like after git clone, it spends minutes without outputting anything
yeah that is odd, can you run the agent with --debug (add it before the daemon command), and then at the end of the command add --foreground
Now launch the same task on that queue, you will have a verbose log in the console.
Let us know what you see
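For reference, the flag placement described above can be sketched as a small Python helper that builds the command line. The helper name and the queue name `default` are hypothetical; substitute whatever queue the worker actually serves.

```python
import shlex
from typing import List


def build_debug_agent_command(queue: str) -> List[str]:
    """Build the agent command with --debug before `daemon`
    and --foreground at the very end."""
    return ["clearml-agent", "--debug", "daemon", "--queue", queue, "--foreground"]


# "default" is a placeholder queue name.
command = build_debug_agent_command("default")
print(shlex.join(command))
# prints: clearml-agent --debug daemon --queue default --foreground
```

To actually start the agent from Python, the list could be passed to `subprocess.run(command, check=True)`; running the printed line directly in a shell does the same thing.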
i just ran a pipeline that took about 2h (more than half this time was just the DAG), with about a hundred tasks. i'm taking a look at them now to see what the logs show for runtimes.
hard to see with your crop-outs here and there ...
Hi Guys, just curious here, what was the final issue?
Also out of curiosity, what does that mean? "1.12.2 because some bug that make fastai lag 2x" ?
I know that git clone and pip verifying everything is installed is normal. But for some reason in Michael's screenshot, I don't see those steps ...
So "Using env ..." takes minutes without any output?
you should be able to see in the Console tab what is happening
"regular" worker will run one job at a time, services worker will spin up multiple tasks at the same time. But their setup (i.e. before running the actual task) is one at a time.
of what task? i'm running lots of them and benchmarking
If you are skipping every installation it should be the same
because if you set CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 it will not install anything at all
This is why it's odd to me...
wdyt?
i just need to understand what I should be expecting. I thought from putting it into queue in UI to "running my code remotely" (esp with packages preloaded) should be a fairly fast turnaround - certainly not three minutes... (i'll have to change my whole pipeline design if this is the case)
apologies - just trying to keep sensitive data out of screenshot
1.12.2 because some bug that make fastai lag 2x
1.8.1rc2 because it fix an annoying git clone bug
We need to focus first on why it is taking minutes to reach "Using env".
In our case, we have a container that has all packages installed straight in the system, no venv in the container. Thus we don't use CLEARML_AGENT_SKIP_PIP_VENV_INSTALL
But then when a task is pulled, I can see all the steps like git clone, a bunch of "Requirement already satisfied" .... There may be some odd package that needs to be installed because one of our DS is experimenting ... But with all of that, we can see what is happening.
In @<1689446563463565312:profile|SmallTurkey79>'s case, are you saying the log doesn't show anything at all? After it pulls the task, 5 minutes pass with no explanation of what those 5 min have been used for?
I can see all the steps like git clone,
git clone has nothing to do with "env setup", this is bringing the code, you cannot skip that one. That said, this is why the git repo itself is cached on the host machine, so it is fast
... There may be some odd package that needs to be installed because one of our DS is experimenting ... But with all of that, we can see what is happening.
even if everything is preinstalled, it verifies the packages match, and this might take a long time. It's just pip being pip (if you want the extreme try to do the same with conda, that one is even slower)
the output of that verification stage is no new packages are installed (otherwise good thing we checked  🙂  )
bottom line, if you want to skip the pip verification/installation, pass CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
btw: i'm checking regarding the GH issue
the timestamps were all that mattered in those.
sometimes I get "lucky" and see something more like what I expect... total experiment time < 1 min (and I have evidence of this happening. logs start-to-finish in sub-minute). But then other times the same task will take 5-10 minutes.
same worker, same queue, just one worker serving it... I am so utterly perplexed by the variation in how long things take. my clearml API server is running on a beefy 32 core machine and not much else is happening right now...
oh it's there, before running task.
from task pick-up to "git clone" is now ~30s, much better.
though as far as I understand, the recommendation is still to not run workers-in-docker like this:
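# skip installing/verifying python packages; use the environment as-is
export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
# use this interpreter instead of creating a new venv
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL="$(which python)"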
(and fwiw I have this in my  entrypoint.sh )
cat <<EOF > ~/clearml.conf
agent {
    vcs_cache {
        enabled: true
    }
    package_manager: {
        type: pip,
        system_site_packages: true,
    }
}
EOF
there is almost zero overhead if your docker container already has everything (including the agent) preinstalled and you set it with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
it then should basically just run the code.
im not running in docker mode though
hmmm that might be the first issue. it cannot skip venv creation, it can however use a pre-existing venv (but it will change it every time it installs a missing package)
so setting CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 in non docker mode has no effect