Answered
I Have Set

I have set

export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=true
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=true

in my entrypoint.sh (which runs clearml-agent daemon --queue $QUEUES --create-queue --cpu-only --foreground )

but it appears that tasks still take a long time to set up environments. I expected the whole process to be skipped and for the preinstalled python deps in the docker image (which is running this entrypoint script) to be used.

From task pickup to task "run python file" can be several minutes... which is longer than some of the tasks themselves take.

  
  
Posted 8 months ago

Answers 54


i was having a ton of git clone issues - disabled caching entirely... wonder if that may help too.

tysm for your help! will report back soon.

  
  
Posted 8 months ago

minute of silence between first two msgs and then two more mins until a flood of logs. Basically 3 mins total before this task (which does almost nothing - just using it for testing) starts.
[3 screenshots]

  
  
Posted 8 months ago

fwiw - i'm starting to wonder if there's a difference between me "resetting the task" vs cloning it.

  
  
Posted 8 months ago

you should be able to see in the Console tab what is happening

  
  
Posted 8 months ago

I'm just working on speeding up the time from "queue experiment" to "my code actually runs remotely" - as of yesterday things would sit for many minutes at a time. trying to see if venv is the culprit.

  
  
Posted 8 months ago

  • try with the latest RC 1.8.1rc2

it feels like after git clone, it spends minutes without outputting anything

yeah that is odd, can you run the agent with --debug (add it before the daemon command), and then at the end of the command add --foreground
Now launch the same task on that queue, you will have a verbose log in the console.
Let us know what you see
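
For reference, a minimal sketch of what that could look like, assuming the same daemon command as in the original question. The QUEUES fallback value "default" is only a placeholder, and the helper name agent_debug_cmd is made up for illustration. It prints the command rather than running it, so it can be checked before it goes into entrypoint.sh:

```shell
#!/usr/bin/env bash
# Sketch: the daemon command from the question, with --debug added
# before the "daemon" subcommand. --foreground stays at the end.

# Assumption: QUEUES holds the queue name(s); "default" is only a placeholder.
QUEUES="${QUEUES:-default}"

# Hypothetical helper: prints the command instead of running it,
# so it can be reviewed first.
agent_debug_cmd() {
    printf '%s\n' "clearml-agent --debug daemon --queue ${QUEUES} --create-queue --cpu-only --foreground"
}

agent_debug_cmd
```

Once the printed line looks right, run it as the last step of the entrypoint and re-enqueue the task to get the verbose log.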

  
  
Posted 8 months ago

what if the preexisting venv is just the system python? my base image is python:3.10.10 and i just pip install all requirements in that image. Does that not avoid venv still?

it will basically create a new venv inside the container, forking the existing preinstalled stuff (i.e. the new venv already has everything the system python has preinstalled)
then it will call "pip install" on all the "installed packages" of the Task.
Which should just check everything is there and install nothing

If you set "CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1" it will skip the checks and just use the existing system python environment as is.

I can get 50 tasks to run in the same time it takes to run a single one? i can't imagine the apiserver being a noticeable bottleneck.

50 containers on a single machine would be fine if you have enough RAM/CPU, and yes they would run concurrently.
regarding the time itself, again the spin-up time of a Task should be negligible.
Pipeline tasks are not meant to be "threads", they are meant as different functions you want to run on different machines.
This means that if your pipeline is just a set of simple functions that require no cpu/gpu or IO, I'm not sure pipeline steps are the right way to go

Does that make sense?

  
  
Posted 8 months ago

is there a way for me to toggle CLEARML's log level? I'm doing some manual task-debugging in ipython and think it would be helpful to see network requests and timeouts if they're occurring.

  
  
Posted 8 months ago

from task pick-up to "git clone" is now ~30s, much better.

This is "spent" calling apt update && apt install && pip install clearml-agent
if you have those preinstalled it should be quick
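
A quick way to check what is already in the image is a sketch like the one below. The tool list (git, python3, clearml-agent) is an assumption about what is worth checking, not a list of what the agent actually installs, and check_tool is a made-up helper name:

```shell
#!/usr/bin/env bash
# Report whether a command is already on PATH in the image.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# Assumption: these are the tools worth checking; adjust to your image.
for tool in git python3 clearml-agent; do
    check_tool "$tool"
done
```

Anything reported as missing is a candidate for baking into the image at build time.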

though as far as I understand, the recommendation is still to not run workers-in-docker like this:

if you do not want it to install anything and just use the existing venv (leaving the venv as is), and if something is missing then so be it, then yes sure, that's the way to go

  
  
Posted 8 months ago

I can see all the steps like git clone,

git clone has nothing to do with "env setup", this is bringing the code, you cannot skip that one. that said, this is why the git repo itself is cached on the host machine, so it is fast

... There may be some odd package that needs to be installed because one of our DS is experimenting ... But with all that, we can see what is happening.

even if everything is preinstalled, it verifies the packages match, and this might take a long time. It's just pip being pip (if you want the extreme, try doing the same with conda, that one is even slower)
the output of that verification stage is that no new packages are installed (otherwise good thing we checked 🙂 )
bottom line, if you want to skip the pip verification/installation, pass CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1

btw: i'm checking regarding the GH issue

  
  
Posted 8 months ago

from the logs, it feels like after git clone, it spends minutes without outputting anything. @AgitatedDove14 Do you know what the agent is supposed to do after git clone?
I guess a check that all packages are installed? But then with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1, what is the agent doing??

  
  
Posted 8 months ago

Hi Guys, just curious here, what was the final issue?
Also out of curiosity, what does that mean? "1.12.2 because some bug that make fastai lag 2x" ?

  
  
Posted 8 months ago

I think a proper screenshot of the full log with some information redacted is the way to go. Otherwise we are just guessing in the dark

  
  
Posted 8 months ago

oh it's there, before running task.

from task pick-up to "git clone" is now ~30s, much better.

though as far as I understand, the recommendation is still to not run workers-in-docker like this:

export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=$(which python)

(and fwiw I have this in my entrypoint.sh )

cat <<EOF > ~/clearml.conf
agent {
    vcs_cache {
        enabled: true
    }

    package_manager: {
        type: pip,
        system_site_packages: true,
    }

}
EOF
  
  
Posted 8 months ago

normally when a new package needs to be installed, it shows up in the Console tab

  
  
Posted 8 months ago

ha! yup. that was it exactly. I posted about it too [link missing] lol

  
  
Posted 8 months ago

1.12.2 because some bug that make fastai lag 2x
1.8.1rc2 because it fix an annoying git clone bug

  
  
Posted 8 months ago

We need to focus first on why it is taking minutes to reach Using env.
In our case, we have a container that has all packages installed straight in the system, with no venv in the container. Thus we don't use CLEARML_AGENT_SKIP_PIP_VENV_INSTALL.
But then when a task is pulled, I can see all the steps like git clone and a bunch of Requirement already satisfied .... There may be some odd package that needs to be installed because one of our DS is experimenting ... But with all that, we can see what is happening.
In @SmallTurkey79's case, are you saying the log doesn't show anything at all? After it pulls the task, 5 minutes pass with no explanation of what those 5 minutes were used for?

  
  
Posted 8 months ago

this bug: [link missing]

  
  
Posted 8 months ago

BTW: you can also just add -e "CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1" to the docker args (under the Execution tab) to override the setting of the docker.
you can also add "export;" to the docker startup bash script section (do not add "#!/bin/bash", just the actual script) to get a list of all the environment variables inside the docker, just in case
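
If the full "export;" dump is too noisy, a narrower variant is sketched below. It is meant to be pasted into the docker startup bash script section (so there is no shebang line), and it only lists variables whose names start with CLEARML_AGENT_. The function name show_agent_env is made up for illustration:

```shell
# List only the agent-related variables visible inside the container.
show_agent_env() {
    agent_vars="$(env | grep '^CLEARML_AGENT_' | sort)"
    if [ -n "$agent_vars" ]; then
        printf '%s\n' "$agent_vars"
    else
        # Nothing matched: the overrides did not reach this container.
        echo "no CLEARML_AGENT_* variables are set"
    fi
}

show_agent_env
```

If CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL does not appear in the output, the setting never made it into the task container, which would explain the environment setup still running.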

  
  
Posted 8 months ago

yeah... still seeing variances from 1m to 10m for the same task. been testing parallel execution for hours.

  
  
Posted 8 months ago

I know that git clone and pip verifying everything is installed are normal. But for some reason in Michael's screenshot, I don't see those steps ...

  
  
Posted 8 months ago

a "regular" worker will run one job at a time, a services worker will spin up multiple tasks at the same time. But their setup (i.e. before running the actual task) is one at a time.

  
  
Posted 8 months ago

@AgitatedDove14 About why we stay on 1.12.2: [link missing]

  
  
Posted 8 months ago

i really don't see how this provides any additional context that the timestamps + crops don't, but okay.

  
  
Posted 8 months ago

the timestamps were all that mattered in those.

  
  
Posted 8 months ago

i just need to understand what I should be expecting. I thought going from putting it into the queue in the UI to "running my code remotely" (esp with packages preloaded) should be a fairly fast turnaround - certainly not three minutes... i'll have to change my whole pipeline design if this is the case.

  
  
Posted 8 months ago

there is almost zero overhead if your docker container already has everything (including the agent) preinstalled and you set it up with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
it then should basically just run the code.

  
  
Posted 8 months ago

oh yes. From Using env until the next message is 2 minutes.

  
  
Posted 8 months ago

clearml==1.12.2
clearml_agent v1.8.1rc2

  
  
Posted 8 months ago
26K Views