Answered
I Have A Bunch Of Python Modules With Clearml Tasks. They Are Using 3Rd-Party Libraries But No Module Uses Code From Another Module. When I Run Such A Task Remotely - Then Clearml Deduces The Dependencies From Imports, Which Works Fine. Now I Decided To T

I have a bunch of python modules with clearml tasks. They are using 3rd-party libraries but no module uses code from another module. When I run such a task remotely - then clearml deduces the dependencies from imports, which works fine.
Now I decided to take out some common code from those modules into a common.py. Now, when I try to run a task - instead of deducing, clearml takes all of the libraries from current virtual environment (which has a lot more than a single task might need). Is my only option to use https://clear.ml/docs/latest/docs/references/sdk/task/#taskadd_requirements ?

  
  
Posted 2 years ago

Answers 30


but we run everything in docker containers. Will it still help?

As long as you are running with clearml-agent (in docker mode), all the cache folders (this one included) are mounted on the host machine for persistence.

  
  
Posted 2 years ago

sorry, not very clear.

  
  
Posted 2 years ago

That would be great 🙏

  
  
Posted 2 years ago

I found this in the conf:
# Default auto generated requirements optimize for smaller requirements
# If True, analyze the entire repository regardless of the entry point.
# If False, first analyze the entry point script, if it does not contain other to local files,
# do not analyze the entire repository.
force_analyze_entire_repo: false

  
  
Posted 2 years ago

how to make sure it will traverse only current package?

Just making sure there is no bug in the process: if you call Task.init in your entire repo (serve/train), do you end up with an "installed packages" section that contains all the required packages for both use cases?

I have separate packages for serving and training in a single repo. I don’t want serving requirements to be installed.

Hmm, it cannot "know" which is which, because it doesn't really trace all the import logs (this would take too long and is quite complicated); it does static analysis on everything...
You have two options
1. Specify the requirements.txt for each use case, just before calling Task.init. This means you maintain two requirements.txt files, one for each use case:
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
2. Specify "ignore packages" for each use case, removing just the packages you know beforehand are too "heavy" or not needed in the specific use case:
Task.ignore_requirements("some_package_name")
wdyt?
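For a repo with many task modules, the "ignore packages" option could be kept in one place instead of being repeated in every script. The following is only a minimal sketch under assumptions: the `IGNORED_BY_USE_CASE` mapping, the package names in it, and the `ignore_unneeded` helper are all hypothetical. The only ClearML call it relies on is `Task.ignore_requirements`, as quoted above, and the `Task` class is passed in as a parameter so the helper itself has no ClearML dependency.

```python
# Hypothetical mapping: use case -> packages to keep out of "installed packages".
# The package names are placeholders, not taken from the thread.
IGNORED_BY_USE_CASE = {
    "train": ["fastapi", "uvicorn"],
    "serve": ["torch", "scikit-learn"],
}


def ignore_unneeded(task_cls, use_case):
    """Call task_cls.ignore_requirements(name) for each package listed for use_case.

    task_cls is expected to be clearml.Task (assumption: the caller imports it).
    Call this before Task.init(). Returns the list of ignored package names.
    """
    ignored = list(IGNORED_BY_USE_CASE[use_case])
    for name in ignored:
        task_cls.ignore_requirements(name)
    return ignored
```

In a training entry point this would presumably be called as `ignore_unneeded(Task, "train")` right before `Task.init(...)`, so each script carries one line rather than its own requirements.txt.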

  
  
Posted 2 years ago

Hi FiercePenguin76
By default clearml will list only the packages you import, and not derivative packages.
This means that if you import package X and it imports package Y , only package X will be listed.
The way it should work is by statically analyzing the entire repository, but if you import a local package from a different local folder, and that folder is Not in the same repo, it will not get listed (obviously if you install the external local package, it will be listed)
To have the Full pip freeze as "installed packages" you can do:
Task.force_requirements_env_freeze()
Task.init(...)
If you want you can also supply the local requirements.txt with:
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
Task.init(...)
Make sense?

  
  
Posted 2 years ago

if you call Task.init in your entire repo (serve/train) you end up with "installed packages" section that contains all the required packages for both use cases ?
Yes, and I thought that it was looking at which libraries are installed in the virtualenv, but you explained that it is rather doing a static analysis over the whole repo.

  
  
Posted 2 years ago

first analyze the entry point script, if it does not contain other to local files

  
  
Posted 2 years ago

FiercePenguin76 can you provide some simple concrete example?

  
  
Posted 2 years ago

AgitatedDove14 WDYT?

  
  
Posted 2 years ago

I am importing a module which is in the same folder as the main one (i.e. in the same package)

  
  
Posted 2 years ago

is it possible to override this?

  
  
Posted 2 years ago

FiercePenguin76 the git repo should detect only clearml as required python package
Basically the steps are:
1. Decide if the initial python entry script is a standalone script (i.e. no local imports) in the git repo (in your example "task_with_deps.py").
2. If this is a "standalone script", only look for imports inside the calling python script, and list those packages under "installed packages".
3. If this is Not a standalone script, go over All the python files inside the repository, look for "imports" and add the packages into the "installed packages" section.
I'm not sure I follow where your example fails; based on the code in the repo, the only required package that should be listed is "clearml".
What am I missing ?
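The decision described in the steps above can be illustrated with a small standard-library sketch. This is not ClearML's actual implementation (that lives in its bundled pigar code); the function names `scan_imports`, `local_names` and `required_packages` are hypothetical. It also assumes that a "local import" means a relative import or a module that exists next to the entry script or at the repo root, and it skips both standard-library filtering and the mapping from import names to pip package names.

```python
import ast
from pathlib import Path


def scan_imports(path):
    """Return (top-level absolute import names, has_relative_import) for one file."""
    tree = ast.parse(Path(path).read_text(encoding="utf-8"))
    names, has_relative = set(), False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.level > 0:
                has_relative = True  # "from . import x" is always local
            elif node.module:
                names.add(node.module.split(".")[0])
    return names, has_relative


def local_names(directory):
    """Module and package names defined directly inside a directory."""
    directory = Path(directory)
    names = {p.stem for p in directory.glob("*.py")}
    names |= {p.name for p in directory.iterdir()
              if p.is_dir() and (p / "__init__.py").exists()}
    return names


def required_packages(entry_script, repo_root):
    """Imports of the entry script if it is standalone, else of the whole repo."""
    entry, root = Path(entry_script), Path(repo_root)
    local = local_names(root) | local_names(entry.parent)
    names, has_relative = scan_imports(entry)
    if not has_relative and not (names & local):
        return names  # standalone script: only its own imports are listed
    # Not standalone: collect imports from every Python file in the repository.
    all_names = set()
    for py_file in root.rglob("*.py"):
        file_names, _ = scan_imports(py_file)
        all_names |= file_names
        local |= local_names(py_file.parent)
    return all_names - local
```

Under these assumptions, a script that imports a sibling `common.py` stops being standalone, so the imports of unrelated folders in the same repo (serving code, for instance) end up in the result as well, which matches the behaviour reported in this thread.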

  
  
Posted 2 years ago

“supply the local requirements.txt” this means I have to create a separate requirements.txt for each of my 10+ modules with different clearml tasks

  
  
Posted 2 years ago

do you want a fully reproducible example or just 2 scripts to illustrate?

  
  
Posted 2 years ago

as I understand this: even though force=false, my script is importing another module from same project and thus triggering analyze_entire_repo

  
  
Posted 2 years ago

exactly what I’m talking about

  
  
Posted 2 years ago

“To have the Full pip freeze as "installed packages"” - that’s exactly what I’m trying to prevent. Locally my virtualenv has all the dependencies for all the clearml tasks, which is fine because I don’t need to download and install them every time I launch a task. But remotely I want to keep the bare minimum needed for the specific task, which clearml successfully does, as long as I don’t import any local modules.

  
  
Posted 2 years ago

I don’t see these lines when requirements are deduced from imports.

  
  
Posted 2 years ago

clearml.utilities.pigar.main.GenerateReqs.extract_reqs

  
  
Posted 2 years ago

You have two options
I think both can work but too much of a hassle. I think I’ll skip extracting the common code and keep it duplicated for now

  
  
Posted 2 years ago

this is where the “magic” happens

  
  
Posted 2 years ago

ok, so if it goes over whole repository, then my question transforms into: how to make sure it will traverse only current package? I have separate packages for serving and training in a single repo. I don’t want serving requirements to be installed.

  
  
Posted 2 years ago

if you import a local package from a different local folder, and that folder is Not in the same repo

  
  
Posted 2 years ago

One workaround that I see is to export commonly used code not to a local module, but rather to a separate in-house library.

  
  
Posted 2 years ago

Yes that makes sense, if the overhead of the additional packages is not huge, I do not think it is worth the maintenance 🙂
BTW clearml-agent has full venv caching that you can turn on, so when running remotely you are not "paying" for the additional packages being installed:
Un-comment this line 🙂
https://github.com/allegroai/clearml-agent/blob/51eb0a713cc78bd35ca15ed9440ddc92ffe7f37c/docs/clearml.conf#L116

  
  
Posted 2 years ago

but we run everything in docker containers. Will it still help?

  
  
Posted 2 years ago

Hmmm, interesting

  
  
Posted 2 years ago