how to make sure it will traverse only current package?
Just making sure there is no bug in the process: if you call Task.init in your entire repo (serve/train), you end up with an "installed packages" section that contains all the required packages for both use cases?
I have separate packages for serving and training in a single repo. I don’t want serving requirements to be installed.
Hmm, it cannot "know" which is which, because it doesn't really trace all the import logs (this would take too long and is quite complicated); it does static analysis on everything...
You have two options
1. Specify the requirements.txt for each use case, just before calling Task.init. This means you maintain a separate requirements.txt for each use case:
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
2. Specify "ignore packages" for each use case, removing the packages you know beforehand are too "heavy" or not needed in the specific use case:
Task.ignore_requirements("some_package_name")
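A minimal sketch of the two options above, assuming a hypothetical layout with one requirements file per use case (`requirements-train.txt`, `requirements-serve.txt`). The helper names, file names, and the project/task arguments are made up for illustration; only `Task.force_requirements_env_freeze`, `Task.ignore_requirements` and `Task.init` come from the discussion. Both calls have to run before `Task.init`.

```python
from typing import Iterable

# Hypothetical mapping: one requirements file per use case in the repo.
REQUIREMENTS_FILES = {
    "train": "requirements-train.txt",
    "serve": "requirements-serve.txt",
}


def requirements_for(use_case: str) -> str:
    """Return the requirements file to record for the given use case."""
    try:
        return REQUIREMENTS_FILES[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None


def init_with_requirements_file(use_case: str, project: str, name: str):
    """Option 1: record an explicit requirements file for this use case."""
    from clearml import Task  # imported lazily; requires clearml to be installed

    # Must be called before Task.init.
    Task.force_requirements_env_freeze(requirements_file=requirements_for(use_case))
    return Task.init(project_name=project, task_name=name)


def init_ignoring(packages: Iterable[str], project: str, name: str):
    """Option 2: keep auto-detection, but drop packages known to be unneeded."""
    from clearml import Task

    for package in packages:
        Task.ignore_requirements(package)  # must also precede Task.init
    return Task.init(project_name=project, task_name=name)
```

A training entry point would then call something like `init_with_requirements_file("train", "my_project", "train_model")`, where the project and task names are placeholders.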
wdyt?
example here: https://github.com/martjushev/clearml_requirements_demo
I am importing a module which is in the same folder as the main one (i.e. in the same package)
“To have the Full pip freeze as "installed packages"” - that’s exactly what I’m trying to prevent. Locally my virtualenv has all the dependencies for all the clearml tasks, which is fine because I don’t need to download and install them every time I launch a task. But remotely I want to keep the bare minimum needed for the specific task, which clearml successfully does, as long as I don’t import any local modules.
Yes that makes sense, if the overhead of the additional packages is not huge, I do not think it is worth the maintenance 🙂
BTW clearml-agent has full venv caching that you can turn on, so when running remotely you are not "paying" for the additional packages being installed:
Un-comment this line 🙂
https://github.com/allegroai/clearml-agent/blob/51eb0a713cc78bd35ca15ed9440ddc92ffe7f37c/docs/clearml.conf#L116
if you call Task.init in your entire repo (serve/train) you end up with "installed packages" section that contains all the required packages for both use cases ?
yes, and I thought that it was looking at which libraries are installed in the virtualenv, but you explained that it is rather doing a static analysis over the whole repo.
You have two options
I think both can work, but they are too much of a hassle. I think I’ll skip extracting the common code and keep it duplicated for now
do you want a fully reproducible example or just 2 scripts to illustrate?
but we run everything in docker containers. Will it still help?
clearml.utilities.pigar.main.GenerateReqs.extract_reqs
Hi FiercePenguin76
By default clearml will list only the packages you import, and not derivative packages. This means that if you import package X and it imports package Y, only package X will be listed.
The way it should work is by statically analyzing the entire repository, but if you import a local package from a different local folder, and that folder is Not in the same repo, it will not get listed (obviously if you install the external local package, it will be listed)
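To illustrate why only directly imported packages show up, here is a rough sketch of import detection by static analysis using the standard `ast` module. This is not ClearML's actual implementation (the function name `direct_imports` is hypothetical); it only shows that parsing the source sees the `import` statements you wrote, not what those packages import internally.

```python
import ast


def direct_imports(source: str) -> set:
    """Return the top-level names of modules imported directly by `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b as c" -> "a"
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # "from a.b import c" -> "a"; relative imports (level > 0) are skipped
            found.add(node.module.split(".")[0])
    return found


script = "import pandas as pd\nfrom sklearn.linear_model import LinearRegression\n"
print(sorted(direct_imports(script)))  # ['pandas', 'sklearn']
```

pandas itself imports numpy, but numpy never appears in the result because the script does not import it. Mapping module names to pip package names (for example `sklearn` to `scikit-learn`) is a separate step that this sketch leaves out.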
To have the Full pip freeze as "installed packages" you can do:
Task.force_requirements_env_freeze()
Task.init(...)
If you want, you can also supply the local requirements.txt with:
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
Task.init(...)
Make sense ?
I don’t see these lines when the requirements are deduced from imports.
One workaround that I see is to export commonly used code not to a local module, but rather to a separate in-house library.
“supply the local requirements.txt”: this means I would have to create a separate requirements.txt for each of my 10+ modules with different clearml tasks
ok, so if it goes over the whole repository, then my question becomes: how do I make sure it traverses only the current package? I have separate packages for serving and training in a single repo. I don’t want serving requirements to be installed.
FiercePenguin76 the git repo should detect only clearml as a required python package
Basically the steps are:
1. Decide if the initial python entry script is a standalone script (i.e. no local imports) in the git repo (in your example "task_with_deps.py").
2. If this is a "standalone script", only look for imports inside the calling python script, and list those packages under "installed packages".
3. If this is not a standalone script, go over all the python files inside the repository, look for "imports" and add the packages into the "installed packages" section.
I'm not sure I follow where your example fails. Based on the code in the repo, the only required package that should be listed is "clearml".
What am I missing ?
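The steps described above can be sketched roughly as follows. This is a simplified illustration of the decision logic as described in this thread, not ClearML's actual code; all function names are hypothetical, "local" is approximated as "a .py file or folder with that name exists next to the script or at the repo root", and filtering of standard-library modules is omitted.

```python
import ast
from pathlib import Path


def top_level_imports(path: Path):
    """Return (absolute top-level module names, whether relative imports are used)."""
    names, has_relative = set(), False
    for node in ast.walk(ast.parse(path.read_text(encoding="utf-8"))):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.level > 0:
                has_relative = True  # "from . import x" is always a local import
            elif node.module:
                names.add(node.module.split(".")[0])
    return names, has_relative


def is_local(name: str, directory: Path) -> bool:
    """Approximation: a module is local if a matching file or folder exists."""
    return (directory / f"{name}.py").is_file() or (directory / name).is_dir()


def files_to_scan(entry: Path, repo_root: Path):
    """Scan only the entry script if it is standalone, otherwise the whole repo."""
    names, has_relative = top_level_imports(entry)
    uses_local = has_relative or any(
        is_local(name, directory)
        for name in names
        for directory in (entry.parent, repo_root)
    )
    return sorted(repo_root.rglob("*.py")) if uses_local else [entry]


def detect_packages(entry: Path, repo_root: Path) -> set:
    """Collect non-local imports from every file selected by files_to_scan."""
    packages = set()
    for path in files_to_scan(entry, repo_root):
        names, _ = top_level_imports(path)
        packages.update(
            name
            for name in names
            if not any(is_local(name, d) for d in (path.parent, repo_root))
        )
    return packages
```

Under these assumptions, a training script that imports a sibling module stops being "standalone", so the scan widens to the whole repository and picks up imports from the serving package as well, which matches the behaviour reported in this thread.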
first analyze the entry point script, if it does not contain other to local files
FiercePenguin76 can you provide some simple concrete example?
as I understand this: even though force=false, my script is importing another module from same project and thus triggering analyze_entire_repo
but we run everything in docker containers. Will it still help?
As long as you are running with clearml-agent (in docker mode), all the cache folders (this one included) are mounted on the host machine for persistence
if you import a local package from a different local folder, and that folder is Not in the same repo
I found this in the conf:
# Default auto generated requirements optimize for smaller requirements
# If True, analyze the entire repository regardless of the entry point.
# If False, first analyze the entry point script, if it does not contain other to local files,
# do not analyze the entire repository.
force_analyze_entire_repo: false