Hi! Were you able to reproduce the issue, CostlyOstrich36?
So what we should do is turn pip freeze on in the clearml.conf file?
Ok, tried the following four things:
(fail = sklearn not listed in installed packages)
1. no __init__.py file in the module_a folder, not a git repo: fail
2. no __init__.py file in the module_a folder, git repo: fail
3. with __init__.py file in the module_a folder, not a git repo: fail
4. with __init__.py file in the module_a folder, with git repo: OK!
MuddySquid7 , we're having a look and testing it. Thanks!
I see, I can confirm that these packages (except for google_cloud_storage) are imported directly in the main script
Thank you, I would love to make sure we fix it
great, let me know if I can help you in any way. Thanks!
Ok, I think I figured it out. We started with a main script that imported sklearn, and then we moved that function outside the main script and imported the function instead.
So when we cloned the first time, sklearn was in the Installed Packages, and therefore our agent was able to run. The (now) cached clearml-venv had sklearn installed, so when it ran the second experiment, without the sklearn import in the main script and therefore without sklearn in the Installed Packages, it didn't matter, because the package was already installed.
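A minimal sketch of the masking effect described above, assuming a naive name-by-name comparison between imported modules and the Installed Packages list. The helper name `masked_requirements` and the sample inputs are hypothetical and not part of ClearML; the point is only that a module can be importable in a cached environment while being absent from the recorded requirements.

```python
import importlib.util


def masked_requirements(imported_modules, listed_packages):
    """Return modules that import fine here but are absent from the listed packages.

    Assumption: module names and listed package names are compared directly,
    which is naive (for example, sklearn is distributed as scikit-learn).
    """
    listed = {name.lower() for name in listed_packages}
    masked = []
    for module in sorted(imported_modules):
        if module.lower() in listed:
            continue  # recorded as a requirement, nothing hidden
        if importlib.util.find_spec(module) is not None:
            # Importable only because the environment already has it
            masked.append(module)
    return masked


# Stand-in data: "json" plays the role of a package that happens to be
# present in the cached venv; "no_such_module_xyz" is not installed at all.
result = masked_requirements({"json", "no_such_module_xyz"}, ["pandas", "numpy"])
print(result)
```

Anything this reports would work on a warm cached venv and break on a clean one, which matches the behaviour seen in the thread.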
which part of the code?
the main script?!
but is not part of the package
is the repo itself a package?
Previously we had similar issues when we switched images used in agent. Might want to check on that.
I'm assuming these are the Only packages that are imported directly (i.e. pandas requires other packages, but the code imports pandas, so this is what is listed).
The way ClearML detects packages: it first tries to understand whether this is a "standalone" script. If it is, then only imports in the main script are logged. If it "thinks" this is not a standalone script, it will analyze the entire repository.
Make sense?
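To illustrate the difference between the two modes, here is a rough sketch using Python's `ast` module. This is not ClearML's actual implementation; the helpers `imports_in_file` and `imports_in_repo` and the temporary file layout are assumptions made up for illustration, loosely mirroring the main script plus callbacks.py setup discussed in this thread.

```python
import ast
import tempfile
from pathlib import Path


def imports_in_file(path):
    """Return the top-level module names imported by one Python file."""
    tree = ast.parse(Path(path).read_text())
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found


def imports_in_repo(root):
    """Union of imports over every .py file under root, minus local modules."""
    files = list(Path(root).rglob("*.py"))
    # Names that refer to files or packages inside the repo itself
    local = {f.stem for f in files}
    local |= {f.parent.name for f in files if f.name == "__init__.py"}
    found = set()
    for f in files:
        found |= imports_in_file(f)
    return found - local


# Hypothetical layout: main.py imports a helper that lives in callbacks.py
with tempfile.TemporaryDirectory() as repo:
    Path(repo, "main.py").write_text(
        "import pandas\nfrom callbacks import function_plot_conf_matrix\n"
    )
    Path(repo, "callbacks.py").write_text(
        "from sklearn.metrics import confusion_matrix\n"
    )
    main_only = imports_in_file(Path(repo, "main.py"))
    whole_repo = imports_in_repo(repo)

print(sorted(main_only))   # ['callbacks', 'pandas']
print(sorted(whole_repo))  # ['pandas', 'sklearn']
```

In this sketch, scanning only the main script never sees sklearn, while scanning the whole folder does, which is the gap the thread is trying to pin down.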
ok, so ClearML doesn't add all the imported packages needed to run the task to the Installed Packages, only the ones in the main script?
MuddySquid7 , Yes! Reproduced like a charm. We're looking into it.
Hmm, I think "it" misses the fact callbacks are not a package.
Any chance you can post the code here? (or DM me)
Ok, I think I figured it out.
Nice!
ClearML doesn't add all the imported packages needed to run the task to the Installed Packages
It does, but not derivative packages (those used by the required packages). The derivative packages are added when the agent runs the Task: it creates a new clean venv, adds the required packages, and then updates the Task with everything in pip freeze, because that now represents All the packages the Task needs.
Two questions:
Is the code running from a git repository?
Is the "second function" (the one actually importing sklearn) in a different file? If so, how do you import it?
So what changed?
We changed other bits of code, but not that one..
But maybe we are focusing on the wrong thing, the question now is why is ClearML only detecting these packages (running a different experiment than Diego)
Pillow == 8.0.1
clearml == 0.17.5
google_cloud_storage == 1.40.0
joblib == 0.17.0
numpy == 1.19.5
pandas == 1.3.1
seaborn == 0.11.0
tensorflow_gpu == 2.3.1
tqdm == 4.54.1
I've included a requirements.txt file
Did you put anything inside __init__.py?
nope
right, callbacks.py is a file inside the repo, but is not part of the package
MuddySquid7 , I couldn't reproduce case 4.
In all cases it didn't detect sklearn.
Did you put anything inside __init__.py?
Can you please zip up the folder from scenario 4 and post it here?
Yes, the code is inside a git repository.
In the main script: from callbacks import function_plot_conf_matrix
and inside callbacks.py, of course, at the beginning we have: from sklearn.metrics import confusion_matrix
or something like that