i’ll clone and enqueue, but i’m guessing that’s the issue
Sounds great! let me know what you find out 🙂
Could it be that pandas was not installed on the local machine?
and it’s in the “installed packages” from the child task:
```
absl-py==0.14.0
aiohttp==3.7.4.post0
async-timeout==3.0.1
attrs==21.2.0
cachetools==4.2.2
certifi==2021.5.30
chardet==4.0.0
charset-normalizer==2.0.6
clearml==1.1.1
cycler==0.10.0
Cython==0.29.24
fsspec==2021.9.0
furl==2.1.2
future==0.18.2
google-auth==1.35.0
google-auth-oauthlib==0.4.6
grpcio==1.40.0
idna==3.2
joblib==1.0.1
jsonschema==3.2.0
kiwisolver==1.3.2
Markdown==3.3.4
matplotlib==3.4.3
multidict==5.1.0
numpy==1.21.2
oauthlib==3.1.1
orderedmultidict==1.0.1
packaging==21.0
pathlib2==2.3.6
Pillow==8.3.2
protobuf==3.18.0
psutil==5.8.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyDeprecate==0.3.1
PyJWT==2.1.0
pyparsing==2.4.7
pyrsistent==0.18.0
python-dateutil==2.8.2
pytorch-lightning==1.4.8
PyYAML==5.4.1
requests==2.26.0
requests-oauthlib==1.3.0
rsa==4.7.2
scikit-learn==0.24.2
scipy==1.7.1
six==1.16.0
tensorboard==2.6.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.0
threadpoolctl==2.2.0
torch==1.9.1
torchmetrics==0.5.1
tqdm==4.62.3
typing-extensions==3.10.0.2
urllib3==1.26.7
Werkzeug==2.0.1
yarl==1.6.3
```
okay, i have a few things on my todo list, and they will take a while. we will call `Task.init` in the entry point instead of how it’s done now, and we will re-try `python -m`. if it doesn’t work, we will file an issue. if it does work, yay!
either way, thanks much for your help today, i really appreciate it.
in the main script, these are the first imports:
```
import argparse
import time
import json
import pytorch_lightning as pl
from pytorch_lightning.accelerators import accelerator
```
then after that we import stuff from the repo, and the listed packages are imported in those files
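For context on how "directly imported" packages get detected: this kind of listing is typically built by scanning the source for import statements rather than inspecting the live environment. Below is a minimal sketch of that idea using Python's standard `ast` module; it is not ClearML's actual detector, and the helper names, the file contents and the `modulename` package are all hypothetical. It shows how a package such as pandas, imported only in a repo file, is missed if only the entry script is scanned.

```python
import ast
import sys

# Assumption: sys.stdlib_module_names only exists on Python 3.10+; on older
# versions this sketch falls back to a tiny hand-written set.
STDLIB = set(getattr(sys, "stdlib_module_names", {"argparse", "json", "os", "sys", "time"}))


def top_level_imports(source: str) -> set:
    """Return the top-level names of all absolute imports in `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            found.add(node.module.split(".")[0])
    return found


def third_party_imports(files: dict, local_packages: set) -> set:
    """Union of imports across `files`, minus stdlib and the repo's own packages."""
    names = set()
    for source in files.values():
        names |= top_level_imports(source)
    return names - STDLIB - local_packages


# Hypothetical repo: the entry script imports the repo's own `modulename`
# package, and only modulename/data.py imports pandas and torch.
repo = {
    "train.py": (
        "import argparse\n"
        "import json\n"
        "import pytorch_lightning as pl\n"
        "from modulename.data import load\n"
    ),
    "modulename/data.py": "import pandas as pd\nimport torch\n",
}

# Scanning only the entry script misses pandas and torch
print(sorted(third_party_imports({"train.py": repo["train.py"]}, {"modulename"})))
# ['pytorch_lightning']

# Scanning the whole repo finds them
print(sorted(third_party_imports(repo, {"modulename"})))
# ['pandas', 'pytorch_lightning', 'torch']
```

A real detector also has to map import names to distribution names (e.g. `sklearn` is installed as `scikit-learn`), which this sketch skips.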
I think the main issue is running with `python -m module.name --args`, which is a bit different when trying to "understand" what the actual repository is.
Can you try to run it from the repository folder (same command), just to see if it has any effect on the detected packages?
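To see why `-m` changes the picture: when Python is started with `python -m module.name`, `__main__.__spec__` is set to that module's spec (and `sys.argv[0]` becomes the full path of the module file), whereas with `python path/to/train.py` `__main__.__spec__` is `None`. A minimal sketch that reports which launch style was used is below; `launch_mode` is a hypothetical helper, and this only illustrates the difference, not how ClearML itself resolves the repository.

```python
import sys
import types


def launch_mode(main_module: types.ModuleType, argv: list) -> tuple:
    """Report whether the interpreter ran a module (-m) or a script path.

    Under `python -m pkg.mod`, __main__.__spec__.name is "pkg.mod".
    Under `python path/to/script.py`, __main__.__spec__ is None.
    """
    spec = getattr(main_module, "__spec__", None)
    if spec is not None and spec.name:
        return ("module", spec.name)
    return ("script", argv[0] if argv else "")


if __name__ == "__main__":
    # e.g. ('module', 'module.name') or ('script', '/path/to/train.py')
    print(launch_mode(sys.modules["__main__"], sys.argv))
```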
```
$ conda list | grep matplotlib
matplotlib                3.4.3            py39hf3d152e_1    conda-forge
matplotlib-base           3.4.3            py39h2fa2bec_1    conda-forge
```
BTW: how come `torch` is missing from the list? Do you have `import torch` in the code?
that must have been it. here’s the installed packages when not using `-m`:
```
# Python 3.9.7 | packaged by conda-forge | (default, Sep 23 2021, 07:28:37) [GCC 9.4.0]
Local modules found - skipping:
modulename == ../pathto/modulename/__init__.py
PyYAML == 5.4.1
Shapely == 1.7.1
clearml == 1.1.1
click == 7.1.2
matplotlib == 3.4.3
numpy == 1.21.2
pandas == 1.3.3
python_dateutil == 2.8.2
pytorch_lightning == 1.4.8
pytz == 2021.1
rasterio == 1.2.8
scikit_image == 0.18.3
scikit_learn == 0.24.2
scipy == 1.7.1
tensorboard == 2.6.0
torch == 1.9.1
torchvision == 0.2.2
tqdm == 4.62.3
```
BTW: could it be that `Task.init` is not called in the `module.name` entry point, but somewhere internally?
not sure if this is considered a bug or not! but I’d happily make an issue on github if needed.
I think we should, at least for the sake of transparency and visibility 🙂
thanks again for all your help.
My pleasure 🙂
yeah, it’s in one of the imports from the repo
```
$ conda list | grep pandas
geopandas                 0.9.0              pyhd8ed1ab_1    conda-forge
geopandas-base            0.9.0              pyhd8ed1ab_1    conda-forge
pandas                    1.3.3            py39hde0f152_0    conda-forge
```
actually no
hmm, are those packages correct?
getting different issues (torchvision vs. cuda compatibility, will work on that), but i’m betting that was the issue
and it’s in the “installed packages” from the child task:
This is because the agent always writes the full venv setup back to the task, so you will always be able to reproduce the entire environment (as opposed to development time, where only the directly imported packages are listed).
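A rough illustration of the difference between the two lists: the dev-time list comes from the imports found in the code, while a full venv snapshot covers every installed distribution, including transitive dependencies such as `yarl` or `rsa` in the child task above. The sketch below builds such a snapshot with the standard library's `importlib.metadata`; `full_environment` is a hypothetical helper, and this is not how the agent itself collects the list.

```python
from importlib import metadata


def full_environment() -> list:
    """Return every installed distribution as "name==version", sorted by name."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing or broken metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(full_environment()))
```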
that must have been it. here’s the installed packages when not using `-m`:
Hmm yes, can you open a GitHub issue on that? (this seems like a bug)
(also, the training code, which uses pandas, worked)
```
$ pip freeze | grep pandas
geopandas @ file:///home/conda/feedstock_root/build_artifacts/geopandas_1623249625470/work
pandas==1.3.3
```
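Side note on that output: the `geopandas @ file:///...` line is a PEP 508 direct reference, which `pip freeze` prints instead of a `==` pin when a package recorded the local location it was installed from (as the conda-forge build did here). If you ever post-process such a list, a minimal sketch of splitting both forms is below; `parse_freeze_line` is hypothetical, and it deliberately ignores editable (`-e`) lines, extras and environment markers.

```python
def parse_freeze_line(line: str) -> tuple:
    """Split one `pip freeze` line into (name, pin).

    Handles "name==version" pins and direct references ("name @ file:///...").
    Anything else is returned as (line, "").
    """
    line = line.strip()
    if " @ " in line:
        name, url = line.split(" @ ", 1)
        return name.strip(), "@ " + url.strip()
    if "==" in line:
        name, version = line.split("==", 1)
        return name.strip(), "==" + version.strip()
    return line, ""  # bare name or a format this sketch does not handle


freeze = (
    "geopandas @ file:///home/conda/feedstock_root/build_artifacts/geopandas_1623249625470/work\n"
    "pandas==1.3.3\n"
)
for entry in freeze.splitlines():
    print(parse_freeze_line(entry))
```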
we do use all those packages, and the version numbers are correct
but the call used to start the script was `python -m module.name --args`
okay, so here’s what i found out:
calling the training entry point directly (e.g. `/path/to/train.py`) and not instantiating the ClearML `Task` in `train.py` (e.g. calling a method in a different module where the task is instantiated) does work. calling the entry point with `python -m`, but instantiating the ClearML `Task` within `train.py`, also works.
so the only thing that doesn’t work is calling the entry point with `python -m` and calling a method from a different module that instantiates the task.
not sure if this is considered a bug or not! but I’d happily make an issue on github if needed.
thanks again for all your help.
(torchvision vs. cuda compatibility, will work on that),
The agent will pull the correct `torch` build based on the CUDA version available at runtime (or the one configured in `clearml.conf`).
I can't seem to find a difference between the two, so why would matplotlib get listed but not pandas? Are any other packages missing?
BTW: as an immediate "hack", add the following before your `Task.init` call: `Task.add_requirements("pandas")`
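Since `Task.init` here runs inside a class in one of the internal imports, the `add_requirements` call has to execute before that class creates the task. A minimal sketch of the ordering, using ClearML's `Task.add_requirements` and `Task.init`, is below. The `init_task` wrapper and its `task_cls` parameter are hypothetical; `task_cls` exists only so the ordering can be checked without a ClearML server.

```python
def init_task(project_name, task_name, extra_requirements=("pandas",), task_cls=None):
    """Force-add packages the import scan missed, then create the ClearML task.

    Hypothetical wrapper: leave `task_cls` as None in real use so that
    clearml.Task is used; pass a stand-in class only for testing.
    """
    if task_cls is None:
        from clearml import Task  # imported lazily; requires clearml installed
        task_cls = Task
    # add_requirements must run *before* Task.init so the extra packages
    # are recorded on the task being created
    for package in extra_requirements:
        task_cls.add_requirements(package)
    return task_cls.init(project_name=project_name, task_name=task_name)
```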
actually yes: `Task.init` is called inside a class in one of the internal imports