Answered
Hi All—First Off, Thanks For Being Such A Helpful And Thorough Group Of People. I Learn A Ton Just Searching Through The Channel For Problems. I’M Seeing A Weird Issue. I Have A Conda Env On My Linux Machine, And I Can Successfully Run A Training Script

hi all—first off, thanks for being such a helpful and thorough group of people. i learn a ton just searching through the channel for problems.

i’m seeing a weird issue. I have a conda env on my linux machine, and i can successfully run a training script in that env that connects to our clearML server and stores results.

i have a clearml agent running on a different linux machine that connects to our clearml server. If i clone the task that results from the above process and enqueue it, that task fails because it fails to build the environment properly. weirdly, for all the testing i’ve done on this, it just doesn’t install pandas for some reason.

i’ve tried different python versions (in the env and for the agent), and though i have to install packages from both conda and pip, i’ve made sure i’m installing pandas using conda. not really sure what my next debugging step would be.

  
  
Posted 3 years ago

Answers 30


that must have been it. here’s the installed packages when not using -m :
` # Python 3.9.7 | packaged by conda-forge | (default, Sep 23 2021, 07:28:37) [GCC 9.4.0]

Local modules found - skipping:

modulename == ../pathto/modulename/init.py

PyYAML == 5.4.1
Shapely == 1.7.1
clearml == 1.1.1
click == 7.1.2
matplotlib == 3.4.3
numpy == 1.21.2
pandas == 1.3.3
python_dateutil == 2.8.2
pytorch_lightning == 1.4.8
pytz == 2021.1
rasterio == 1.2.8
scikit_image == 0.18.3
scikit_learn == 0.24.2
scipy == 1.7.1
tensorboard == 2.6.0
torch == 1.9.1
torchvision == 0.2.2
tqdm == 4.62.3 `

  
  
Posted 3 years ago

yeah, it’s in one of the imports from the repo

  
  
Posted 3 years ago

actually yes— task.init is called inside of a class in one of the internal imports

  
  
Posted 3 years ago

(also, the training code, which uses pandas, worked)

  
  
Posted 3 years ago

$ conda list | grep matplotlib
matplotlib 3.4.3 py39hf3d152e_1 conda-forge
matplotlib-base 3.4.3 py39h2fa2bec_1 conda-forge

  
  
Posted 3 years ago

Could it be pandas was not installed on the local machine ?

  
  
Posted 3 years ago

BTW: could it be that Task.init is not called in the "module.name" entry point, but somewhere internally ?

  
  
Posted 3 years ago

BTW: how come torch is missing from the list ? Do you have "import torch" in the code ?

  
  
Posted 3 years ago

Sounds great! let me know what you find out 🙂

  
  
Posted 3 years ago

and it’s in the “installed packages” from the child task:
absl-py==0.14.0 aiohttp==3.7.4.post0 async-timeout==3.0.1 attrs==21.2.0 cachetools==4.2.2 certifi==2021.5.30 chardet==4.0.0 charset-normalizer==2.0.6 clearml==1.1.1 cycler==0.10.0 Cython==0.29.24 fsspec==2021.9.0 furl==2.1.2 future==0.18.2 google-auth==1.35.0 google-auth-oauthlib==0.4.6 grpcio==1.40.0 idna==3.2 joblib==1.0.1 jsonschema==3.2.0 kiwisolver==1.3.2 Markdown==3.3.4 matplotlib==3.4.3 multidict==5.1.0 numpy==1.21.2 oauthlib==3.1.1 orderedmultidict==1.0.1 packaging==21.0 pathlib2==2.3.6 Pillow==8.3.2 protobuf==3.18.0 psutil==5.8.0 pyasn1==0.4.8 pyasn1-modules==0.2.8 pyDeprecate==0.3.1 PyJWT==2.1.0 pyparsing==2.4.7 pyrsistent==0.18.0 python-dateutil==2.8.2 pytorch-lightning==1.4.8 PyYAML==5.4.1 requests==2.26.0 requests-oauthlib==1.3.0 rsa==4.7.2 scikit-learn==0.24.2 scipy==1.7.1 six==1.16.0 tensorboard==2.6.0 tensorboard-data-server==0.6.1 tensorboard-plugin-wit==1.8.0 threadpoolctl==2.2.0 torch==1.9.1 torchmetrics==0.5.1 tqdm==4.62.3 typing-extensions==3.10.0.2 urllib3==1.26.7 Werkzeug==2.0.1 yarl==1.6.3

  
  
Posted 3 years ago

we do use all those packages, and the version numbers are correct

  
  
Posted 3 years ago

but, the call used to start the script was python -m module.name --args

  
  
Posted 3 years ago

getting different issues (torchvision vs. cuda compatibility, will work on that), but i’m betting that was the issue

  
  
Posted 3 years ago

I think the main issue is running with python -m module.name --args
That is a bit different from running a script directly when it comes to "understanding" what the actual repository is.
Can you try running it from the repository folder (same command, just to see if it has any effect on the detected packages)?
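To illustrate how the two launch styles differ from Python's own point of view, here is a small stdlib-only sketch. It does not use clearml and says nothing about how clearml detects the repository internally; the package name `module`, the file `name.py`, and the helper `launch_both_ways` are made up to mirror the `python -m module.name` call. It only shows that the interpreter exposes different information (`__main__.__spec__` and `sys.path[0]`) in each case, which is the kind of input any entry-point detection has to work from.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Probe script: reports how the interpreter itself sees the launch.
PROBE = """\
import os, sys
spec = sys.modules["__main__"].__spec__
print(spec.name if spec else None)
print(os.path.realpath(sys.path[0]))
"""


def launch_both_ways():
    """Run the same file directly and via -m; return what each launch saw."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp).resolve()
        pkg = repo / "module"  # hypothetical package, mirrors module.name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "name.py").write_text(PROBE)
        commands = {
            "direct": [sys.executable, str(pkg / "name.py")],
            "dash_m": [sys.executable, "-m", "module.name"],
        }
        results = {}
        for label, cmd in commands.items():
            lines = subprocess.run(
                cmd, cwd=repo, capture_output=True, text=True, check=True
            ).stdout.splitlines()
            results[label] = {"spec_name": lines[0], "sys_path_0": lines[1]}
        return str(repo), str(pkg), results


if __name__ == "__main__":
    repo, pkg, results = launch_both_ways()
    for label, seen in results.items():
        print(label, seen)
```

Run directly, the script has no module spec and `sys.path[0]` is the script's own folder; run with -m from the repository folder, the spec name is `module.name` and `sys.path[0]` is the working directory.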

  
  
Posted 3 years ago

okay, i have a few things on my todo list, they will take a while. we will task.init in the entry point instead of how it’s done now, and we will re-try python -m . if it doesn’t work, we will file an issue. if it does work, yay!

either way, thanks much for your help today, i really appreciate it.

  
  
Posted 3 years ago

and it’s in the “installed packages” from the child task:

This is because the agent always updates back the full venv setup, so you will be able to always reproduce the entire thing (as opposed to dev time, where it lists only the directly imported packages)
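A rough sketch of that distinction, using only the standard library. This is not clearml's implementation; `directly_imported_packages` and `installed_distributions` are hypothetical helpers, and `modulename` stands in for the local repo module. The first function looks only at import statements in one file, the second lists everything installed in the current environment.

```python
import ast
import sys
from importlib import metadata

# Imports of the main script as posted in this thread, plus one
# hypothetical local import standing in for "stuff from the repo".
MAIN_SCRIPT = """\
import argparse
import time
import json
import pytorch_lightning as pl
from modulename import data
"""


def directly_imported_packages(source: str) -> set:
    """Top-level names imported in `source`, minus known stdlib modules."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            found.add(node.module.split(".")[0])
    # sys.stdlib_module_names exists on Python 3.10+; older versions skip the filter.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return {name for name in found if name not in stdlib}


def installed_distributions() -> list:
    """Names of every distribution installed in the current environment."""
    names = {dist.metadata["Name"] for dist in metadata.distributions()}
    return sorted(name for name in names if name)


if __name__ == "__main__":
    print("direct imports:", sorted(directly_imported_packages(MAIN_SCRIPT)))
    print("installed distributions:", len(installed_distributions()))
```

Note that pandas does not show up in the first list: it is only imported inside the repo modules, so an import-based scan finds it only if it also follows the local imports into those files.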

  
  
Posted 3 years ago

pip freeze | grep pandas

  
  
Posted 3 years ago

in the main script, these are the first imports:
import argparse
import time
import json
import pytorch_lightning as pl
from pytorch_lightning.accelerators import accelerator
then after that we import stuff from the repo, and the listed packages are imported in those files

  
  
Posted 3 years ago

actually no

  
  
Posted 3 years ago

i’ll clone and enqueue, but i’m guessing that’s the issue

  
  
Posted 3 years ago

that must have been it. here’s the installed packages when not using -m :

Hmm yes, can you open a GitHub issue on that? (this seems like a bug)

  
  
Posted 3 years ago

i checked the exact same thing

  
  
Posted 3 years ago

actually no

hmm, are those packages correct ?

  
  
Posted 3 years ago

not sure if this is considered a bug or not! but I’d happily make an issue on github if needed.

I think we should, at least for the sake of transparency and visibility 🙂

thanks again for all your help.

My pleasure 🙂

  
  
Posted 2 years ago

I can't seem to find a difference between the two, why would matplotlib get listed and pandas does not... Any other package that is missing?
BTW: as an immediate "hack" , before your Task.init call add the following:
Task.add_requirements("pandas")
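A minimal sketch of where that call would go, assuming clearml is installed and configured against your server. The helper `start_task` and the project and task names in the usage note are placeholders, not taken from this thread. The only point is the ordering: `Task.add_requirements` has to run before `Task.init`.

```python
def start_task(project_name: str, task_name: str):
    """Create the clearml Task, forcing pandas into its recorded requirements."""
    # Imported inside the function so this file can be read without clearml installed.
    from clearml import Task

    # Must be called before Task.init so the package is stored with the task.
    Task.add_requirements("pandas")
    return Task.init(project_name=project_name, task_name=task_name)
```

Calling it at the top of the entry point, e.g. `task = start_task("my_project", "train")`, also keeps the Task creation in the entry point itself.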

  
  
Posted 3 years ago

conda list | grep matplotlib ?

  
  
Posted 3 years ago

$ conda list | grep pandas
geopandas 0.9.0 pyhd8ed1ab_1 conda-forge
geopandas-base 0.9.0 pyhd8ed1ab_1 conda-forge
pandas 1.3.3 py39hde0f152_0 conda-forge

  
  
Posted 3 years ago

$ pip freeze | grep pandas
geopandas @ file:///home/conda/feedstock_root/build_artifacts/geopandas_1623249625470/work
pandas==1.3.3

  
  
Posted 3 years ago

(torchvision vs. cuda compatibility, will work on that),

The agent will pull the correct torch based on the cuda version that is available at runtime (or configured via the clearml.conf)

  
  
Posted 3 years ago

okay, so here’s what i found out—
calling the training entry point directly (e.g. /path/to/train.py), and not instantiating the clearml Task in train.py (e.g. calling a method in a different module where the task is instantiated) does work
calling the entrypoint with python -m, but instantiating the clearml Task within train.py also works
so the only thing that doesn’t work is calling the entrypoint with python -m and calling a method from a different module that instantiates the task.

not sure if this is considered a bug or not! but I’d happily make an issue on github if needed.

thanks again for all your help.

  
  
Posted 2 years ago