in the main script, these are the first imports:
import argparse
import time
import json
import pytorch_lightning as pl
from pytorch_lightning.accelerators import accelerator
after that, we import modules from the repo, and the listed packages are imported in those files
but the script was started with python -m module.name --args
actually, yes: Task.init is called inside a class in one of the internal imports
(also, the training code, which uses pandas, worked)
okay, so here’s what i found out:
- calling the training entry point directly (e.g. /path/to/train.py) and not instantiating the clearml Task in train.py (e.g. calling a method in a different module where the task is instantiated) works
- calling the entry point with python -m, but instantiating the clearml Task within train.py, also works
- so the only thing that doesn’t work is calling the entry point with python -m and calling a method from a different module that ...
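Since the failing combination involves `python -m`, one thing that may be worth comparing is what Python itself exposes to the running code under each launch style. A minimal sketch that assumes nothing about how ClearML detects the script: it builds a throwaway package (the name `pkg.train` and the helper `probe_entry_point` are hypothetical), launches the same module both ways in a subprocess, and reports `__main__.__spec__` and `sys.argv[0]` as the child sees them.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Tuple

# Body of a throwaway module (stand-in for train.py) that reports how
# the interpreter sees the entry point it was launched from.
PROBE = """
import sys
import __main__
spec = getattr(__main__, "__spec__", None)
print(spec.name if spec is not None else "None")
print(sys.argv[0])
"""


def probe_entry_point(use_module_flag: bool) -> Tuple[str, str]:
    """Run the hypothetical module `pkg.train` either as `python -m pkg.train`
    or as `python <tmp>/pkg/train.py` and return the child's view as
    (__main__.__spec__.name or "None", sys.argv[0])."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "train.py").write_text(PROBE)
        if use_module_flag:
            # -m looks the module up on sys.path, which starts with the cwd
            cmd = [sys.executable, "-m", "pkg.train"]
        else:
            cmd = [sys.executable, str(pkg / "train.py")]
        result = subprocess.run(
            cmd, cwd=tmp, capture_output=True, text=True, check=True
        )
    spec_name, argv0 = result.stdout.splitlines()[:2]
    return spec_name, argv0


for flag in (False, True):
    print("python -m" if flag else "direct path", probe_entry_point(flag))
```

Under `python -m`, `__main__.__spec__.name` holds the dotted module name and `sys.argv[0]` is the full path to the module file; a script executed directly has `__spec__` set to `None`. Code that infers the entry point from only one of these could plausibly treat the two launch styles differently, but that is a guess, not a confirmed cause.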
i’m getting different issues now (torchvision vs. CUDA compatibility, which i’ll work on), but i’m betting that was the issue
$ conda list | grep matplotlib
matplotlib 3.4.3 py39hf3d152e_1 conda-forge
matplotlib-base 3.4.3 py39h2fa2bec_1 conda-forge
$ pip freeze | grep pandas
geopandas @ file:///home/conda/feedstock_root/build_artifacts/geopandas_1623249625470/work
pandas==1.3.3
yeah, it’s in one of the imports from the repo
and it’s in the “installed packages” from the child task:
absl-py==0.14.0
aiohttp==3.7.4.post0
async-timeout==3.0.1
attrs==21.2.0
cachetools==4.2.2
certifi==2021.5.30
chardet==4.0.0
charset-normalizer==2.0.6
clearml==1.1.1
cycler==0.10.0
Cython==0.29.24
fsspec==2021.9.0
furl==2.1.2
future==0.18.2
google-auth==1.35.0
google-auth-oauthlib==0.4.6
grpcio==1.40.0
idna==3.2
joblib==1.0.1
jsonschema==3.2.0
kiwisolver==1.3.2
Markdown==3.3.4
matplotlib==3.4.3
multidict==5.1.0
numpy==1.21.2
oauthlib=...
i’m running my own clearml server with a vanilla config (obtained from GitHub), except that i have one fixed user
great news! thank you! when there’s a new release, do i need to run docker-compose build && docker-compose up to get the latest?
yep, that’s what i’m seeing: they’re all PNGs in that folder.
yes, i see no more than 114 plots in the list on the left side in full-screen mode. i just checked, and the behavior occurs in both Safari and Chrome.
since it’s probably relevant: i have to use the Agg backend because the machine is headless
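For context, a minimal sketch of the usual way a headless script selects Agg, assuming matplotlib is installed; the helper `save_histogram` is hypothetical and only illustrates that figures are rendered to PNG files instead of being shown. It says nothing about how ClearML captures those figures.

```python
import matplotlib

# Choose the non-interactive Agg backend before pyplot is imported;
# Agg rasterizes to files/buffers and does not need a display.
matplotlib.use("Agg")

import matplotlib.pyplot as plt  # noqa: E402  (imported after selecting the backend)


def save_histogram(values, path):
    """Render a histogram without a display and write it as a PNG."""
    fig, ax = plt.subplots()
    ax.hist(values, bins=10)
    fig.savefig(path, format="png")
    plt.close(fig)  # release the figure; matters when plotting in a loop
```

Closing each figure keeps memory flat when a script produces hundreds of plots, as in the report above.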
wondering if there has been an update on this?
2023-05-06 12:05:49,168 - clearml.Task - WARNING - ### TASK STOPPED - USER ABORTED - STATUS CHANGED ###
lol.
this changes the status in the UI to “aborted”.
not ideal, but if the answer is “for this to work, tasks must be run by an agent” i accept it
hi SubstantialElk6, not sure if you were successful with this, but i struggled with it as well, and it looks like the information is no longer in the linked document.
in the end, i realized that i needed to download apiserver.conf from the clearml-server repo (https://github.com/allegroai/clearml-server/blob/master/apiserver/config/default/apiserver.conf) and then add a user/pass for myself (starting at line 82).
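The section can also be generated instead of typed by hand. A minimal sketch, assuming the `auth.fixed_users` layout (`enabled`, `pass_hashed`, and `users` entries with `username`/`password`/`name`) shown in the ClearML server documentation; `render_fixed_users` is a hypothetical helper, the user values are placeholders, and the output should be checked against the section in the downloaded `apiserver.conf` before use.

```python
import json
from typing import Dict, List


def render_fixed_users(users: List[Dict[str, str]]) -> str:
    """Render an `auth { fixed_users { ... } }` section as HOCON text.

    Assumes plain-text passwords (pass_hashed: false); the key names are an
    assumption taken from the ClearML docs' fixed-users example.
    """
    entries = []
    for user in users:
        # json.dumps yields a double-quoted, escaped string, which HOCON accepts
        fields = "\n".join(
            f"                {key}: {json.dumps(user[key])}"
            for key in ("username", "password", "name")
        )
        entries.append("            {\n" + fields + "\n            }")
    return "\n".join(
        [
            "auth {",
            "    fixed_users {",
            "        enabled: true",
            "        pass_hashed: false",
            "        users: [",
            ",\n".join(entries),
            "        ]",
            "    }",
            "}",
        ]
    )


print(render_fixed_users([{"username": "jane", "password": "change-me", "name": "Jane Doe"}]))
```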
i’ve just verified that they’re all written to /opt/clearml/data/fileserver/[PROJECT_NAME]/[DESCRIPTION]/metrics
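To compare the number of images on disk with the 114 entries visible in the UI, a small sketch that counts PNG files recursively. `count_plot_images` is a hypothetical helper, and the `[PROJECT_NAME]`/`[DESCRIPTION]` parts of the path have to be filled in by hand.

```python
from pathlib import Path


def count_plot_images(metrics_dir: str, pattern: str = "*.png") -> int:
    """Count files matching `pattern` anywhere below `metrics_dir`."""
    root = Path(metrics_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"not a directory: {root}")
    return sum(1 for path in root.rglob(pattern) if path.is_file())
```

Called as `count_plot_images("/opt/clearml/data/fileserver/<project>/<description>/metrics")`, a result well above 114 would point at the UI list rather than at the upload.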
good luck! thanks for looking into it 🙂
great, thank you very much for this info!
the items at the bottom of the list have dropped off: there’s no 2D hist 9, no 2D hist 81, etc.
$ conda list | grep pandas
geopandas 0.9.0 pyhd8ed1ab_1 conda-forge
geopandas-base 0.9.0 pyhd8ed1ab_1 conda-forge
pandas 1.3.3 py39hde0f152_0 conda-forge
hmmm, i’m not creating the task in __main__; i wonder if that’s why
i’ll clone and enqueue, but i’m guessing that’s the issue
we do use all those packages, and the version numbers are correct
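One way to double-check that mechanically is to compare the child task’s “installed packages” list against the local environment. A minimal sketch using the standard-library `importlib.metadata`; the helper names `parse_pins` and `mismatches` are hypothetical, only `name==version` lines are compared, and entries such as `geopandas @ file:///...` (as in the `pip freeze` output above) are skipped.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List


def parse_pins(text: str) -> Dict[str, str]:
    """Parse pip-freeze style `name==version` lines; other lines are skipped."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or "==" not in line:
            continue
        name, _, ver = line.partition("==")
        pins[name.strip()] = ver.strip()
    return pins


def mismatches(pins: Dict[str, str]) -> List[str]:
    """List differences between the pins and what is installed locally."""
    problems = []
    for name, wanted in sorted(pins.items()):
        try:
            have = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (wanted {wanted})")
            continue
        if have != wanted:
            problems.append(f"{name}: installed {have}, wanted {wanted}")
    return problems
```

The comparison is plain string equality, so a pin like `1.0` against an installed `1.0.0` would be reported even though the versions are equivalent.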
correct, i’m just running the task via CLI
don’t want to pester, but i am curious—did they have some thoughts on what was happening? should i make a feature request somewhere?