thanks for that tip. i cleared out the vcs cache and was already using the latest version of the agent, same problem persists.
there’s a python version mismatch, i will make a different env for the agent to run in that has a matching python version
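a quick way to confirm a mismatch like this before building the new env is to compare interpreter versions on both sides. rough sketch only: `python_matches` and the `(3, 8)` target are hypothetical names/values for illustration, not anything from clearml.

```python
import sys


def python_matches(expected, actual=None):
    """Return True when the major.minor parts of two versions agree.

    `expected` and `actual` are tuples like (3, 8) or (3, 8, 8).
    `actual` defaults to the interpreter running this script.
    """
    if actual is None:
        actual = sys.version_info
    return tuple(actual[:2]) == tuple(expected[:2])


if __name__ == "__main__":
    # Hypothetical target: the major.minor version the task was created with.
    expected = (3, 8)
    running = "{}.{}".format(*sys.version_info[:2])
    if python_matches(expected):
        print("ok: running python " + running)
    else:
        print("mismatch: running python " + running + ", expected {}.{}".format(*expected))
```

run it with the interpreter the agent will use; only major.minor is compared, on the assumption that patch-level differences are not the problem.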
that sounds like all good news to me! thanks for the info 🙂
getting different issues (torchvision vs. cuda compatibility, will work on that), but i’m betting that was the issue
agent version is
❯ clearml-agent --version
CLEARML-AGENT version 1.5.2
okay, that’s a fresh install, and the backend is agg:
` Python 3.8.8 (default, Feb 24 2021, 21:46:12)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib
>>> matplotlib.get_backend()
'agg' `
the machine is headless, and there’s no window server running.
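for anyone wanting to check the same thing from code, here is a minimal sketch of a headless test. it assumes linux and only looks at the usual display environment variables, so it's a heuristic, not a guarantee. `looks_headless` is a made-up helper name.

```python
import os


def looks_headless(env=None):
    """Heuristic: treat the machine as headless when neither an X11 nor a
    Wayland display is advertised in the environment (Linux only)."""
    env = os.environ if env is None else env
    return not (env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


if __name__ == "__main__":
    if looks_headless():
        print("no display found; a non-interactive backend such as agg is expected")
    else:
        print("a display is advertised; an interactive backend may be selected")
```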
so now i have
git_pass: "[NEW KEY]"
enable_git_ask_pass: false
in my clearml.conf file
in the main script, these are the first imports:
import argparse
import time
import json
import pytorch_lightning as pl
from pytorch_lightning.accelerators import accelerator
then after that we import stuff from the repo, and the listed packages are imported in those files
if i have code that’s just in a git repo but is not installed in any way, it runs fine if i invoke the entrypoint in a shell. but clearml will not find dependencies in secondary imports (as described above) if the agent just clones the repo but does not install the python package in some way.
hmmm, i’m not creating the task in __main__, i wonder if that’s why
the issue also may have been fixed somewhere between 20.1 and 22.2, i didn’t test versions in between those two
i updated the token in ~/clearml.conf, was careful to ensure it was only specified in one place
okay, so my problem is actually that using a “local” package is not supported—ie i need to pip install the code i’m running and that must correctly specify its dependencies
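to get a starting list of what the package needs to declare, something like the sketch below can walk every .py file in the repo and collect the top-level modules they import, including the ones only pulled in by secondary files. this is not how clearml detects requirements, just a stdlib-only illustration; both function names are hypothetical.

```python
import ast
import sys
from pathlib import Path


def imported_top_level_modules(repo_root):
    """Collect top-level module names imported by any .py file under repo_root."""
    found = set()
    for path in Path(repo_root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    found.add(alias.name.split(".")[0])
            elif isinstance(node, ast.ImportFrom):
                # level > 0 means a relative import, which is local by definition
                if node.level == 0 and node.module:
                    found.add(node.module.split(".")[0])
    return found


def third_party_candidates(repo_root):
    """Drop names that are local to the repo and, where possible, stdlib modules."""
    root = Path(repo_root)
    local = {p.stem for p in root.rglob("*.py")}
    local |= {p.name for p in root.rglob("*") if p.is_dir()}
    # sys.stdlib_module_names only exists on Python 3.10+; on older
    # interpreters stdlib modules are simply left in the result.
    stdlib = set(getattr(sys, "stdlib_module_names", ()))
    return sorted(imported_top_level_modules(root) - local - stdlib)


if __name__ == "__main__":
    for name in third_party_candidates("."):
        print(name)
```

the output is import names, which do not always match the pip package name (cv2 vs. opencv-python, for example), so treat it as a checklist for the package's dependency list rather than something to paste in directly.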
actually it’s missing imports from the second level too
the VCS cache was empty before that run. then, even with the VCS cache being disabled in the config, there was a new lock file and directory after running.
great news! thank you! when there’s a new release, i need to docker-compose build && docker-compose up to get the latest?
but now i’m confused about why set_default_upload_destination is different from output_uri. i kind of get it? but wouldn’t that be a safe default?
ugh, turns out i had a plt.show() in there, that was causing blank figs.
that said, report_matplotlib_figure did not end up putting anything into “plots” or “debug samples”
also tried disabling the VCS cache in the config. pulled this from the agent’s startup output:
agent.vcs_cache.enabled = false
ah, i had report_image specified, and when i disable that, it worked.
but hmm, report_media generates a file that is 0 bytes, whereas report_image generates a 33KB file
hmmm, looks like maybe i should set it directly: https://clearml.slack.com/archives/CTK20V944/p1603369102359500?thread_ts=1603362214.350500&cid=CTK20V944
yep, that was it. thanks for all your help and sorry to bother 🙂