
ah, my mistake, that’s an issue in my conf file.
okay, i have a few things on my todo list, they will take a while. we will call task.init in the entry point instead of how it’s done now, and we will re-try python -m. if it doesn’t work, we will file an issue. if it does work, yay!
either way, thanks much for your help today, i really appreciate it.
(also, the training code, which uses pandas, worked)
seems like pip 20.1.1 has the issue, but >= 22.2.2 do not.
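a quick way to see which side of that boundary an environment is on is to compare its pip version against the one that worked. this is only a sketch: the 22.2.2 threshold is taken from the observation above, `parse_version` and `pip_is_known_good` are hypothetical helper names, and nothing between 20.1.1 and 22.2.2 was tested, so a False result means "not known to work" rather than "broken".

```python
import re

# Assumption: 22.2.2 is the oldest pip version reported as working in this thread.
KNOWN_GOOD_PIP = (22, 2, 2)


def parse_version(version: str) -> tuple:
    """Turn a version string such as '20.1.1' into (20, 1, 1).

    Only the leading numeric release part is used; missing components
    are padded with zeros, so '23.0' becomes (23, 0, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"not a version string: {version!r}")
    numbers = [int(part) for part in match.group(0).split(".")]
    numbers += [0] * (3 - len(numbers))  # pad short versions to three parts
    return tuple(numbers[:3])


def pip_is_known_good(version: str) -> bool:
    """True if `version` is at least the version observed to work."""
    return parse_version(version) >= KNOWN_GOOD_PIP
```

to check the pip that is actually installed, the string from `importlib.metadata.version("pip")` can be passed to `pip_is_known_good`.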
actually yes, task.init is called inside of a class in one of the internal imports
the VCS cache was empty before that run. then, even with the VCS cache being disabled in the config, there was a new lock file and directory after running.
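the before/after comparison described here can be made repeatable with a small directory snapshot. a minimal sketch using only the standard library; `snapshot` and `new_entries` are hypothetical helper names, and the cache path in the usage comment is the one from the conf file, which may differ on other machines.

```python
from pathlib import Path


def snapshot(directory) -> set:
    """Return the relative paths of everything under `directory`.

    A missing directory is treated as empty.
    """
    root = Path(directory).expanduser()
    if not root.exists():
        return set()
    return {str(path.relative_to(root)) for path in root.rglob("*")}


def new_entries(before: set, after: set) -> list:
    """List the paths that appeared between two snapshots."""
    return sorted(after - before)


# Usage (path is an assumption taken from the vcs_cache setting):
#   before = snapshot("~/.clearml/vcs-cache")
#   ... run the task ...
#   print(new_entries(before, snapshot("~/.clearml/vcs-cache")))
```

anything printed after the run (a lock file, a new directory) was created while the cache was supposed to be disabled.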
i’ve just verified that they’re all written to /opt/clearml/data/fileserver/[PROJECT_NAME]/[DESCRIPTION]/metrics
should be posted in the “uncommitted changes” section 🙂
don’t want to pester, but i am curious—did they have some thoughts on what was happening? should i make a feature request somewhere?
running my own clearml server with a vanilla config (obtained from github), except i have one fixed user
$ conda list | grep matplotlib
matplotlib 3.4.3 py39hf3d152e_1 conda-forge
matplotlib-base 3.4.3 py39h2fa2bec_1 conda-forge
hmmm, looks like maybe i should set it directly: https://clearml.slack.com/archives/CTK20V944/p1603369102359500?thread_ts=1603362214.350500&cid=CTK20V944
restarted the server on the off chance that had anything to do with it, and no. VCS is disabled, and the task is trying to pull the correct/latest commit.
but now i’m confused about why set_default_upload_destination is different from output_uri. i kind of get it? but wouldn’t that be a safe default?
i tried lots of things, but values in the conf file (specifically the pip and cuda versions) overriding things in my code/env confused me for a long time
hey Martin.B, wondering if you were able to find anything out about this?
okay, that’s a fresh install, and the backend is agg:
` Python 3.8.8 (default, Feb 24 2021, 21:46:12)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib
>>> matplotlib.get_backend()
'agg' `
the machine is headless, and there’s no window server running.
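for context, agg is the non-interactive backend matplotlib falls back to when no GUI is available, so it is the expected result on a box like this. a rough way to confirm the "no window server" part from Python is to look at the usual display environment variables. this is a sketch with a hypothetical helper name (`has_display`); it only checks that a display is advertised, not that it is reachable.

```python
import os


def has_display(env=None) -> bool:
    """True if an X11 or Wayland display is advertised in the environment.

    `env` defaults to the current process environment; pass a dict to
    check a different one.
    """
    env = os.environ if env is None else env
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


if __name__ == "__main__":
    print("display available" if has_display() else "headless")
```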
ah, i had report_image specified, and when i disabled that, it worked.
i am having this same issue—installing pytorch via pip. but i am not specifying a version, and the agent is not able to install pytorch.
even if i specify a version (e.g. torch<2.0), it fails.
i guess this is a pip problem, is there a known pip version that works correctly?
ugh, turns out i had a plt.show() in there, that was causing blank figs.
that said, report_matplotlib_figure did not end up putting anything into “plots” or “debug samples”
i’ll clone and enqueue, but i’m guessing that’s the issue
the issue also may have been fixed somewhere between 20.1 and 22.2, i didn’t test versions in between those two
but, the call used to start the script was python -m module.name --args
that sounds like all good news to me! thanks for the info 🙂
okay, so here’s what i found out:
- calling the training entry point directly (eg /path/to/train.py), and not instantiating the clearml Task in train.py (eg calling a method in a different module where the task is instantiated) does work
- calling the entrypoint with python -m, but instantiating the clearml Task within train.py also works
so the only thing that doesn’t work is calling the entrypoint with python -m and calling a method from a different module that ...
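since the failing combination depends on how the process was launched, it can help to log the launch mode next to the task. the sketch below uses only standard Python behaviour: when a program is started with `python -m`, `__main__.__spec__` is set, and when it is started as a script path it is None. `describe_launch` is a hypothetical helper name, and this does not reproduce or explain the clearml behaviour, it only shows how the two modes differ from Python's side.

```python
import sys
import types


def describe_launch(main_module: types.ModuleType) -> str:
    """Report whether `__main__` was started as a script or with `python -m`."""
    spec = getattr(main_module, "__spec__", None)
    if spec is None:
        # `python /path/to/train.py` (or an interactive session):
        # __main__ has no import spec.
        return "script"
    # `python -m module.name`: the spec carries the real dotted module name.
    return f"module:{spec.name}"


if __name__ == "__main__":
    print(describe_launch(sys.modules["__main__"]))
```

calling `describe_launch(sys.modules["__main__"])` from the module where the task is created shows which launch mode that code is actually running under.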
weird. will move forward with manually recreating the task.
okay, so if i set set_default_upload_destination to a URI that’s local to the computer running the task (and the server):
- the server is “unable to load the image”, not surprising because the filesystem URI was not mounted into the container
- the files are present at the expected location on the local filesystem, but they are…blank! all white.
that tells me that report_media might have been successful, but there’s some issue…encoding the data to a jpeg?
❯ cat ~/clearml.conf | grep git_user
git_user: "aaaaaaaaaaaaa"
❯ cat ~/clearml.conf | grep -A 2 vcs_cache
vcs_cache: {
enabled: false,
path: ~/.clearml/vcs-cache