correct, i’m just running the task via CLI
but the call used to start the script was python -m module.name --args
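This is a minimal, stdlib-only sketch of what a module sees when it is launched with python -m module.name --args. It writes a throwaway package named module that contains name.py (both hypothetical stand-ins for the real ones), runs it the same way, and collects the resolved module name and the trailing arguments. It only shows the Python side. It says nothing about how ClearML records the entry point.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Body of a hypothetical module: prints the module name Python resolved
# and the arguments that followed it on the command line.
MODULE_SOURCE = """\
import sys

if __name__ == "__main__":
    print(__spec__.name)
    print(sys.argv[1:])
"""


def run_as_module(module_name, args):
    """Write a throwaway package.module to a temp dir, launch it with
    `python -m`, and return its stdout lines. Assumes exactly one dot."""
    package, module = module_name.split(".")
    with tempfile.TemporaryDirectory() as root:
        package_dir = Path(root) / package
        package_dir.mkdir()
        (package_dir / "__init__.py").write_text("")
        (package_dir / f"{module}.py").write_text(MODULE_SOURCE)
        # With -m, the working directory is put on sys.path,
        # so cwd=root lets Python find the package.
        result = subprocess.run(
            [sys.executable, "-m", module_name, *args],
            cwd=root,
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout.splitlines()
```

Calling run_as_module("module.name", ["--args"]) returns ['module.name', "['--args']"]. Under -m, the module knows its dotted name through __spec__.name, and everything after the module name lands in sys.argv[1:].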
yep, that’s what i’m seeing: they’re all PNGs in that folder.
yes, i see no more than 114 plots in the list on the left side in full-screen mode. i just checked, and the behavior is the same on Safari and Chrome.
i’ve just verified that they’re all written to /opt/clearml/data/fileserver/[PROJECT_NAME]/[DESCRIPTION]/metrics
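This small sketch counts the PNG files under a metrics folder, so the number on disk can be compared with the number the plots tab shows. It assumes the PNGs sit somewhere below that folder, possibly in subfolders. [PROJECT_NAME] and [DESCRIPTION] stay as placeholders for the real path parts.

```python
from pathlib import Path


def count_pngs(metrics_dir):
    """Count PNG files anywhere under metrics_dir (suffix match is case-insensitive)."""
    return sum(
        1
        for path in Path(metrics_dir).rglob("*")
        if path.is_file() and path.suffix.lower() == ".png"
    )
```

For example, run count_pngs("/opt/clearml/data/fileserver/[PROJECT_NAME]/[DESCRIPTION]/metrics") with the placeholders filled in. A count above 114 would point at the UI listing rather than at the upload.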
okay, looks like my main issue was the errant plt.show() 😩. once that’s removed, report_media works fine without specifying set_default_upload_destination 😅
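For manual reporting, this sketch pushes a folder of already-saved PNGs through report_media, with one series per file. The function only assumes an object with a report_media method that accepts title, series, iteration and local_path keywords, as clearml's Logger does. With ClearML, you would pass Logger.current_logger() after Task.init. The folder and the default title are hypothetical.

```python
from pathlib import Path


def report_pngs(logger, folder, title="plots", iteration=0):
    """Report every PNG in `folder` through logger.report_media, one series per file.

    `logger` is expected to behave like clearml's Logger; the folder and
    the default title are hypothetical.
    """
    series_names = []
    for png in sorted(Path(folder).glob("*.png")):
        logger.report_media(
            title=title,
            series=png.stem,  # file name without extension as the series name
            iteration=iteration,
            local_path=str(png),
        )
        series_names.append(png.stem)
    return series_names
```

Sorting the paths keeps the reporting order stable between runs. Using the file stem as the series name keeps each image distinct under a shared title.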
good questions 🙂
they are plots, and they have unique titles. i’m using the auto-logging mechanism: set up the task, then call plt.show(). no more than 114 plots are shown in the plots tab.
also tried disabling the VCS cache in the config. i pulled this from the agent’s startup output:
agent.vcs_cache.enabled = false
yeah, it’s in one of the imports from the repo
$ conda list | grep matplotlib
matplotlib                3.4.3            py39hf3d152e_1    conda-forge
matplotlib-base           3.4.3            py39h2fa2bec_1    conda-forge
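conda list shows what is in the active conda environment, which is not necessarily the interpreter the agent runs. This stdlib sketch asks the running interpreter itself and needs Python 3.8+ for importlib.metadata. It works with Python distribution names, so a conda-only package name such as matplotlib-base will most likely come back as None.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

For example, installed_versions(["matplotlib", "pandas", "geopandas"]) run inside the task's environment shows the versions that environment will actually import.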
i am having this same issue installing pytorch via pip, but i am not specifying a version, and the agent still isn’t able to install pytorch.
even if i specify a version (e.g. torch<2.0), it fails.
i guess this is a pip problem. is there a known pip version that works correctly?
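To reproduce the failing install by hand, this sketch builds the pip command for the same interpreter and prints it with proper shell quoting. It does not diagnose the agent's failure. It only rules out one pitfall of manual runs: an unquoted torch<2.0 in a shell is parsed as input redirection from a file named 2.0. Running python -m pip --version in that interpreter reports which pip is involved.

```python
import shlex
import sys


def pip_install_command(requirement):
    """Build a pip install command that targets the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", requirement]


cmd = pip_install_command("torch<2.0")
# shlex.join quotes the specifier, so pasting the printed line into a shell
# does not treat "<" as an input redirection.
print(shlex.join(cmd))
```

Passing the list straight to subprocess.run(cmd) needs no quoting at all, because no shell is involved.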
ah, i had report_image specified, and when i disabled that, it worked.
except for the IP address and the actual keys, it’s the vanilla config generated by clearml-agent init
i appreciate your help today. it can’t be very fun working on a Sunday. i hope you get some time to relax away from the computer today, and i look forward to hearing more when you’re back at work.
i did want to point out, though, that when manually reporting, it looks like the plots don’t get “cleared” properly: https://demoapp.demo.clear.ml/projects/52eb5c9d938244daaa6fa460edce5e22/experiments/78fa65250e0544d7b50425a82dde75f5/info-output/metrics/plots?columns=selected&columns=type&columns=name&colu...
i don’t want to pester, but i am curious: did they have any thoughts on what was happening? should i make a feature request somewhere?
$ conda list | grep pandas
geopandas                 0.9.0            pyhd8ed1ab_1      conda-forge
geopandas-base            0.9.0            pyhd8ed1ab_1      conda-forge
pandas                    1.3.3            py39hde0f152_0    conda-forge
will try the git askpass thing.
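For reference, this is the generic git mechanism behind askpass. Git runs the program named in the GIT_ASKPASS environment variable, passes the prompt text as its only argument, and reads the answer from stdout. The minimal helper below uses GIT_USERNAME and GIT_TOKEN, which are hypothetical variable names. It is not a description of whatever ClearML agent option was suggested.

```python
#!/usr/bin/env python3
"""Minimal GIT_ASKPASS helper sketch.

Git runs this program with the prompt text as its only argument and reads
the answer from stdout. GIT_USERNAME and GIT_TOKEN are hypothetical names.
"""
import os
import sys


def answer(prompt):
    """Return the username for username prompts, otherwise the token."""
    if prompt.lower().startswith("username"):
        return os.environ.get("GIT_USERNAME", "")
    return os.environ.get("GIT_TOKEN", "")


if __name__ == "__main__":
    print(answer(sys.argv[1] if len(sys.argv) > 1 else ""))
```

To use it, make the file executable and point git at it, e.g. export GIT_ASKPASS=/path/to/askpass.py, before running the clone.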
i’m getting different issues now (torchvision vs. CUDA compatibility, which i’ll work on), but i’m betting that was the issue
i’ll clone and enqueue, but i’m guessing that’s the issue
okay, i have a few things on my todo list, and they will take a while. we will call Task.init in the entry point instead of how it’s done now, and we will retry python -m. if it doesn’t work, we will file an issue. if it does work, yay!
either way, thanks so much for your help today. i really appreciate it.
in the main script, these are the first imports:
import argparse
import time
import json
import pytorch_lightning as pl
from pytorch_lightning.accelerators import accelerator
then after that we import stuff from the repo, and the listed packages are imported in those files.
yep, that was it. thanks for all your help and sorry to bother 🙂
in case it’s relevant: i have to use the Agg backend because the machine is headless.
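On a headless machine, the backend can be forced from outside the script with matplotlib's MPLBACKEND environment variable, which matplotlib reads when it is first imported. Inside the script, calling matplotlib.use("Agg") before importing pyplot has the same effect. This sketch launches the module that way, reusing the module.name placeholder from the earlier command.

```python
import os
import subprocess
import sys


def headless_env(base=None):
    """Return a copy of `base` (default: os.environ) with MPLBACKEND set to Agg.

    matplotlib reads MPLBACKEND on first import, so a child process started
    with this environment stays on the non-GUI Agg backend unless the script
    selects another backend itself.
    """
    env = dict(os.environ if base is None else base)
    env["MPLBACKEND"] = "Agg"
    return env


def run_headless(module_name, args):
    """Launch `python -m module_name args...` with the Agg backend forced."""
    return subprocess.run(
        [sys.executable, "-m", module_name, *args],
        env=headless_env(),
        check=True,
    )
```

For example, run_headless("module.name", ["--args"]). The current process's own environment is copied, not modified.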