should api.credentials.access_key be the same as the access_key in clearml.conf?
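for reference, the credentials live under the api section of clearml.conf, roughly like this (placeholder values, your server URLs may differ):

api {
    web_server: https://app.clear.ml
    api_server: https://api.clear.ml
    files_server: https://files.clear.ml
    credentials {
        # keys generated from the server's web UI profile page
        access_key: "generated-access-key"
        secret_key: "generated-secret-key"
    }
}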
hmm, it was confusing to me, but it’s kind of an edge case where I was taking over a computer after a colleague left, seems like that might not be a common scenario
also, i’m noticing the “last used” field does not update when I try to start an agent, but does change when I issue the curl command you gave earlier
yep, that was it. thanks for all your help and sorry to bother 🙂
should be posted in the “uncommitted changes” section 🙂
if you’re able to check the data store, folders for all 120 plots will be on disk.
🤔
Media is uploaded to a preconfigured bucket (see setup_upload()) with a key (filename) describing the task ID, title, series and iteration.
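roughly, in code that might look like this (bucket name, titles and file names are just placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="media upload demo")  # hypothetical names
logger = task.get_logger()

# point media uploads at a preconfigured bucket (credentials come from clearml.conf)
logger.set_default_upload_destination("s3://my-bucket/clearml-media")

# the uploaded object key will encode the task ID, title, series and iteration
logger.report_media(title="renders", series="sample", iteration=0, local_path="render_000.jpg")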
but there was a pip_version: “<20.2” line in my clearml.conf, which was possibly a default in the config file from 2 years ago or so
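for anyone else hitting this, that line sits under the agent section of clearml.conf, something like (value shown is the old default i had):

agent {
    package_manager {
        # pins the pip version the agent installs into the task venv
        pip_version: "<20.2"
    }
}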
seems like pip 20.1.1 has the issue, but >= 22.2.2 do not.
i am having this same issue (installing pytorch via pip), but i am not specifying a version, and the agent is not able to install pytorch.
even if i specify a version (e.g. torch<2.0), it fails.
i guess this is a pip problem, is there a known pip version that works correctly?
the issue also may have been fixed somewhere between 20.1 and 22.2, i didn’t test versions in between those two
ah, my mistake, that’s an issue in my conf file.
if i have code that’s just in a git repo but is not installed in any way, it runs fine if i invoke the entrypoint in a shell. but clearml will not find dependencies in secondary imports (as described above) if the agent just clones the repo but does not install the python package in some way.
i noticed that the agent was downgrading to pip=20.1.1 at every attempt, so i added
Task.add_requirements("pip", "23.1.2")
and even then, it downgrades to 20.1.1?
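for what it’s worth, this is roughly how i’m calling it (project/task names are placeholders); as far as i can tell add_requirements has to run before Task.init for it to land in the recorded requirements:

from clearml import Task

# register the extra requirement before the task is initialized
Task.add_requirements("pip", "23.1.2")

task = Task.init(project_name="examples", task_name="pip pin test")  # hypothetical names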
okay, so my problem is actually that using a “local” package is not supported, i.e. i need to pip install the code i’m running, and that package must correctly specify its dependencies
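concretely, i think that means giving the local package a minimal setup.py along these lines (names and deps are just illustrative) and installing it, e.g. with pip install -e . :

from setuptools import setup, find_packages

# illustrative metadata for the local package discussed above
setup(
    name="local_package",
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "pandas",             # the secondary dependency that was going undetected
        "pytorch_lightning",
    ],
)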
i tried lots of things, but values in the conf file (specifically the pip and cuda versions) overriding things in my code/env confused me for a long time
agent version is
❯ clearml-agent --version
CLEARML-AGENT version 1.5.2
is there a limit to the search depth for this?
i’ve got a training script that imports local package files and those items import other local package files. ex:
train.py
from local_package.callbacks import Callbacks
local_package/callbacks.py
from local_package.analysis import Analysis
local_package/analysis.py
import pandas as pd
the original task only lists the following as installed packages:
clearml == 1.9.1rc0
pytorch_lightning == 1.8.6
torchvisi...
but now i’m confused about why set_default_upload_destination is different from output_uri. i kind of get it? but wouldn’t that be a safe default?
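my rough mental model (bucket names are placeholders): output_uri on Task.init covers artifacts and model checkpoints, while set_default_upload_destination covers media/debug samples reported through the logger:

from clearml import Task

# artifacts and model checkpoints go to output_uri
task = Task.init(project_name="examples", task_name="upload demo",
                 output_uri="s3://my-bucket/artifacts")  # hypothetical bucket

# media / debug samples reported via the logger go to this destination
task.get_logger().set_default_upload_destination("s3://my-bucket/media")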
ah, i had report_image specified, and when i disable that, it worked.
okay, looks like my main issue was the errant plt.show ( 😩 ). report_media works fine without specifying set_default_upload_destination when that’s been removed 😅
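for reference, a minimal sketch of that working pattern (project/titles/paths are placeholders); the important bit is not calling plt.show() before saving/reporting, since that was what produced the blank files:

import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name="examples", task_name="figure reporting")  # hypothetical names
logger = task.get_logger()

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# save before reporting; an earlier plt.show() call was what left these files blank
fig.savefig("curve.png")
logger.report_media(title="figures", series="curve", iteration=0, local_path="curve.png")
plt.close(fig)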
okay, so if i set set_default_upload_destination to a URI that’s local to the computer running the task (and the server):
- the server is “unable to load the image”, not surprising because the filesystem URI was not mounted into the container
- the files are present at the expected location on the local filesystem, but they are…blank! all white. that tells me that report_media might have been successful, but there’s some issue encoding the data to a jpeg?
ugh, turns out i had a plt.show() in there, that was causing blank figs.
that said, report_matplotlib_figure did not end up putting anything into “plots” or “debug samples”
actually it’s missing imports from the second level too
but hmm, report_media generates a file that is 0 bytes, whereas report_image generates a 33KB file