
ah, i had report_image specified, and when i disable that, it worked.
will give it a try. thanks 🙂
the issue may also have been fixed somewhere between 20.1 and 22.2; i didn’t test any versions in between those two
also tried disabling the VCS cache in the config—pulled this from the agent’s startup output:
agent.vcs_cache.enabled = false
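(in clearml.conf that setting would sit under the agent section; roughly this, if i have the HOCON nesting right:)

agent {
    vcs_cache {
        enabled: false
    }
}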
but hmm, report_media generates a file that is 0 bytes, whereas report_image generates a 33KB file
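(for reference, the two calls being compared would be roughly these; the project/title/path values here are just placeholders:)

from clearml import Task

task = Task.init(project_name="example", task_name="report-test")
logger = task.get_logger()

# uploads and renders the file as an image (debug sample)
logger.report_image(title="debug", series="sample", iteration=0, local_path="plot.png")

# uploads the raw file and links it as media
logger.report_media(title="debug", series="sample", iteration=0, local_path="plot.jpg")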
Hmmm. Just tried cloning a brand new task and the agent is still using the expired github access token.
further, there’s now data in the VCS cache, even though i disabled it
i updated the token in ~/clearml.conf, was careful to ensure it was only specified in one place
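(the fields in question, with a placeholder for the token value:)

agent {
    git_user: "aaaaaaaaaaaaa"
    git_pass: "<new personal access token>"
}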
- stopped agent
- updated clearml.conf to have a different username, wrote the file
- verified the vcs-cache is empty
- started the agent, which resulted in this output
...
agent.custom_build_script =
agent.disable_task_docker_override = false
agent.git_user = aaaaaaaaaaaaa
agent.default_python = 3.9
...
(that’s the username I changed it to)
- reset and enqueued the task
checkout failed, it’s still attempting to use the old creds
getting different issues (torchvision vs. cuda compatibility, will work on that), but i’m betting that was the issue
actually yes— task.init is called inside of a class in one of the internal imports
i don’t get why the agent init log would list the username from clearml.conf but then use the env vars
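(if env vars are winning, i assume it would be these overriding the conf file; names worth double-checking against the agent docs:)

export CLEARML_AGENT_GIT_USER=<old username>
export CLEARML_AGENT_GIT_PASS=<old token>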
restarted the server on the off chance that had anything to do with it, and no. VCS is disabled, and the task is trying to pull the correct/latest commit.
$ conda list | grep pandas
geopandas        0.9.0   pyhd8ed1ab_1     conda-forge
geopandas-base   0.9.0   pyhd8ed1ab_1     conda-forge
pandas           1.3.3   py39hde0f152_0   conda-forge
okay, so if i set set_default_upload_destination as a URI that’s local to the computer running the task (and the server):
- the server is “unable to load the image”—not surprising because the filesystem URI was not mounted into the container
- the files are present at the expected location on the local filesystem, but they are…blank! all white. that tells me that report_media might have been successful, but there’s some issue …encoding the data to a jpeg?
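(roughly the flow in question, with a made-up local path; the browser can’t load from file:// unless that path is also reachable from the server/container:)

from clearml import Task

task = Task.init(project_name="example", task_name="upload-test")
logger = task.get_logger()

# uploads go to a path that only exists on the machine running the task
logger.set_default_upload_destination("file:///data/clearml_uploads")

logger.report_media(title="debug", series="sample", iteration=0, local_path="plot.jpg")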
okay, i have a few things on my todo list, they will take a while. we will call task.init in the entry point instead of how it’s done now, and we will re-try python -m. if it doesn’t work, we will file an issue. if it does work, yay!
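(a minimal sketch of what “task.init in the entry point” plus python -m would look like; module and project names are placeholders:)

# my_package/__main__.py, invoked as: python -m my_package
from clearml import Task

def main():
    # create the task right at the entry point, instead of inside a class
    # that gets constructed during an internal import
    task = Task.init(project_name="example", task_name="training-run")
    # ... rest of the job ...

if __name__ == "__main__":
    main()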
either way, thanks much for your help today, i really appreciate it.
great news! thank you! when there’s a new release, i need to docker-compose build && docker-compose up to get the latest?
since it’s probably relevant—i have to use the Agg backend because the machine is headless
hey Martin.B, wondering if you were able to find anything out about this?
good questions 🙂
they are plots. they have unique titles. i’m using the auto-logging mechanism—so set up the task, then plt.show()
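(a minimal sketch of that setup, including the headless Agg backend mentioned earlier; project/title names are placeholders:)

import matplotlib
matplotlib.use("Agg")  # headless machine, no display

import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name="example", task_name="plot-logging")

for i in range(200):
    plt.figure()
    plt.plot([0, 1], [0, i])
    plt.title(f"plot {i}")  # unique title per plot
    plt.show()              # the auto-logging hook captures the figure here
    plt.close()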
no more than 114 plots are shown in the plots tab.
yep, that’s what i’m seeing, they’re all PNGs in that folder.
yes, i see no more than 114 plots in the list on the left side in full screen mode—just checked and the behavior exists on safari and chrome
thanks for doing that and thanks for your work on the project 🙂
i’ve just verified that they’re all written to /opt/clearml/data/fileserver/[PROJECT_NAME]/[DESCRIPTION]/metrics
i noticed that the agent was downgrading to pip=20.1.1 at every attempt, so i added Task.add_requirements("pip", "23.1.2") and even then, it downgrades to 20.1.1?
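(for reference, my understanding is the call only takes effect if it runs before Task.init; roughly:)

from clearml import Task

# add_requirements has to be called before Task.init to be recorded
Task.add_requirements("pip", "23.1.2")

task = Task.init(project_name="example", task_name="pip-pin-test")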
okay, so my problem is actually that using a “local” package is not supported—i.e. i need to pip install the code i’m running, and it must correctly specify its dependencies
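(the “correctly specify its dependencies” part would be something like a minimal setup.py; the name below is made up and the pins are just taken from the conda list above:)

# setup.py for the package being run, so the agent can pip install it
from setuptools import setup, find_packages

setup(
    name="my_internal_package",  # placeholder
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "clearml",
        "pandas==1.3.3",
        "geopandas==0.9.0",
    ],
)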
that sounds like all good news to me! thanks for the info 🙂