since it’s probably relevant: i have to use the Agg backend since the machine is headless
actually it’s missing imports from the second level too
okay, looks like my main issue was the errant plt.show (😩). report_media works fine without specifying set_default_upload_destination when that’s been removed 😅
this also fixed a couple other bugs i was seeing. Thanks very much to you for your help and please pass my thanks on to the team as well.
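for reference, a minimal sketch of the headless plotting pattern described above: select the Agg backend and save the figure to a file rather than calling plt.show. the file name and the plotted data are made up for illustration, and this only covers the matplotlib side, not any ClearML reporting call.

```python
import tempfile
from pathlib import Path

import matplotlib

# Select the non-interactive Agg backend before importing pyplot,
# so no display is needed on a headless machine.
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])  # illustrative data only
ax.set_title("example")

# Write the figure to disk instead of calling plt.show().
out_path = Path(tempfile.gettempdir()) / "agg_example.png"  # hypothetical file name
fig.savefig(out_path)
plt.close(fig)  # release the figure once it has been saved
```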
is there a limit to the search depth for this?
i’ve got a training script that imports local package files and those items import other local package files. ex:
train.py
from local_package.callbacks import Callbacks
local_package/callbacks.py
from local_package.analysis import Analysis
local_package/analysis.py
import pandas as pd
the original task only lists the following as installed packages:
clearml == 1.9.1rc0
pytorch_lightning == 1.8.6
torchvisi...
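a rough way to double-check which non-local modules the script tree pulls in, at any depth. this is only a sketch for comparing against the task’s installed-packages list; it is not how ClearML’s own analysis works, and the helper names (imported_modules, non_local_imports) are made up. it does not filter out standard-library modules, and it only follows absolute imports.

```python
import ast
import tempfile
from pathlib import Path


def imported_modules(source: str) -> set[str]:
    """Return the dotted module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)  # absolute "from x import y" only
    return names


def non_local_imports(entry: Path, root: Path) -> set[str]:
    """Follow local imports from `entry` to any depth and return the top-level
    names of imported modules that do not resolve to a file under `root`."""
    external, seen, stack = set(), set(), [entry]
    while stack:
        path = stack.pop()
        if path in seen:
            continue
        seen.add(path)
        for name in imported_modules(path.read_text()):
            module_file = root / (name.replace(".", "/") + ".py")
            package_init = root / name.replace(".", "/") / "__init__.py"
            if module_file.exists():
                stack.append(module_file)   # local module: keep walking
            elif package_init.exists():
                stack.append(package_init)  # local package: keep walking
            else:
                external.add(name.split(".")[0])
    return external


# Recreate the three-file layout from the message in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "local_package").mkdir()
    (root / "train.py").write_text("from local_package.callbacks import Callbacks\n")
    (root / "local_package" / "callbacks.py").write_text(
        "from local_package.analysis import Analysis\n"
    )
    (root / "local_package" / "analysis.py").write_text("import pandas as pd\n")
    found = non_local_imports(root / "train.py", root)

print(found)
```

on this layout the walk reaches analysis.py through callbacks.py, so pandas shows up even though train.py never imports it directly.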
the VCS cache was empty before that run. then, even with the VCS cache being disabled in the config, there was a new lock file and directory after running.
also tried disabling the VCS cache in the config. pulled this from the agent’s startup output:
agent.vcs_cache.enabled = false
restarted the server on the off chance that had anything to do with it, and no. VCS is disabled, and the task is trying to pull the correct/latest commit.
further, there’s now data in the VCS cache, even though i disabled it
hmmm, i’m not creating the task in __main__ , i wonder if that’s why
yes, sorry for not catching that earlier—doesn’t seem to change anything
that seems like a good solution 🙂
thank you SuccessfulKoala55 and AgitatedDove14 for your help! Martin identified the problem early on, but I only checked my .bashrc 😞
okay, they are somehow set as environment variables. let me figure out how they were set.
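a quick sketch for spotting overrides like this from inside the same environment the agent runs in. the prefixes are an assumption (CLEARML_, plus TRAINS_ for legacy names), and CLEARML_AGENT_GIT_USER in the sample is only an example key; values are masked so credentials don’t end up in a chat log.

```python
import os

# Assumed prefixes for ClearML-related variables; adjust if yours differ.
PREFIXES = ("CLEARML_", "TRAINS_")


def mask(value: str) -> str:
    """Show only the first two characters of a value."""
    return value[:2] + "***"


def clearml_env_overrides(environ=None) -> dict[str, str]:
    """Return matching variables (sorted by name) with masked values."""
    environ = os.environ if environ is None else environ
    return {
        key: mask(value)
        for key, value in sorted(environ.items())
        if key.startswith(PREFIXES)
    }


# Example with a fake environment; call with no argument to inspect the real one.
sample = {"CLEARML_AGENT_GIT_USER": "olduser", "HOME": "/root"}
print(clearml_env_overrides(sample))
```

this only shows what is set in the current process; finding where a variable was exported (shell profile, service unit, etc.) still has to be done by hand.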
should api.credentials.access_key be the same as the access_key in clearml.conf ?
running my own clearml server with a vanilla config (obtained from github), except i have one fixed user
- stopped agent
- updated clearml.conf to have different username, wrote file
- verified the vcs-cache is empty
- started the agent, which resulted in this output
...
agent.custom_build_script =
agent.disable_task_docker_override = false
agent.git_user = aaaaaaaaaaaaa
agent.default_python = 3.9
...
(that’s the username I changed it to)
- reset and enqueued the task
checkout failed, it’s still attempting to use the old creds
❯ cat ~/clearml.conf | grep git_user
git_user: "aaaaaaaaaaaaa"
❯ cat ~/clearml.conf | grep -A 2 vcs_cache
vcs_cache: {
enabled: false,
path: ~/.clearml/vcs-cache
seems like pip 20.1.1 has the issue, but >= 22.2.2 do not.
agent version is
❯ clearml-agent --version
CLEARML-AGENT version 1.5.2
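a small sketch for checking the local pip against the two data points reported above (20.1.1 failing, 22.2.2 and later working). anything in between is untested in this thread, so the helper says so rather than guessing; the function names are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string: str) -> tuple[int, ...]:
    """Parse the leading numeric parts of a version, e.g. '22.2.2' -> (22, 2, 2)."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def classify_pip(version_string: str) -> str:
    """Classify a pip version using only the observations from this thread."""
    release = release_tuple(version_string)
    if release == (20, 1, 1):
        return "reported failing"
    if release >= (22, 2, 2):
        return "reported working"
    return "untested in this thread"


try:
    print(classify_pip(version("pip")))
except PackageNotFoundError:
    print("pip is not installed in this environment")
```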
i am having this same issue—installing pytorch via pip. but i am not specifying a version, and the agent is not able to install pytorch.
even if i specify a version (e.g. torch<2.0 ), it fails.
i guess this is a pip problem, is there a known pip version that works correctly?
thanks much for your help. should have thought to check there earlier, but kind of forgot that was a thing.
but now i’m confused about why set_default_upload_destination is different from output_uri . i kind of get it? but wouldn’t that be a safe default?
2023-05-06 12:05:49,168 - clearml.Task - WARNING - ### TASK STOPPED - USER ABORTED - STATUS CHANGED ###
lol.
this changes the status in the UI to “aborted”.
not ideal, but if the answer is “for this to work, tasks must be run by an agent” i accept it
the items at the bottom of the list have dropped off—there’s no 2D hist 9 , or 2D hist 81 , etc.