
is there a limit to the search depth for this?
i’ve got a training script that imports local package files, and those files import other local package files. e.g.:
```
# train.py
from local_package.callbacks import Callbacks

# local_package/callbacks.py
from local_package.analysis import Analysis

# local_package/analysis.py
import pandas as pd
```
the original task only lists the following as installed packages:
```
clearml == 1.9.1rc0
pytorch_lightning == 1.8.6
torchvisi...
```
if i have code that’s just in a git repo but is not installed in any way, it runs fine if i invoke the entrypoint in a shell. but clearml will not find dependencies from secondary imports (as described above) if the agent just clones the repo and does not install the python package in some way.
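one workaround i’m considering in the meantime: the clearml SDK has Task.add_requirements for registering packages its import analysis misses. a minimal sketch, assuming pandas is the dependency dropped from the second-level import (project/task names are placeholders):
```python
from clearml import Task

# register the package that clearml's static import analysis misses
# because it's only imported two levels deep (local_package/analysis.py).
# must be called before Task.init() so it lands on the task's requirements.
Task.add_requirements("pandas")

task = Task.init(project_name="my_project", task_name="train")
```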
okay, so my problem is actually that using a “local” package is not supported; i.e., i need to pip install the code i’m running, and that package must correctly specify its dependencies
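for concreteness, a minimal setup.py sketch of what that packaging could look like (assuming the package really is named local_package, and pandas is the transitive dependency from analysis.py):
```python
# setup.py -- minimal sketch; package/dependency names are assumptions
# based on the import chain above
from setuptools import find_packages, setup

setup(
    name="local_package",
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "pandas",  # needed by local_package/analysis.py
    ],
)
```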
actually it’s missing imports from the second level too
the issue also may have been fixed somewhere between 20.1 and 22.2; i didn’t test versions in between those two
thanks for doing that and thanks for your work on the project 🙂
yes, that call appeared to be successful. had to wrap in quotes because of the contents of the key:
```
$ curl -u 'J9*****':'R2*****'
{"meta":{"id":"6db9ae72249f417fa2b6b8705b44f38a","trx":"6db9ae72249f417fa2b6b8705b44f38a","endpoint":{"name":"users.get_current_user","requested_version":"2.13","actual_version":"1.0"},"result_code":200,"result_subcode":0,"result_msg":"OK","error_stack":null,"error_data":{}},"data":{"user":{"company":{"id":"d1bd92a3b039400cbafc60a7a5b1e52b","name":"trains"},...
```
hmm, it was confusing to me, but it’s kind of an edge case: I was taking over a computer after a colleague left, which seems like it might not be a common scenario
looks like a previous user set CLEARML_API_ACCESS_KEY and CLEARML_API_SECRET_KEY in /etc/environment and then disabled the keys in the web app. I removed the two items from /etc/environment and was able to successfully start a worker.
it seems, though, that the env vars take precedence even when a --config-file is explicitly specified?
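something like this is the precedence pattern i’d expect was in play (illustrative sketch only, not the agent’s actual code):
```python
import os

def resolve_access_key(config_file_value: str) -> str:
    # if the env var is set (e.g. from /etc/environment), it silently
    # wins over whatever the config file provided
    return os.environ.get("CLEARML_API_ACCESS_KEY", config_file_value)

print(resolve_access_key("key-from-clearml.conf"))
```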
here’s the file with the keys and IP redacted: https://clearml.slack.com/files/U01PN0S6Y67/F0231N0GZ19/clearml.conf
so now i have
```
git_pass: "[NEW KEY]"
enable_git_ask_pass: false
```
in my clearml.conf file
weird. will move forward with manually recreating the task.
yes. had to sanitize it a bit, but left the git username/key intact (since the key is invalid now)
sorry for the delay, had work and personal emergencies 😕
the VCS cache was empty before that run. then, even with the VCS cache being disabled in the config, there was a new lock file and directory after running.
thanks much for your help. should have thought to check there earlier, but kind of forgot that was a thing.
further, there’s now data in the VCS cache, even though i disabled it
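a quick way to sanity-check what landed in the cache dir (path taken from my clearml.conf, shown further down):
```python
from pathlib import Path

# list anything written to the vcs-cache directory even though
# vcs_cache.enabled is false in clearml.conf
cache_dir = Path("~/.clearml/vcs-cache").expanduser()
if cache_dir.exists():
    for entry in sorted(cache_dir.iterdir()):
        print(entry)
```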
Hmmm. Just tried cloning a brand new task and the agent is still using the expired github access token.
- stopped agent
- updated clearml.conf to have different username, wrote file
- verified the vcs-cache is empty
- started the agent, which resulted in this output:
```
...
agent.custom_build_script =
agent.disable_task_docker_override = false
agent.git_user = aaaaaaaaaaaaa
agent.default_python = 3.9
...
```
(that’s the username I changed it to)
- reset and enqueued the task
checkout failed; it’s still attempting to use the old creds
i don’t get why the agent init log would list the username from clearml.conf but then use the env vars
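a check worth running in the agent’s shell, since stale env vars were the culprit last time (a sketch; the CLEARML_/TRAINS_ prefixes are the ones i know of):
```python
import os

# list any ClearML- or legacy trains-era variables set in this
# environment that could shadow values from clearml.conf
for name in sorted(os.environ):
    if name.startswith(("CLEARML_", "TRAINS_")):
        print(name, "is set (value redacted)")
```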
i updated the token in ~/clearml.conf and was careful to ensure it was only specified in one place. if i run clearml-agent daemon, that reads from ~/clearml.conf, right?
will try the git ask pass thing.
thanks for that tip. i cleared out the vcs cache and was already using the latest version of the agent, same problem persists.
there’s a python version mismatch; i will make a different env for the agent to run in that has a matching python version
yes, i’m running the agent on a workstation. i’m sshed into that workstation and verified the change in the conf by explicitly disabling the VCS cache and then looking for that in the agent’s startup output
restarted the server on the off chance that had anything to do with it, and no. VCS is disabled, and the task is trying to pull the correct/latest commit.
```
❯ cat ~/clearml.conf | grep git_user
    git_user: "aaaaaaaaaaaaa"
❯ cat ~/clearml.conf | grep -A 2 vcs_cache
    vcs_cache: {
        enabled: false,
        path: ~/.clearml/vcs-cache
```
i meant I should have thought to check there earlier! anyway, thanks again for your attention and help! 🙂