
We have a training template, a k8s job definition (YAML), that sets env variables inside the Docker image used for training, and those env variables are the credentials for ClearML. Since they are taken from k8s secrets, they are the same for every user.
I can create secrets for every new user and set env variables accordingly, but perhaps you see a better way out?
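To make the per-user variant concrete, here is a rough sketch of generating the `env` section of the job template for one user. The secret naming scheme (`clearml-credentials-<user>`) and the key names inside the secret are assumptions made up for illustration; `CLEARML_API_ACCESS_KEY` and `CLEARML_API_SECRET_KEY` are the env variables the ClearML SDK reads, and `valueFrom.secretKeyRef` is the standard k8s way to pull a value from a secret.

```python
def clearml_env_for_user(username: str) -> list:
    """Build the `env` list of a container spec for a single user.

    Assumes (hypothetically) one k8s Secret per user, named
    clearml-credentials-<username>, holding the keys
    `access_key` and `secret_key`.
    """
    # k8s object names must be lowercase, so normalise the user name.
    secret_name = f"clearml-credentials-{username.lower()}"
    env_to_secret_key = {
        "CLEARML_API_ACCESS_KEY": "access_key",
        "CLEARML_API_SECRET_KEY": "secret_key",
    }
    return [
        {
            "name": env_name,
            "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}},
        }
        for env_name, key in env_to_secret_key.items()
    ]


if __name__ == "__main__":
    for entry in clearml_env_for_user("alice"):
        print(entry)
```

The resulting list can be dumped to YAML and placed under `spec.template.spec.containers[].env` of the job definition, so only the secret itself differs between users.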
traceback:
` Traceback (most recent call last):
  File "/home/marek/nomagic/monomagic/ml/tiresias/calibrate_and_test.py", line 57, in <module>
    Task.add_requirements('requirements.txt')
  File "/home/marek/.virtualenvs/tiresias-3.9/lib/python3.9/site-packages/clearml/backend_interface/task/task.py", line 1976, in add_requirements
    for req in pkg_resources.parse_requirements(requirements_txt):
  File "/home/marek/.virtualenvs/tiresias-3.9/lib/python3.9/site-packages/pkg_resources/_init... `
thanks! is this documented? (I am wondering whether I could have avoided bothering you with my question in the first place)
but it is a guess
ok, I solved the problem, agent.force_git_ssh_protocol = true
did the trick
just put ssh config with the proper key marked
I don't see such a method in the docs, but it seems so natural that I decided to ask.
I will try with ` sys.path.append('../../../../') ` later today and see what happens
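One caveat with a relative entry like '../../../../' is that it is resolved against the current working directory, which may differ when the script is launched from elsewhere. A minimal sketch that anchors the path to the script's own location instead (the helper name and the example path are made up for illustration):

```python
import sys
from pathlib import Path


def add_ancestor_to_sys_path(script_path: Path, levels_up: int) -> str:
    """Append the directory `levels_up` levels above the script's folder to sys.path."""
    # parents[0] is the script's own directory, parents[n] is n levels above it.
    ancestor = str(script_path.resolve().parents[levels_up])
    if ancestor not in sys.path:
        sys.path.append(ancestor)
    return ancestor


if __name__ == "__main__":
    # Equivalent of '../../../../' relative to this file, independent of the cwd.
    print(add_ancestor_to_sys_path(Path(__file__), 4))
```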
I am only getting one user for some reason, even though 4 are in the system
ok, understood, it was probably my fault: I was messing with the services container and probably interrupted the pipeline task, so the subtasks themselves finished, but the pipeline task was no longer alive when that happened
I circumvented the problem by putting a timestamp in the task name, but I don't think this is necessary.
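For reference, the timestamp workaround can be sketched like this. The helper name and the format string are only an illustration; the commented-out `Task.init` call shows where the name would be used and is not executed here.

```python
from datetime import datetime, timezone
from typing import Optional


def timestamped_task_name(base_name: str, now: Optional[datetime] = None) -> str:
    """Return the base name with a UTC timestamp appended, so every run gets a unique name."""
    now = now or datetime.now(timezone.utc)
    return base_name + " " + now.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    name = timestamped_task_name("calibrate_and_test")
    print(name)
    # Hypothetical usage with ClearML (project name is a placeholder):
    # from clearml import Task
    # task = Task.init(project_name="my_project", task_name=name)
```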
they are universal, I thought there is some interface to them in clearml, but probably not
this is part of repository
I think there was some problem with how shutil.copytree works in Python 3.6 with broken links
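To illustrate the kind of failure I mean: by default shutil.copytree follows symlinks and reports an error for a link whose target is missing. A small self-contained sketch (temporary directories and file names are made up, and it assumes a platform where symlinks can be created) that reproduces a broken link and copies the tree with `ignore_dangling_symlinks=True`, which skips such links instead of failing. Whether this is what actually went wrong in the agent is only my guess.

```python
import os
import shutil
import tempfile


def copy_tree_skipping_broken_links(src: str, dst: str) -> None:
    """Copy src to dst, silently skipping symlinks whose target does not exist."""
    shutil.copytree(src, dst, ignore_dangling_symlinks=True)


def demo() -> list:
    """Build a tree with one real file and one broken symlink, copy it, list the copy."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "src")
        os.mkdir(src)
        with open(os.path.join(src, "real.txt"), "w") as fh:
            fh.write("data")
        # Absolute link target that does not exist, i.e. a broken symlink.
        os.symlink(os.path.join(root, "missing-target"), os.path.join(src, "broken"))
        dst = os.path.join(root, "dst")
        copy_tree_skipping_broken_links(src, dst)
        return sorted(os.listdir(dst))


if __name__ == "__main__":
    print(demo())
```

Passing `symlinks=True` instead would copy the link itself rather than skipping it, which keeps the copy faithful to the repository.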
it is typically used with pytorch
I second the problem of having large metrics without a tool to properly inspect where they are coming from.
ok, but do you know why it tried to reuse in the first place?
From the documentation https://github.com/allegroai/clearml-agent :
` Two K8s integration flavours
Spin ClearML-Agent as a long-lasting service pod
use clearml-agent docker image
map docker socket into the pod (soon replaced by podman)
allow the clearml-agent to manage sibling dockers
benefits: full use of the ClearML scheduling, no need to worry about wrong container images / lost pods etc.
downside: Sibling containers `
I created my own docker image with a newer python and the error disappeared
there is a broken symlink in the original repository
SuccessfulKoala55 that worked, thanks a lot!
it is a configuration object (line of my code: config_path = task.connect_configuration(config_path))
I did not configure user/pass for git
I did something similar to what you suggested and it worked, the key insight was that connect and connect_configuration work differently in terms of overrides, thanks!
I did not know about it, thanks!