
@<1523701435869433856:profile|SmugDolphin23> I have checked that when setting auto_connect_frameworks=False it works, but disabling just joblib is not enough.
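For reference, a minimal sketch of narrowing the automatic framework binding instead of disabling it entirely; the dict form of auto_connect_frameworks is the documented way to switch off individual frameworks, while the project and task names here are placeholders:

from clearml import Task

# disable only the joblib binding, keep the rest of the auto-connect behaviour
task = Task.init(
    project_name="examples",          # placeholder
    task_name="joblib-binding-test",  # placeholder
    auto_connect_frameworks={"joblib": False},
)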
@<1523701435869433856:profile|SmugDolphin23> None
@<1523701087100473344:profile|SuccessfulKoala55> FYI
@<1523701087100473344:profile|SuccessfulKoala55> I am using it as follows:
after calling clearml.Task.init()
I create an object:
cache = Cache('/scidata/marek/diskcache')
and then in the loading function I do:
if cache_arg in load_and_crop.cache:
    return load_and_crop.cache[cache_arg] ...
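Roughly, the setup described above looks like this; a sketch only, where the project/task names and the loader body are assumptions, while the cache path and the load_and_crop.cache pattern come from the messages:

import clearml
from diskcache import Cache

clearml.Task.init(project_name="examples", task_name="diskcache-repro")  # placeholders

def expensive_load(path):
    # stand-in for loading and cropping a big image
    return path.upper()

def load_and_crop(cache_arg):
    # return the cached crop if it was already computed
    if cache_arg in load_and_crop.cache:
        return load_and_crop.cache[cache_arg]
    result = expensive_load(cache_arg)
    load_and_crop.cache[cache_arg] = result
    return result

# the on-disk cache attached to the loading function, created after Task.init()
load_and_crop.cache = Cache('/scidata/marek/diskcache')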
@<1523701435869433856:profile|SmugDolphin23> it did not help, shall I create smallest example when it does not work and paste it here?
@<1523701087100473344:profile|SuccessfulKoala55> any ideas what can be the cause?
to avoid loading and cropping a big image
@<1523701435869433856:profile|SmugDolphin23> it took some time, but I was able to cut 90% of the code; only the data loading remains and the problem persists (which is fortunate, as it makes it easy to replicate). Please have a look.
@<1523701435869433856:profile|SmugDolphin23> will send later today
I am only getting one user for some reason, even though 4 are in the system
@<1523701435869433856:profile|SmugDolphin23> let me know if you need any help in reproducing
The problem started appearing when I started to use joblib
with a simple memory caching mechanism.
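For context, joblib-based memory caching along these lines; a sketch, where the cache directory and the function are placeholders:

from joblib import Memory

# local on-disk cache directory (path is an assumption)
memory = Memory('/tmp/joblib_cache', verbose=0)

@memory.cache
def load_and_crop(path):
    # stand-in for loading and cropping a big image
    return path.upper()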
ok, understood, it was probably my fault: I was messing with the services container and probably caused the pipeline task to be interrupted, so the subtasks themselves finished, but the pipeline task was not alive when that happened
I circumvented the problem by putting a timestamp in the task name, but I don't think this should be necessary.
ok, but do you know why did it try to reuse in the first place?
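For later readers, the timestamp workaround and the explicit opt-out look roughly like this; a sketch with placeholder names, reuse_last_task_id being the documented switch that controls task reuse:

from datetime import datetime
from clearml import Task

# workaround: make the task name unique so reuse never kicks in
task = Task.init(
    project_name="examples",  # placeholder
    task_name="calibrate-" + datetime.now().strftime("%Y%m%d-%H%M%S"),
)

# alternative: keep the name stable but opt out of reuse explicitly
# task = Task.init(project_name="examples", task_name="calibrate",
#                  reuse_last_task_id=False)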
traceback:
Traceback (most recent call last):
File "/home/marek/nomagic/monomagic/ml/tiresias/calibrate_and_test.py", line 57, in <module>
Task.add_requirements('requirements.txt')
File "/home/marek/.virtualenvs/tiresias-3.9/lib/python3.9/site-packages/clearml/backend_interface/task/task.py", line 1976, in add_requirements
for req in pkg_resources.parse_requirements(requirements_txt):
File "/home/marek/.virtualenvs/tiresias-3.9/lib/python3.9/site-packages/pkg_resources/_init...
I will try with `sys.path.append('../../../../')` later today and see what happens
this is part of repository
We have a training template that is a k8s job definition (YAML) that creates env variables inside the docker image used for training, and those env variables are credentials for ClearML. Since they are taken from k8s secrets, they are the same for every user.
I can create secrets for every new user and set env variables accordingly, but perhaps you see a better way out?
and in the future I do want to have an Agent on the k8s cluster, but then this should not be a problem I guess, as the user is set during Task.init, right?
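One possible direction, sketched under assumptions: instead of baking shared credentials into the job template, per-user credentials could be injected through the standard environment variables and, if needed, set programmatically before Task.init. The CLEARML_* variable names below are the ones the SDK reads; everything else is a placeholder:

import os
from clearml import Task

# per-user credentials injected into the pod, e.g. from a per-user k8s secret;
# the SDK also picks these variables up on its own, so the explicit call is optional
Task.set_credentials(
    api_host=os.environ.get("CLEARML_API_HOST", "https://api.clear.ml"),
    key=os.environ["CLEARML_API_ACCESS_KEY"],
    secret=os.environ["CLEARML_API_SECRET_KEY"],
)

task = Task.init(project_name="examples", task_name="k8s-training")  # placeholders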
@<1523701087100473344:profile|SuccessfulKoala55> I have the same problem with diskcache
I am seeing warnings such as: clearml.model - WARNING - 9 model found when searching
where is the endpoint located? I can't find it, I was only able to find this:
https://github.com/allegroai/clearml/blob/ccc8e83c58336928424ed14b176306b149258512/examples/services/monitoring/slack_alerts.py#L55
task.data.user is the user id, can I get it in the text form?
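A possible way to resolve the id, sketched under assumptions: querying the users.get_by_id endpoint through APIClient; the exact response layout is an assumption and may differ:

from clearml import Task
from clearml.backend_api.session.client import APIClient

task = Task.get_task(task_id="<task-id>")  # placeholder id
client = APIClient()

# look up the user record behind task.data.user (response shape is an assumption)
res = client.users.get_by_id(user=task.data.user)
print(res.user.name)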
SuccessfulKoala55 that worked, thanks a lot!
it is a configuration object (line of my code: config_path = task.connect_configuration(config_path))
ok, I will do a simple workaround for this (use an additional parameter that I can update using parameter_override, and then check if it exists and update the configuration in Python myself)
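The workaround described above, roughly; a sketch where the parameter name and the config file are placeholders, the point being that connected parameters can be changed via parameter_override and the configuration then patched in code:

import json
from clearml import Task

task = Task.init(project_name="examples", task_name="config-override")  # placeholders

# an extra connected parameter that a cloned task can change via parameter_override
params = task.connect({"config_override": ""})

config_path = task.connect_configuration("config.json")  # placeholder file
with open(config_path) as f:
    config = json.load(f)

# if the override parameter was set, apply it on top of the loaded configuration
if params["config_override"]:
    config.update(json.loads(params["config_override"]))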
I did not know about it, thanks!
which is probably why it does not work for me, right?
I did something similar to what you suggested and it worked; the key insight was that connect and connect_configuration work differently in terms of overrides, thanks!