AgitatedDove14
So I couldn't kill the services agent myself (permission denied; I don't have sudo). What I did was bring it down with docker-compose down, comment out only the GOOGLE_APPLICATION_CREDENTIALS environment variable in the clearml services agent service, and bring docker-compose up again. I enqueued the Cleanup Service and now it works. Really weird: it looks like GOOGLE_APPLICATION_CREDENTIALS causes an error when it's set, even though I'm 100% sure it is not used for storag...
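One way to narrow this down is to check, from inside the services agent container, whether GOOGLE_APPLICATION_CREDENTIALS points at a key file that actually exists there. This is a minimal sketch built on an assumption not confirmed in this thread: that some Google client code picks the variable up even though nothing is stored on GCS, and fails because the path (valid on the host) is not mounted into the container. check_gcp_credentials_env is a hypothetical helper name.

```python
import json
import os
from typing import Mapping, Optional


def check_gcp_credentials_env(environ: Optional[Mapping[str, str]] = None) -> str:
    """Diagnose GOOGLE_APPLICATION_CREDENTIALS without contacting Google Cloud.

    Hypothetical helper: it only looks at the variable and the file it names.
    """
    env = os.environ if environ is None else environ
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "unset"
    if not os.path.isfile(path):
        # google-auth raises DefaultCredentialsError when this path does not exist.
        return f"missing file: {path}"
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, ValueError) as exc:
        return f"unreadable file: {path} ({exc})"
    if not isinstance(data, dict):
        return f"unexpected content: {path}"
    # Service-account key files normally carry a "type" field.
    return f"ok: {data.get('type', 'unknown type')}"


if __name__ == "__main__":
    # Meant to be run inside the services agent container.
    print(check_gcp_credentials_env())
```

If it reports a missing file, mounting the key into the container (or dropping the variable, as above) would both make the error go away.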
🤔 Is the "installed packages" section editable? Good to know.
Isn't it a bit risky to change a package version manually? What if it isn't compatible with the rest?
The google storage package could be the cause: we do have the env var set, but we don't use the google storage package.
To fix it, I removed this variable entirely from the docker-compose file.
No, absolutely not. Yes, I do have a GOOGLE_APPLICATION_CREDENTIALS environment variable set, but we don't save anything to GCS anywhere. The only usage is in the code that reads from BigQuery.
AgitatedDove14 the clearml version on the Cleanup Service is 0.17.0
I don't think the problem is simply setting that variable; I think it's related, but not in an obvious way. It did work for me in the past, and since then we've run docker-compose down/up a few times, changed some other things, etc. I can't figure out what got it to this point.
I assume it has nothing to do with my client version
Will try this out and report
whatttt? I looked at config_obj and didn't find any set method
AgitatedDove14 sorry for the delayed reply. Where can I see which version the Cleanup Service is using?
How can I change the version of the Cleanup Service?
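To see which clearml version a script actually runs with, one option is to print it at the top of the script so it lands in the task's console log. A minimal sketch using the standard library's package metadata (Python 3.8+); installed_version is a hypothetical helper name, and it reports whatever is installed in the environment the script is executed in, which for an enqueued task is the agent's environment rather than your local one.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # stdout is captured in the task's console log when run by an agent.
    print("clearml version:", installed_version("clearml"))
```

Since the "installed packages" section mentioned above is editable, changing the version would presumably mean editing the clearml entry there before enqueueing; that part is not verified here.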
I was referring to the object returned by Task.artifacts['...'], the one I call .get on.
I understand what I get; I'm asking because I want to see how the object I'm calling .get on behaves.
Is there a more elegant way to find the process to kill? Right now I'm doing pgrep -af trains, but if I have multiple agents, I won't be able to tell them apart.
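One way to tell several agents apart without new agent features is to match on their full command lines, which pgrep -af already prints, and filter by an argument that differs between them (for example the --queue or --gpus value each daemon was started with). A minimal sketch; the sample command lines are made up, and the exact arguments your agents were launched with are an assumption.

```python
import subprocess
from typing import Dict, List


def parse_pgrep_af(output: str) -> Dict[int, str]:
    """Parse `pgrep -af` output, one "<pid> <full command line>" per line."""
    procs: Dict[int, str] = {}
    for line in output.splitlines():
        pid, _, cmdline = line.strip().partition(" ")
        if pid.isdigit():
            procs[int(pid)] = cmdline
    return procs


def has_flag(cmdline: str, flag: str, value: str) -> bool:
    """True if cmdline contains `flag value` or `flag=value` as whole tokens."""
    tokens = cmdline.split()
    if f"{flag}={value}" in tokens:
        return True
    return any(a == flag and b == value for a, b in zip(tokens, tokens[1:]))


def find_agent_pids(output: str, flag: str, value: str) -> List[int]:
    """PIDs from `pgrep -af` output whose command line has flag == value."""
    return sorted(
        pid for pid, cmd in parse_pgrep_af(output).items() if has_flag(cmd, flag, value)
    )


def pgrep_af(pattern: str) -> str:
    """Run `pgrep -af pattern`; pgrep exits with status 1 when nothing matches."""
    result = subprocess.run(["pgrep", "-af", pattern], capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Made-up sample lines; real agent command lines may look different.
    sample = (
        "1201 python3 -m trains_agent daemon --queue gpu0 --gpus 0\n"
        "1350 python3 -m trains_agent daemon --queue gpu1 --gpus 1\n"
    )
    print(find_agent_pids(sample, "--gpus", "1"))  # [1350]
```

On a live machine this would be find_agent_pids(pgrep_af("trains"), "--gpus", "1"). Matching whole flag/value tokens avoids picking the wrong agent when, say, a queue name happens to contain a GPU index.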
Cool. What kind of objects are returned by .artifacts.__getitem__? I want to check their docs.
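While looking for the docs, Python's inspect module can show what an object actually is and what it exposes. As far as I know, in ClearML Task.artifacts maps artifact names to Artifact objects, and .get() on one of them returns the deserialized content, so the object you call .get on and the object it returns are different types; this may differ across versions. describe is a hypothetical helper name.

```python
import inspect
from typing import Any, Dict, List


def describe(obj: Any) -> Dict[str, Any]:
    """Summarize an object's class, class docstring, and public callables."""
    cls = type(obj)
    methods: List[str] = sorted(
        name
        for name, member in inspect.getmembers(cls)
        if not name.startswith("_") and callable(member)
    )
    return {
        "type": f"{cls.__module__}.{cls.__qualname__}",
        "doc": inspect.getdoc(cls) or "",
        "methods": methods,
    }
```

For example, describe(task.artifacts["my_artifact"]) and describe(task.artifacts["my_artifact"].get()) (my_artifact being a placeholder name) show both types side by side; the built-in help(obj) prints the full docstrings.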
Maybe something similar to Docker, where I could name each of my trains agents and then refer to them by name, something like
trains-agent daemon --name agent_1 ...
Then trains-agent stop/start by name
I ran into this earlier today: I set up 2 agents, one per GPU on a machine, and after editing the configuration I wanted to restart only one of them (because the other was working), and then I realized I didn't know which one to kill.
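Until something like naming is available (the --name and stop/start commands above are a proposal; this sketch does not assume the agent CLI provides them), a thin wrapper can approximate it by recording each agent's PID under a name you choose. A minimal POSIX-only sketch; start_named, stop_named and the ~/.agent-pids directory are hypothetical.

```python
import os
import signal
import subprocess
from pathlib import Path
from typing import Sequence

# Hypothetical location for the name -> PID registry.
PID_DIR = Path.home() / ".agent-pids"


def start_named(name: str, cmd: Sequence[str], pid_dir: Path = PID_DIR) -> int:
    """Launch cmd in the background and remember its PID under `name`."""
    pid_dir.mkdir(parents=True, exist_ok=True)
    pidfile = pid_dir / f"{name}.pid"
    if pidfile.exists():
        raise RuntimeError(f"agent {name!r} already has a pidfile: {pidfile}")
    with open(pid_dir / f"{name}.log", "ab") as log:
        # New session: the process is not tied to this terminal; output goes to a log.
        proc = subprocess.Popen(
            list(cmd), stdout=log, stderr=subprocess.STDOUT, start_new_session=True
        )
    pidfile.write_text(str(proc.pid))
    return proc.pid


def stop_named(name: str, pid_dir: Path = PID_DIR) -> bool:
    """Send SIGTERM to the process registered as `name`; False if none is registered."""
    pidfile = pid_dir / f"{name}.pid"
    if not pidfile.exists():
        return False
    pid = int(pidfile.read_text())
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # Already gone; just clean up the stale pidfile.
    pidfile.unlink()
    return True
```

Usage would look like start_named("agent_1", ["trains-agent", "daemon", "--queue", "gpu0", "--gpus", "0"]) and later stop_named("agent_1"); the daemon arguments are only an illustration. This assumes the agent stays in the foreground under the wrapper: if it forks itself into the background, the recorded PID would belong to the short-lived parent.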
I just think that if I use report_table, I should also be able to download the table as a CSV or something.
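One workaround for getting a CSV is to keep the data you pass to report_table and write it out as CSV yourself, then attach the file to the task (for instance as an artifact). If the table is already a pandas DataFrame, df.to_csv(path) does this in one call; below is a standard-library sketch for plain rows, where rows_to_csv is a hypothetical helper.

```python
import csv
import io
from typing import Iterable, Sequence


def rows_to_csv(header: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Serialize a header row plus data rows to CSV text (minimal quoting)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical table; in practice, the same rows you report with report_table.
    print(rows_to_csv(["epoch", "loss"], [[1, 0.52], [2, 0.31]]), end="")
```

The csv module handles quoting, so cells containing commas or quotes survive the round trip.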
Also, being able to give each agent its own configuration file would be good (maybe that's already possible and I don't know?)
I'm not, I just want to be very precise and concise when I do ask... but bear with me, it's coming 🙂
So could you re-explain, assuming my pipeline object is created by pipeline = PipelineController(...)?