I "think" this has something to do with loading the configuration and setting up the "StorageManager".
(in other words setting the google.storage)... Or maybe it is the lack of google storage package?!
Let me check
I'm glad you were able to solve the issue!
WackyRabbit7 I could not reproduce it, what did you pass in "GOOGLE_APPLICATION_CREDENTIALS" ?
AgitatedDove14 sorry for delayed reply - where do I read the version the Cleanup Service is using?
to fix it, I excluded this var entirely from the docker-compose
The google storage package could be the cause: we do have the env var set, but we don't use the google storage package.
I still can't get it to work... I couldn't figure out how to change the clearml version in the runtime of the Cleanup Service, as I'm not in control of the agent that executes it
Let's take a step back. Let's remove the clearml-services from the docker compose for a second, and run it manually (then you can control everything). Once you have it running manually, let's try to replicate the setup back to the docker compose, make sense ?
BTW from the log you attached:
File "/root/.clearml/venvs-builds/3.6/lib/python3.6/site-packages/clearml/storage/helper.py", line 218, in StorageHelper
_gs_configurations = GSBucketConfigurations.from_config(config.get('google.storage', {}))
This means it tries to remove an artifact from a Task, that artifact is probably in GS (I'm assuming so because it is using the GS API), and the cleanup service is missing the GS configuration.
WackyRabbit7 is that possible ?
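A minimal sketch for checking the two suspects raised in this thread: whether GOOGLE_APPLICATION_CREDENTIALS is set (and points at a real file), and whether the google storage package can be imported. The helper name gcs_diagnostics is hypothetical and not part of clearml; it uses only the Python standard library and would need to run inside the environment that executes the Cleanup Service.

```python
import importlib.util
import os


def gcs_diagnostics(environ=None):
    """Report whether the GCS credentials variable is set and whether the
    google-cloud-storage client library can be imported.

    Hypothetical helper for illustration only, not a clearml API.
    """
    environ = os.environ if environ is None else environ
    cred_path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    try:
        has_gcs_package = importlib.util.find_spec("google.cloud.storage") is not None
    except ImportError:
        # find_spec imports parent packages first; this is raised when
        # "google" or "google.cloud" is not installed at all.
        has_gcs_package = False
    return {
        "credentials_var_set": bool(cred_path),
        "credentials_file_exists": bool(cred_path) and os.path.isfile(cred_path),
        "gcs_package_importable": has_gcs_package,
    }


if __name__ == "__main__":
    for key, value in gcs_diagnostics().items():
        print(key, "=", value)
```

If the variable is set but the package is not importable, that would match the situation described here (env var present, google storage package unused).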
Very odd, I still can't reproduce. This is just the cleanup service running without anything else ?
What's the clearml version it is using ?
AgitatedDove14 So I couldn't kill the service agent myself (permission denied, I'm not sudo). What I did is I docker-compose downed, commented out only the GOOGLE_APPLICATION_CREDENTIALS environment variable from the clearml services agent service, and upped the docker-compose again. I enqueued the Cleanup Service and now it works. Really weird, it looks like setting GOOGLE_APPLICATION_CREDENTIALS causes an error even though I'm 100% sure it is not used for storage.
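The same workaround can be tried without touching the docker-compose file, by launching a process with the variable stripped from its environment. This is only a sketch of the idea using the standard library: env_without is a hypothetical helper, and the child command here just prints whether it still sees the variable, standing in for whatever process you would actually launch.

```python
import os
import subprocess
import sys


def env_without(name, environ=None):
    """Return a copy of the environment with one variable removed."""
    environ = os.environ if environ is None else environ
    return {key: value for key, value in environ.items() if key != name}


if __name__ == "__main__":
    child_env = env_without("GOOGLE_APPLICATION_CREDENTIALS")
    # Placeholder child process: it only reports whether the variable is visible.
    subprocess.run(
        [
            sys.executable,
            "-c",
            "import os; print('GOOGLE_APPLICATION_CREDENTIALS' in os.environ)",
        ],
        env=child_env,
        check=True,
    )
```

The parent environment is left unchanged, so other agents and clients that need the variable for BigQuery are not affected.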
Are you certain you have no artifacts on GS?
Are you saying that if GOOGLE_APPLICATION_CREDENTIALS is set and clearml.conf contains no "project" section, it crashes when starting?
100% sure no artifacts are on GS. Not sure what you are asking in the second line here. The only place I have ever set GOOGLE_APPLICATION_CREDENTIALS is as an environment variable when launching agents (on other queues, not the services queue) and on the clients, only for the sake of using BigQuery
Is the "installed packages" part editable? Good to know
Of course it is, when you clone a Task everything is editable
Isn't it a bit risky manually changing a package version?
Worst case it will crash quickly, and you reset/edit/enqueue
(Should work though)
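For reference, a small sketch of the edit being discussed: changing one pinned version in a block of requirement lines. It assumes the "INSTALLED PACKAGES" text uses pip-style `name == version` lines; set_pinned_version is a hypothetical helper, and the target version below is an arbitrary placeholder, not a statement about which release contains the fix.

```python
import re


def set_pinned_version(requirements_text, package, new_version):
    """Replace the pinned version of `package` in pip-style requirement lines.

    Lines for other packages (including ones that merely share a prefix,
    such as "clearml-agent") are left untouched.
    """
    pattern = re.compile(
        r"^(\s*" + re.escape(package) + r"\s*==\s*)\S+(.*)$",
        re.IGNORECASE,
    )
    updated = []
    for line in requirements_text.splitlines():
        match = pattern.match(line)
        if match:
            line = match.group(1) + new_version + match.group(2)
        updated.append(line)
    return "\n".join(updated)


if __name__ == "__main__":
    # Example input; the package list is made up for illustration.
    packages = "boto3 == 1.16.0\nclearml == 0.17.0\nclearml-agent == 0.17.1"
    print(set_pinned_version(packages, "clearml", "1.0.0"))
```

The result would be pasted back into the cloned Task before enqueuing it; if the new version is incompatible, the Task should fail early and can be reset and edited again.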
What do you say I manually kill the services agent and launch one myself?
Makes sense
I don't think the problem is setting that variable, I think it has something to do with it but not that obvious... Because it did work for me in the past, since then we docker-compose up/downed a few times, changed some other things etc... Can't figure out what made it get to this point
How can I change the version of the Cleanup Service?
Makes sense.
the path to the JSON file
Yep, that's what I did and things seem to work... Let me check again if I missed anything
AgitatedDove14 clearml version on the Cleanup Service is 0.17.0
I'm saying that because in the task under "INSTALLED PACKAGES" this is what appears
This is exactly what I was looking for. Thanks!
Yes that makes sense, I think this bug was fixed a long time ago, and this is why I could not reproduce it.
I also think you can use a later version of clearml
No, absolutely not. Yes, I do have a GOOGLE_APPLICATION_CREDENTIALS environment variable set, but nowhere do we save anything to GCS. The only usage is in the code which reads from BigQuery
The Task log itself lists the versions of all the packages. I wonder whether it is using an older clearml version, which would explain why I cannot reproduce it.
I assume it has nothing to do with my client version
Isn't it a bit risky manually changing a package version? what if it won't be compatible with the rest?
I'd prefer not to docker-compose down, as researchers are actively working on it. What do you say I manually kill the services agent and launch one myself?