Very odd, I still can't reproduce. Is this just the cleanup service running, without anything else ?
What's the clearml version it is using ?
Yes, I do have a GOOGLE_APPLICATION_CREDENTIALS environment variable set, but nowhere do we save anything to GCS. The only usage is in the code which reads from BigQuery.
Are you certain you have no artifacts on GS?
Are you saying that if GOOGLE_APPLICATION_CREDENTIALS is set and clearml.conf contains no "project" section, it crashes when starting ?
Edit the cloned version and enqueue it?
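For example, a minimal sketch of doing that from code (the task ID, parameter name, and queue name below are placeholders):
from clearml import Task

# clone an existing task, edit one of its parameters, then enqueue it for an agent
template = Task.get_task(task_id='<source-task-id>')  # placeholder task ID
cloned = Task.clone(source_task=template, name='cloned task')
cloned.set_parameter('General/learning_rate', 0.001)
Task.enqueue(cloned, queue_name='default')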
BTW from the log you attached:
File "/root/.clearml/venvs-builds/3.6/lib/python3.6/site-packages/clearml/storage/helper.py", line 218, in StorageHelper
_gs_configurations = GSBucketConfigurations.from_config(config.get('google.storage', {}))
This means it tries to remove an artifact from a Task; that artifact is probably on GS (I'm assuming so because it is using the GS API), and the cleanup service is missing the GS configuration.
WackyRabbit7 is that possible ?
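If so, a minimal sketch of the clearml.conf section the cleanup service would need (project name and credentials path are placeholders):
sdk {
    google.storage {
        # default project and credentials, used when no bucket-specific configuration matches
        project: "my-gcp-project"
        credentials_json: "/path/to/service_account.json"
    }
}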
For example, for some of our models we create PDF reports, which we save in a folder on the NFS disk
Oh, why not as artifacts ? At least you will be able to access them from the web UI, and avoid NFS credential hell 🙂
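A minimal sketch, assuming the report already exists on disk (project/task names and the path are placeholders):
from clearml import Task

task = Task.init(project_name='reports', task_name='model report')
# logs and uploads the PDF so it can be viewed/downloaded from the web UI
task.upload_artifact(name='pdf_report', artifact_object='/mnt/nfs/reports/model_report.pdf')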
Regarding clearml datasets:
https://www.youtube.com/watch?v=S2pz9jn26uI
If you are using the latest RC:
pip install clearml==0.17.5rc5
You can pass True, and it will use the "files_server" as configured in your clearml.conf.
I used the http link as a filler to point to the files_server.
Make sense ?
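Assuming this refers to the output_uri argument of Task.init, a minimal sketch (project/task names are placeholders):
from clearml import Task

# output_uri=True uploads models/artifacts to the files_server configured in clearml.conf
task = Task.init(project_name='examples', task_name='upload example', output_uri=True)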
Hi NonchalantDeer14
In multi-GPU, can you still see the logs on the local TensorBoard ?
Are you running manually or with an agent ?
Hmm, I think it is this line:
WARNING - Model configuration only supports dictionary or string objects
done
Let me check something.
Hi PompousBeetle71
Try this one, let me know if it helped:
import logging
logging.getLogger('trains.frameworks').setLevel(logging.ERROR)
PompousBeetle71, what you are saying is that for some reason --gpus all will not configure the NVIDIA drivers to use all the GPUs when running bare metal (i.e. no docker). Did I understand you correctly ?
It does not upload; the default behavior is to log the artifact (so you know where it is stored, without forcing unnecessary uploads).
If you were to change:
task = Task.init(project_name='examples', task_name='Keras with TensorBoard example')
to:
task = Task.init(project_name='examples', task_name='Keras with TensorBoard example', output_uri="http://files_server:8081")
it would also upload the model (the http link here is just a filler pointing to your files_server).
Sounds good, I assumed that was the case but I was not sure.
Let's make sure that in the clearml.conf we write it in the comment above the use_credentials_chain option, so that when users look for IAM role configuration they can quickly search for it 🙂
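Something along these lines, as a sketch of how that clearml.conf comment could read:
sdk {
    aws {
        s3 {
            # IAM roles: set use_credentials_chain to true to let boto3 resolve
            # credentials via its default chain (env vars, shared credentials file, IAM role)
            use_credentials_chain: true
        }
    }
}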
(or woman or in between, we are supportive as long as code is working 🙂 )
So you want these two on two different graphs ?
PompousBeetle71 could you try trains-agent 0.15.0rc0 ? What's the OS you are using? Are you running in docker mode, if so, what's the docker version?
But functionality is working
Awesome, I will wait with the merge until it is tested internally.
There is a release coming out after the weekend; once it is out, I expect we will merge it.
sdk.conf will add it to the default loaded values (as I think you deduced).
Can you copy-paste the sdk.conf here? (maybe something is missing there?)
Sadly no 😞
(I mean you could quickly write a reader for TB and report it, but it is not built into the SDK)
Okay, let me check the code and come back with follow-up questions.
I'm glad you were able to solve the issue!
WackyRabbit7 I could not reproduce it, what did you pass in "GOOGLE_APPLICATION_CREDENTIALS" ?
Where would I put these credentials? I don't want to expose them in the logs as an environment variable or hard-code them.
Hi GleamingGrasshopper63
So basically you need a vault, to store those credentials...
Unfortunately the open-source version does not contain vault support, but the paid tiers scale/enterprise do.
There you can have an environment variable defined in the vault, and each time the agent runs your code it will pull it from the vault and set it on your process. wdyt ?
Just verifying: the Pod does get allocated 2 GPUs, correct ?
What do you have under the "script path" in the Task?
Should be under Profile -> Workspace (Configuration Vault)
PompousBeetle71 oh no 😞
Okay, this is a bit drastic, but let's see if it helps.
In your trains.conf, add the following section:
loggers {
  loggers {
    trains { level: ERROR }
  }
}