We are not using docker compose. We are deploying in Azure with each database as a standalone service.
We don't have a file server. The clearml.conf has sdk.development.default_output_uri="None".
From what I understand, docker mode was designed for apt-based images that run as root inside the container.
We have containers that are not apt-based and that do not run as root.
We also do some "startup" work that fetches credentials from Key Vault prior to running the agent.
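A minimal sketch of the kind of Key Vault fetch we mean, assuming the azure-identity and azure-keyvault-secrets packages (the vault URL and secret name are placeholders):
```
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Fetch the agent credentials from Key Vault before launching the agent.
# Vault URL and secret name are placeholders.
client = SecretClient(
    vault_url="https://<vault-name>.vault.azure.net",
    credential=DefaultAzureCredential(),
)
agent_key = client.get_secret("clearml-agent-access-key").value
```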
While creating the autoscaler instance I did provide my git credentials, i.e. my username and a Personal Access Token.
How exactly did you do that?
So we have 3 python packages, stored on github.com.
On the dev machine, the data scientist (DS) adds his local SSH key to his GitHub account as an authorized SSH key, at the account level.
With that, the DS can run git clone git@github.com:org/repo1 and then install that python package via pip install -e .
Do that for all 3 python packages, each in its own repo: repo1, repo2 and repo3. All 3 can be cloned using the same key that the DS added to his account.
The DS runs a tra...
I mean, what happens if I import and use a function from another py file, and that function's code changes?
Or are you expecting code to be frozen, with only parameters changing between runs?
I also have the same issue. Default arguments are fine, but all arguments supplied on the command line become duplicated!
ok, so if the git commit or uncommitted changes differ from the previous run, then the cache is "invalidated" and the step will be run again?
Most of the time, users would expect clearml to handle the caching by itself.
Most people probably won't even know what that does.
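For context, a minimal sketch of the pipeline step caching being discussed (pipeline, project and step names are made up):
```
from clearml import PipelineController

def preprocess():
    print("preprocessing")

# Names here are placeholders.
pipe = PipelineController(name="demo-pipeline", project="debug", version="1.0.0")

# With cache_executed_step=True, a step is reused from a previous run
# as long as its inputs (including the repo commit and uncommitted diff)
# are unchanged; a changed commit or diff invalidates the cache.
pipe.add_function_step(
    name="preprocess",
    function=preprocess,
    cache_executed_step=True,
)
pipe.start_locally(run_pipeline_steps_locally=True)
```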
@<1523701087100473344:profile|SuccessfulKoala55> I can confirm that v1.8.1rc2 fixed the issue in our case. I managed to reproduce it:
- Do a local commit without pushing
- Create a task and queue it (see the sketch below)
- The queued task fails as expected, since the commit is only local
- Push your local commit
- Requeue the task
- Expected: the task succeeds now that the commit is available. Actual: it fails, as the vcs seems to be left in a weird state by the previous failure
- With v1.8.1rc2 the issue is solved
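The "create a task and queue it" step above, as a minimal sketch (repo URL, script, project and queue names are placeholders):
```
from clearml import Task

# Create a task from the repo state and enqueue it for an agent.
# Repo URL, branch, script, project and queue names are placeholders.
task = Task.create(
    project_name="debug",
    task_name="local-commit-repro",
    repo="git@github.com:org/repo1.git",
    branch="main",
    script="train.py",
)
Task.enqueue(task, queue_name="default")
```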
Task.export_task() will contain what you are looking for.
In this case ['script']['diff']
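A minimal sketch of reading it (the task ID is a placeholder):
```
from clearml import Task

# Export the task to a dict and read the captured uncommitted diff.
# The task ID is a placeholder.
task = Task.get_task(task_id="<task_id>")
exported = task.export_task()
print(exported["script"]["diff"])
```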
how did you deploy your clearml server?
@<1523701087100473344:profile|SuccessfulKoala55> it is set to "all":
NV_LIBCUBLAS_VERSION=12.2.5.6-1
NVIDIA_VISIBLE_DEVICES=all
CLRML_API_SERVER_URL=https://<redacted>
HOSTNAME=1b6a5b546a6b
NVIDIA_REQUIRE_CUDA=cuda>=12.2 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=qua...
@<1523701087100473344:profile|SuccessfulKoala55> Should I raise a GitHub issue?
Can you share the agent log, in the console tab, before the error?
@<1523701070390366208:profile|CostlyOstrich36>
Yes. I am investigating that route now.
Are the uncommitted changes in untracked files?
In other words: clearml will only save uncommitted changes in files that are tracked by your local git repo.
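A quick way to check, as a minimal sketch (run from the repo root; nothing here is ClearML-specific):
```
import subprocess

# List what git tracks vs. what it considers untracked;
# only changes to the tracked files end up in the ClearML diff.
tracked = subprocess.run(
    ["git", "ls-files"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()
untracked = subprocess.run(
    ["git", "ls-files", "--others", "--exclude-standard"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()
print(f"{len(tracked)} tracked files, {len(untracked)} untracked files")
print("untracked:", untracked)
```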
I can only guess with so little information here. You'd better debug with print statements. Is this happening with uncommitted changes in a submodule?
I don't use submodules, so I don't really know how they behave with ClearML.
I don't think ClearML is designed to handle secrets other than git and storage ...
please provide the full logs and error message.
Is this MongoDB-style filtering?
You could either set your user's default permissions to allow group write?
Or maybe create a dedicated user with group-write permission and run the agent as that user?
Based on this: it feels like S3 is supported
How are you using the update_output_model function?
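For comparison, a minimal sketch of a typical call (project, task and file names are placeholders):
```
from clearml import Task

# Project/task names and the weights path are placeholders.
task = Task.init(project_name="debug", task_name="model-upload")

# ... training code that writes model.pkl ...

# Register the weights file as the task's output model.
task.update_output_model(model_path="model.pkl", name="example-model")
```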