how did you provide credentials to ClearML and git?
you should be able to explicitly upload a file of your choice as an artifact using something like this: None
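A minimal sketch of that call (the project/task names and file path are placeholders):

    from clearml import Task

    task = Task.init(project_name="examples", task_name="artifact upload")
    # Upload any local file as an artifact; it is stored on the configured files server / output destination
    task.upload_artifact(name="my_file", artifact_object="/path/to/file.zip")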
@<1523701070390366208:profile|CostlyOstrich36> I would like to point to Azure blob storage; what kind of URL schema should I use? And also, where do you configure the credentials for the ClearML server to access Azure blob as the file_server? I couldn't find any documentation around this topic 😞
TIA
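For reference, a minimal sketch of the URI form (assuming the SDK's azure:// scheme; the account and container names are placeholders), e.g. used as an output_uri:

    from clearml import Task

    # Assumed URI form: azure://<storage-account>.blob.core.windows.net/<container>
    task = Task.init(
        project_name="examples",
        task_name="azure output",
        output_uri="azure://<account>.blob.core.windows.net/<container>",
    )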
If the agent is the one running the experiment, it is very likely that your task will be killed.
And when the agent comes back, immediately or later, probably nothing will happen. It won't resume ...
@<1523701087100473344:profile|SuccessfulKoala55> it is set to "all", as:
NV_LIBCUBLAS_VERSION=12.2.5.6-1
NVIDIA_VISIBLE_DEVICES=all
CLRML_API_SERVER_URL=https://<redacted>
HOSTNAME=1b6a5b546a6b
NVIDIA_REQUIRE_CUDA=cuda>=12.2 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=qua...
oh ... maybe the bottleneck is augmentation on the CPU!
But is it normal that the agent doesn't detect the GPU count and type properly?
The weird thing is: GPU 0 seems to be in use, as reported by nvtop on the host. But it is 50% slower than when running directly instead of through the clearml-agent ...
About the caching: how does it work? Does ClearML maintain its own cache and monitor whether any of your code changes? Even code that gets changed inside an import?
may be specific to fastai
as I cannot reproduce it with another training using yolov5
not sure if related, but clearml 1.14 tends not to "show" the gpu_type
@<1523701087100473344:profile|SuccessfulKoala55> It's working !! Thank you very much !!! Clearml is awesome !!!!
I mean, what happens if I import and use a function from another .py file? And that function's code changes?
Or are you expecting the code to be frozen, with only parameters changing between runs?
some clearml cache folder
So if I spin up a new ClearML server in the cloud and use the same file server mount point, will I see all the tasks and experiments I had on the on-prem server in the cloud server?
Oh, I was assuming you were passing the entire DB backups to the cloud.
Yes, that is what I want to do.
So I need to migrate both the MongoDB database and the Elasticsearch database from my local Docker instance to their equivalents in the cloud?
but when I spin up a new server in the cloud, that server will have its own MongoDB, and that will be empty, no?
In summary:
1. Spin down the local server
2. Backup the data folder
3. In the cloud, extract the data backup
4. Spin up the cloud server
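A minimal sketch of steps 2 and 3 in Python (the data folder path is an assumption; /opt/clearml is the usual default for a self-hosted server):

    import tarfile

    DATA_DIR = "/opt/clearml/data"          # assumed self-hosted data folder
    BACKUP = "clearml_data_backup.tar.gz"

    # Step 2: on the on-prem server, after spinning it down, archive the data folder
    with tarfile.open(BACKUP, "w:gz") as tar:
        tar.add(DATA_DIR, arcname="data")

    # Step 3: on the cloud VM, restore the archive before spinning the server up
    with tarfile.open(BACKUP, "r:gz") as tar:
        tar.extractall("/opt/clearml")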
nope, we are self-hosted in Azure
no. I set api.files_server to None in both the remote agent's clearml.conf and my local clearml.conf
In which case, whether the code is run locally or remotely, it will store metrics to cloud storage
@<1558986867771183104:profile|ShakyKangaroo32> If you just want something to run at a regular interval, have you considered TaskScheduler: None
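A minimal sketch, assuming the clearml.automation scheduler API (the task id and queue names are placeholders):

    from clearml.automation import TaskScheduler

    scheduler = TaskScheduler()
    # Re-enqueue an existing task every day at 08:30
    scheduler.add_task(
        schedule_task_id="<task-id>",
        queue="default",
        hour=8,
        minute=30,
        recurring=True,
    )
    # Run the scheduler itself as a long-lived service
    scheduler.start_remotely(queue="services")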
right, in which case you want to change it dynamically in your code, not in the config file. This is where Logger.set_default_upload_destination comes in
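A minimal sketch (the destination URI is a placeholder):

    from clearml import Task

    task = Task.init(project_name="examples", task_name="dynamic destination")
    # Override the default upload destination at runtime instead of via clearml.conf
    task.get_logger().set_default_upload_destination(
        "azure://<account>.blob.core.windows.net/<container>"
    )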
If you are using multiple storage places, I don't see any other choice than putting multiple credentials in the conf file ... free or paid ClearML Server ...
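For instance, a clearml.conf sketch with more than one Azure container (account names and keys are placeholders), in the same format as the snippet further down:

    sdk {
        azure.storage {
            containers: [
                {
                    account_name: "account-one"
                    account_key: "xxxx"
                    container_name: "clearml"
                },
                {
                    account_name: "account-two"
                    account_key: "yyyy"
                    container_name: "datasets"
                }
            ]
        }
    }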
while the other may need to be 1
instead of true
I didn't know that, from the client side, you can specify storage elsewhere than the ClearML server. Good to know!
But I still want to know if it is possible to use a blob storage by default, configured on the ClearML server, so that each client doesn't need to do that ...
What about migrating existing experiments in the on-prem server?
@<1523701087100473344:profile|SuccessfulKoala55> Is it even possible to have the server store files to a given blob storage?
To "attach" that zip to the model, do you just use the update_weight and point to that zip file?
Have you made sure that the agent inside the GCP VM has access to your repository? Can you SSH into that VM and try to do a git clone?
Found it: None
And credentials are set with:
sdk {
    azure.storage {
        containers: [
            {
                account_name: "account"
                account_key: "xxxx"
                container_name: "clearml"
            }
        ]
    }
}