I didn't save it in any way. I relied on the auto-save from ClearML.
Are you sure the files server is correctly configured on the pods?
Ok, guys, I did it by manually uploading the model.
import os
from tempfile import gettempdir
import torch
from clearml import Task, OutputModel
task = Task.init(project_name='test', task_name='PyTorch MNIST train filserver dataset')
output_model = OutputModel(task=task, framework="PyTorch")
output_model.set_upload_destination(uri=" None ")
# save the weights to a temporary file, then register that file as the model weights
tmp_dir = os.path.join(gettempdir(), "mnist_cnn.pt")
torch.save(model.state_dict(), tmp_dir)
output_model.update_weights(weights_filename=tmp_dir)
The pod can easily download the dataset and upload logs to the fileserver, but it can't upload the model 😀
So when you do torch.save() it doesn't save the model?
Ok, I found out that with scikit-learn the model gets uploaded, but with PyTorch it doesn't.
Ok, maybe someone knows: how does a pod created by a K8s agent know the model registry URL? When I added the output_uri parameter in the Task, like output_uri=" None ", it doesn't show anything now. Previously, without this parameter, it showed a path like " None ...." in WebUI->Experiments->Artifacts
Hi @<1742355077231808512:profile|DisturbedLizard6> , not sure I get that. Did you use torch.save (like in here) or some other command to save the models? When running with the clearml-agent, you have a print of all the configurations at the beginning of the log. Can you verify your values are as you configured them?
Additionally, which versions of clearml, clearml-agent and torch are you using?
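A quick way to collect the versions asked about above is sketched below. It only uses the standard library, so it can be run inside the pod as-is; the helper name `installed_versions` is made up for illustration, and a package that is not installed is reported as None rather than raising.

```python
from importlib import metadata


def installed_versions(packages=("clearml", "clearml-agent", "torch")):
    """Return a {package name: version string or None} mapping.

    None means the package is not installed in the current environment.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```

Running this both locally and inside the agent pod makes it easy to spot a version mismatch between the two environments.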
Hi @<1742355077231808512:profile|DisturbedLizard6> , you can use the output_uri parameter of Task.init() to specify where to upload models.
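A minimal sketch of that suggestion, assuming the project and task names from this thread. The helpers `build_task_kwargs` and `start_task` and the URL in the usage note are hypothetical, not part of ClearML. Passing output_uri=True asks ClearML to use the files server it is already configured with, while a string sets an explicit destination.

```python
def build_task_kwargs(files_server_url=None):
    """Build keyword arguments for Task.init (hypothetical helper).

    With no URL, output_uri=True is used, which tells ClearML to upload
    models to its configured default files server. With a URL, that URL
    is used as the explicit upload destination.
    """
    return {
        "project_name": "test",
        "task_name": "PyTorch MNIST train",
        "output_uri": files_server_url if files_server_url else True,
    }


def start_task(files_server_url=None):
    # Imported lazily so build_task_kwargs can be used without clearml installed.
    from clearml import Task

    return Task.init(**build_task_kwargs(files_server_url))
```

For example, `start_task()` relies on the default files server, and `start_task("http://files.example.invalid:8081")` would point uploads at a placeholder address that has to be replaced with the real one.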
I'm currently unsure about the correct approach. Would you kindly review my attempts and point out where I might have made a mistake? Here's what I've tried:
- I've added the default URL in the agent helm chart:
clearml:
  ...
  clearmlConfig: |-
    sdk {
      development {
        default_output_uri: ""
      }
    }
- I've added the URL in the agent section:
agentk8sglue:
  ...
  fileServerUrlReference:
- In the Python file, when using Task.init, I've tried the 'output_uri' keyword argument with both 'True' and the file server URL ' None '.
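To see which of the settings above actually reach the pod, one option is to print the server URLs from inside it. The sketch below assumes the agent passes them through the standard ClearML environment variables. If your deployment supplies them only through a config file, these will show up as None and the configuration dump at the start of the task log is the better place to look. The helper name `clearml_urls_from_env` is made up for illustration.

```python
import os

# Environment variables the ClearML SDK reads for server addresses.
CLEARML_URL_VARS = ("CLEARML_API_HOST", "CLEARML_WEB_HOST", "CLEARML_FILES_HOST")


def clearml_urls_from_env(environ=None):
    """Return {variable name: value or None} for the ClearML server URLs.

    `environ` defaults to the process environment; a plain dict can be
    passed instead, which is handy for testing.
    """
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in CLEARML_URL_VARS}


if __name__ == "__main__":
    for name, value in clearml_urls_from_env().items():
        print(f"{name} = {value!r}")
```

Comparing this output between a manual run inside the pod and a run started by the agent may show whether the two see the same files server.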
How were you saving the model with pytorch?
Hi @<1523701070390366208:profile|CostlyOstrich36> , I tried this, but it doesn't work. Should it be the fileserver URL?
I ran the code from inside a pod created by the agent and the model was uploaded. But when the task is started by the agent command, it doesn't upload) Magic