Hi, I have a question about the Model Registry. Here's my situation: I'm using k8s_example and struggling with uploading a model. Should models be uploaded to the Fileserver, or should I create another S3 bucket as mentioned in the documentation?

Answered

Hi, I have a question about the Model Registry. Here's my situation: I'm using k8s_example and struggling with uploading a model. Should models be uploaded to the Fileserver, or should I create another S3 bucket as mentioned in the documentation?
sdk.development.default_output_uri = ...
Currently, models are being saved locally in the pod and are deleted when the pod is terminated, and I can't find the reason why.

  				
Posted 3 months ago by DisturbedLizard6


Answers 15

@CostlyOstrich36 Yes, I read this in the documentation and tried it. But when I use "True", the path changes from " None ...." to " None ...", which is very strange behavior.

  				
Posted 3 months ago by DisturbedLizard6

Hi @CostlyOstrich36, I tried this, but it doesn't work. Should it be the fileserver URL?

  				
Posted 3 months ago by DisturbedLizard6

OK, guys, I got it working by uploading the model manually.
import os
from tempfile import gettempdir
import torch
from clearml import Task, OutputModel

task = Task.init(project_name='test', task_name='PyTorch MNIST train filserver dataset')
output_model = OutputModel(task=task, framework="PyTorch")
output_model.set_upload_destination(uri=" None ")
# `model` is the trained network defined earlier in the script
weights_path = os.path.join(gettempdir(), "mnist_cnn.pt")
torch.save(model.state_dict(), weights_path)
output_model.update_weights(weights_filename=weights_path)

  				
Posted 3 months ago by DisturbedLizard6

Hi @DisturbedLizard6, you can use the output_uri parameter of Task.init() to specify where to upload models.
None
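To make the two forms of output_uri discussed in this thread concrete, here is a minimal sketch. It assumes the clearml package is installed and configured; the helper names resolve_output_uri and init_task are made up for illustration, and any destination string you pass (for example an S3 bucket URI) is your own value, not something taken from this thread.

```python
def resolve_output_uri(destination=None):
    """Return the value to pass as Task.init(output_uri=...).

    Hypothetical helper: with no explicit destination it returns True,
    which asks ClearML to use the files server configured for the SDK.
    """
    return destination if destination else True


def init_task(project, name, destination=None):
    # Imported lazily so resolve_output_uri() can be used without clearml installed.
    from clearml import Task

    return Task.init(
        project_name=project,
        task_name=name,
        output_uri=resolve_output_uri(destination),
    )
```

Calling init_task("test", "my task") would pass output_uri=True, while init_task("test", "my task", "<your storage URI>") would pass the explicit destination through unchanged.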

  				
Posted 3 months ago by CostlyOstrich36

Hi @DisturbedLizard6, not sure I get that. Did you use torch.save (like in here) or some other command to save the models? When running with the clearml-agent, you get a printout of all the configuration values at the beginning of the log. Can you verify that the values match what you configured?

Additionally, which versions of clearml, clearml-agent, and torch are you using?
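One way to do that check is to copy the task log to a file and search it for the setting. The sketch below is plain Python and assumes the log prints settings as "key = value" lines, in the same shape as the sdk.development.default_output_uri line quoted in the question; the function name find_config_value is made up for illustration.

```python
import re

KEY = "sdk.development.default_output_uri"


def find_config_value(log_text, key=KEY):
    """Return the value printed for `key` in a 'key = value' line.

    Returns None when the key never appears, and an empty string when the
    key is printed with no value after the equals sign.
    """
    pattern = re.compile(r"^\s*" + re.escape(key) + r"\s*=\s*(.*)$")
    for line in log_text.splitlines():
        match = pattern.match(line)
        if match:
            return match.group(1).strip()
    return None
```

An empty result would suggest the setting never reached the pod, while None would suggest the log does not print that key at all.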

  				
Posted 3 months ago by RoundElephant20

How were you saving the model with pytorch?

  				
Posted 3 months ago by CostlyOstrich36

I'm currently unsure about the correct approach. Would you kindly review my attempts and point out where I might have made a mistake? Here's what I've tried:

I've added the default URL in the agent Helm chart:

    clearml:
      ...
      clearmlConfig: |-
        sdk {
          development {
            default_output_uri: "..."
          }
        }

I've added the URL in the agent section:

    agentk8sglue:
      ...
      fileServerUrlReference:

In the Python file, when calling Task.init, I've tried the 'output_uri' keyword argument with both 'True' and the file server URL ' None '.

  				
Posted 3 months ago by DisturbedLizard6

'True' should point to the files server

  				
Posted 3 months ago by CostlyOstrich36

Are you sure the files server is correctly configured on the pods?

  				
Posted 3 months ago by CostlyOstrich36

So when you do torch.save() it doesn't save the model?

  				
Posted 3 months ago by CostlyOstrich36

I didn't save it in any way. I relied on the auto-save from ClearML.
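For context, ClearML's automatic model logging for PyTorch is, as far as its documentation describes, triggered by the script's own torch.save calls, so a script that never saves a checkpoint gives it nothing to upload. A minimal sketch, assuming clearml and torch are installed; the task name, the function names, and the default file name are placeholders, not values confirmed in this thread.

```python
import os
from tempfile import gettempdir


def checkpoint_path(filename="mnist_cnn.pt"):
    """Build a local path for the checkpoint; the file name is a placeholder."""
    return os.path.join(gettempdir(), filename)


def train_and_save(model):
    # Imported lazily so checkpoint_path() works without these packages.
    import torch
    from clearml import Task

    # output_uri=True asks ClearML to upload saved models to the files server.
    task = Task.init(project_name="test", task_name="autolog sketch", output_uri=True)
    # ... the training loop would go here ...
    path = checkpoint_path()
    # The explicit save is the call that automatic model logging picks up.
    torch.save(model.state_dict(), path)
    return task, path
```

If the model still does not appear after an explicit save, the manual OutputModel route shown earlier in the thread remains a fallback.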

  				
Posted 3 months ago by DisturbedLizard6

The pod can easily download the dataset and upload logs to the fileserver, but it can't upload the model 😀

  				
Posted 3 months ago by DisturbedLizard6

OK, maybe someone knows: how does a pod created by a K8s agent know the model registry URL? When I added the output_uri parameter to Task.init, like output_uri=" None ", nothing is shown anymore. Previously, without this parameter, it showed a path like " None ...." in WebUI->Experiments->Artifacts.

  				
Posted 3 months ago by DisturbedLizard6

OK, I found out that with scikit-learn the model is uploaded, but with PyTorch it isn't.

  				
Posted 3 months ago by DisturbedLizard6

I ran the code from inside the pod created by the agent, and the model was uploaded. But when the task is started by the agent command, it doesn't upload :) Magic.

  				
Posted 3 months ago by DisturbedLizard6

