Hey guys, trying to save a model via the OutputModel.update_weights function, I get the following error:

Answered

Hey guys, when trying to save a model via the OutputModel.update_weights function, I get the following error:

2023-03-23 11:43:23,298 - clearml.storage - ERROR - Failed uploading: cannot schedule new futures after shutdown
Failed uploading: cannot schedule new futures after interpreter shutdown
Failed uploading: cannot schedule new futures after interpreter shutdown
Exception encountered while uploading Upload failed

The task runs inside Docker, so I'm worried the container shuts down before the upload finishes. Is that the case?

  				
Posted by NonchalantOx99, one year ago


Answers 31

Hmm, so what is the difference?

  				
Posted by AgitatedDove14, one year ago

cannot schedule new futures after interpreter shutdown

This implies the process is shutting down.
Where are you uploading the model? Which ClearML version are you using? Can you check with the latest version (1.10)?
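For context, the first log line ("cannot schedule new futures after shutdown") is the message Python's `concurrent.futures` raises when work is submitted to a thread pool that has already been shut down. A minimal stdlib sketch of that situation follows; `upload()` is a hypothetical stand-in, and the sketch only assumes that ClearML's uploader relies on a similar executor internally, which this thread does not confirm.

```python
from concurrent.futures import ThreadPoolExecutor


def upload(path: str) -> str:
    """Hypothetical stand-in for a background upload job."""
    return f"uploaded {path}"


def upload_after_shutdown(path: str) -> str:
    pool = ThreadPoolExecutor(max_workers=2)
    # The pool is closed before the upload is queued, mimicking a process
    # that has already torn down its background workers.
    pool.shutdown(wait=True)
    try:
        pool.submit(upload, path)
    except RuntimeError as exc:
        return f"Failed uploading: {exc}"
    return "queued"


print(upload_after_shutdown("model.pkl"))
# prints: Failed uploading: cannot schedule new futures after shutdown
```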

  				
Posted by AgitatedDove14, one year ago

No, I just commented it out and it worked fine.

Yeah, we should add a comment saying "optional" because it looks as if you need to have it there if you are using Azure.

  				
Posted by AgitatedDove14, one year ago

(I'm running it on Docker.)

  				
Posted by NonchalantOx99, one year ago

Do you have to have a value there?

  				
Posted by AgitatedDove14, one year ago

Hi @MinuteStork43

Failed uploading: cannot schedule new futures after interpreter shutdown
Failed uploading: cannot schedule new futures after interpreter shutdown

This is odd. Where / when exactly are you trying to upload it?

  				
Posted by AgitatedDove14, one year ago

(still doesn't work)

  				
Posted by NonchalantOx99, one year ago

The base image is python:3.9-slim.

  				
Posted by NonchalantOx99, one year ago

OK Martin, what I'm having trouble with now is understanding how to save the model in our Azure Blob Storage. What I did was specify:

upload_uri = f'

'
output_model.update_weights(register_uri=model_path, upload_uri=upload_uri, iteration=0)

But it doesn't seem to save the .pkl file (model_path) to the storage.

  				
Posted by NonchalantOx99, one year ago

@AgitatedDove14

  				
Posted by NonchalantOx99, one year ago

I updated to 1.10.
I am uploading the model inside the main() function, using this code:

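import pickle

# Serialize the trained Prophet model to a local .pkl file
model_path = model_name + '.pkl'
with open(model_path, "wb") as f:
    pickle.dump(prophet_model, f)

# Upload the local file as the output model's weights
output_model.update_weights(weights_filename=model_path, iteration=0)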

  				
Posted by NonchalantOx99, one year ago

So I think it's Debian (and Python 3.9).

  				
Posted by NonchalantOx99, one year ago

Oh, that makes sense. Basically, here is what you should do:

Task.init(... output_uri='

')
output_model.update_weights(register_uri=model_path)

It will automatically create a unique target folder/file under None to store your model.
(BTW: passing register_uri basically says "I already uploaded the model there, just store the link", i.e. it does not upload the model.)
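A fuller sketch of that suggestion, assuming the standard `clearml` package and a ClearML setup that already has Azure credentials configured. The project/task names and the `azure://<account>.blob.core.windows.net/<container>/models` URI shape are placeholders, not values from this thread, and a small dict stands in for the trained Prophet model. It passes `weights_filename` (which uploads the file) rather than `register_uri` (which only stores a link), and it deliberately does not call `task.mark_completed()`.

```python
import pickle


def main() -> None:
    # Imported inside main() so this file can be loaded without clearml installed.
    from clearml import OutputModel, Task

    # Placeholder project/task names and storage URI: replace with your own.
    task = Task.init(
        project_name="examples",
        task_name="prophet-model",
        output_uri="azure://<account>.blob.core.windows.net/<container>/models",
    )
    output_model = OutputModel(task=task)

    # Stand-in object for the trained prophet_model from the thread.
    model = {"placeholder": "trained model goes here"}
    model_path = "model.pkl"
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # weights_filename uploads the local file to the task's output_uri;
    # register_uri would only record a link to an already-uploaded file.
    output_model.update_weights(weights_filename=model_path, iteration=0)
    # No task.mark_completed() here: the script simply returns, and the
    # task is marked completed when the process exits cleanly (code 0).


if __name__ == "__main__":
    main()
```

Running it needs a reachable ClearML server; the import sits inside `main()` only so the file can be inspected without the package.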

  				
Posted by AgitatedDove14, one year ago

I'm trying to figure out whether this is reproducible...

  				
Posted by AgitatedDove14, one year ago

Ignore that; I didn't actually read everything you said so far. I'll try again tomorrow and update this comment.
Oh, so then we're back to the old problem: when I use
weights_filename, it gives me the error
Failed uploading: cannot schedule new futures after interpreter shutdown

  				
Posted by NonchalantOx99, one year ago

Thanks for the help 🙂

  				
Posted by NonchalantOx99, one year ago

OK, so I accidentally (probably with luck) noticed the max_connection: 2 in the azure.storage config.

NICE!!!! 🎊
But wait, where is that set?
None
Should we change the default or add a comment?

  				
Posted by AgitatedDove14, one year ago

Could it be in a Python atexit handler?
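For reference, on Python 3.9+ the "after interpreter shutdown" variant of the message is raised when a thread pool is asked to run new work after the interpreter has started exiting, for example from an `atexit` handler. A small sketch that reproduces the exact message in a child interpreter; `late_upload()` is a hypothetical stand-in for an upload triggered during exit, not ClearML code.

```python
import subprocess
import sys
import textwrap

# Child script: registers an exit hook that tries to start a background "upload".
CHILD_SCRIPT = textwrap.dedent(
    """
    import atexit
    from concurrent.futures import ThreadPoolExecutor

    def late_upload():
        try:
            ThreadPoolExecutor(max_workers=1).submit(print, "uploading")
        except RuntimeError as exc:
            print(f"Failed uploading: {exc}")

    atexit.register(late_upload)
    """
)


def run_child() -> str:
    """Run the child script in a fresh interpreter and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_child())
```

On Python 3.9+ the thread-pool machinery flags interpreter shutdown before `atexit` handlers run, so the submit inside `late_upload()` is rejected with the same wording as the log above.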

  				
Posted by AgitatedDove14, one year ago

task.mark_completed()

You have that at the bottom of the script. Never call it on your own Task; it will kill the actual process.
What is going on is that you are marking your own process for termination, so it terminates itself and the interpreter shuts down, and that is the reason for the errors you are seeing.

The idea of mark_* is to forcefully mark an external Task.
If your process simply finishes with exit code 0 (i.e. no error), the Task will be marked as completed anyhow; there is no need to call any function.
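The same idea in plain Python, as an analogy rather than ClearML's actual internals: let pending background work finish and then simply return, instead of forcing the process to end. `upload()` and `main()` here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def upload(path: str) -> str:
    """Hypothetical stand-in for a background upload job."""
    return f"uploaded {path}"


def main() -> str:
    # Leaving the with-block calls pool.shutdown(wait=True), so every
    # submitted upload has finished before main() returns.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future = pool.submit(upload, "model.pkl")
    # No self-termination call here: returning normally lets the script
    # exit with code 0 after the pending work is done.
    return future.result()


if __name__ == "__main__":
    print(main())
```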

  				
Posted by AgitatedDove14, one year ago

Hey Martin,
this script actually does work.

  				
Posted by NonchalantOx99, one year ago

Hmm, what's the OS and Python version?
Is this simple example working for you?
None

  				
Posted by AgitatedDove14, one year ago

@AgitatedDove14
OK, so now I upload with the following line:

op_model.update_weights(weights_filename=model_path, upload_uri=upload_uri) #, upload_uri=upload_uri, iteration=0)

When running it locally, it seems to upload,
but when I let it run remotely, I get the original "Failed uploading" error.

Although one time when I ran it remotely it did upload, and at other times it didn't. Weird behavior.

Can you help?

  				
Posted by NonchalantOx99, one year ago

No, I just commented it out and it worked fine.

  				
Posted by NonchalantOx99, one year ago

@AgitatedDove14 Hey Martin, I deleted the task.mark_completed() line,
but I still get the same error.
Could it be something else?

  				
Posted 
	one year ago

					More
				  		
  Report
		
					NonchalantOx99
				
					0
					 × 1

Hey Martin, thanks for the reply.
I'm making the call in the main function.

  				
Posted by NonchalantOx99, one year ago

I'll send you the file in private.

  				
Posted by NonchalantOx99, one year ago

That's the one. I'll add a comment (I didn't check how many connections it opens, so I don't know the right number).

  				
Posted by NonchalantOx99, one year ago

I'm trying to figure it out.
I'll play with it a bit and let you know.

  				
Posted by NonchalantOx99, one year ago

OK, so I accidentally (probably with luck) noticed the max_connection: 2 in the azure.storage config.
I commented that out, and now everything works.

  				
Posted by NonchalantOx99, one year ago

BTW: what's the OS and Python version?

  				
Posted 
	one year ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1


48K Views
