Answered

hey guys
trying to save a model via the OutputModel.update_weights function
I get the following error:

2023-03-23 11:43:23,298 - clearml.storage - ERROR - Failed uploading: cannot schedule new futures after shutdown
Failed uploading: cannot schedule new futures after interpreter shutdown
Failed uploading: cannot schedule new futures after interpreter shutdown
Exception encountered while uploading Upload failed

the task runs inside Docker, so I'm worried the container shuts down before the upload finishes. is that the case?

  
  
Posted one year ago

Answers 31


ok so i accidentally (probably with luck) noticed the max_connection: 2 in the azure.storage config.

NICE!!!! 🎊
But wait where is that set?
None
Should we change the default or add a comment?

  
  
Posted one year ago

@<1523701205467926528:profile|AgitatedDove14>

  
  
Posted one year ago

the base image is python:3.9-slim

  
  
Posted one year ago

Do you have to have a value there?

  
  
Posted one year ago

so i think debian (and python 3.9)

  
  
Posted one year ago

Could it be in a Python atexit event?

  
  
Posted one year ago

cannot schedule new futures after interpreter shutdown

This implies the process is shutting down.
Where are you uploading the model? What ClearML version are you using? Can you check with the latest version (1.10)?
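
For context, that message is the standard concurrent.futures error, and it can be reproduced with the standard library alone (no ClearML involved). A minimal sketch:

```python
from concurrent.futures import ThreadPoolExecutor

# ClearML uploads artifacts through a background executor; once the
# process/interpreter is shutting down, new work can no longer be scheduled.
pool = ThreadPoolExecutor(max_workers=1)
pool.shutdown(wait=True)

try:
    pool.submit(print, "upload")  # too late: the pool is already shut down
except RuntimeError as e:
    print(e)  # -> cannot schedule new futures after shutdown
```

So the error itself just says: by the time the upload was queued, the machinery to run it was already torn down.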

  
  
Posted one year ago

thanks for the help 🙂

  
  
Posted one year ago

that's the one, I'll add a comment (I didn't check the number of connections it opens, so idk the right number)

  
  
Posted one year ago

martin*

  
  
Posted one year ago

@<1523701205467926528:profile|AgitatedDove14> hey martin, i deleted the task.mark_completed() line
but i still get the same error.
could it possibly be something else?

  
  
Posted one year ago

no, i just commented it and it worked fine

Yeah, we should add a comment saying "optional" because it looks as if you need to have it there if you are using Azure.

  
  
Posted one year ago

hey martin thanks for the reply.
im doing the calling at the main function

  
  
Posted one year ago

ok martin, so what i'm having trouble with now is understanding how to save the model in our azure blob storage. what i did was to specify:

upload_uri = f'
'
output_model.update_weights(register_uri=model_path, upload_uri=upload_uri, iteration=0)

but it doesn't seem to save the pkl file (which is the model_path) to the storage

  
  
Posted one year ago

i updated to 1.10
i am uploading the model inside the main() function, using this code:

import pickle  # needed for the dump below

model_path = model_name + '.pkl'
with open(model_path, "wb") as f:
    pickle.dump(prophet_model, f)  # serialize the trained Prophet model

output_model.update_weights(weights_filename=model_path, iteration=0)
  
  
Posted one year ago

ok so i accidentally (probably with luck) noticed the max_connection: 2 in the azure.storage config.
canceled that, and so now everything works
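
For reference, a sketch of where that setting lives in clearml.conf. The account values below are placeholders and the exact template layout may differ between versions; the relevant line is max_connection:

```hocon
# ~/clearml.conf -- azure.storage section (sketch; account values are placeholders)
azure.storage {
    containers: [
        {
            account_name: "my-account"
            account_key: "my-key"
            container_name: "my-container"
            # the template ships with a max_connection: 2 entry; commenting it
            # out (or raising it) is what resolved the failed uploads here
            # max_connection: 2
        }
    ]
}
```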

  
  
Posted one year ago

btw: what's the OS and python version?

  
  
Posted one year ago

oh, that makes sense. basically here is what you should do:

Task.init(... output_uri='
')
output_model.update_weights(register_uri=model_path)

It will automatically create a unique target folder / file under None to store your model
(btw: passing register_uri basically says: "I already uploaded the model there, just store the link" - i.e. it does NOT upload the model)

  
  
Posted one year ago

im trying to figure out
i'll play with it a bit and let you know

  
  
Posted one year ago

@<1523701205467926528:profile|AgitatedDove14>
ok so now i upload with the following line:

op_model.update_weights(weights_filename=model_path, upload_uri=upload_uri) #, upload_uri=upload_uri, iteration=0)

and while running it locally, it seems to upload.
when i let it run remotely i get the original Failed uploading error.

although, one time when i ran it remotely it did upload, and at other times it didn't. weird behavior

can you help?

  
  
Posted one year ago

Hmm, what's the OS and python version?
Is this simple example working for you?
None

  
  
Posted one year ago

Hmm, so what is the difference?

  
  
Posted one year ago

Hi @<1546303269423288320:profile|MinuteStork43>

Failed uploading: cannot schedule new futures after interpreter shutdown
Failed uploading: cannot schedule new futures after interpreter shutdown

This is odd - where / when exactly are you trying to upload it?

  
  
Posted one year ago

no, i just commented it and it worked fine

  
  
Posted one year ago

(i'm running it in docker)

  
  
Posted one year ago

i'll send you the file in private

  
  
Posted one year ago

(still doesn't work)

  
  
Posted one year ago

ignore it, I didn't try and read everything you said so far, I'll try again tomorrow and update this comment
oh, so then we're back to the old problem: when i am using weights_filename, it gives me the error
Failed uploading: cannot schedule new futures after interpreter shutdown

  
  
Posted one year ago

I'm trying to figure if this is reproducible...

  
  
Posted one year ago

task.mark_completed()

You have that at the bottom of the script. Never call it on your own task - it will kill the actual process.
So what is going on: you are marking your own process for termination, then it terminates itself while the interpreter is still running, and this is the reason for the errors you are seeing.

The idea of the mark_* calls is to forcefully mark an external Task.
If your process simply completes with exit code 0 (i.e. no error), the Task will be marked as completed anyhow - no need to call any function.
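
More generally, libraries flush pending work (like uploads) in interpreter exit handlers, and force-terminating your own process skips those handlers. A small stdlib-only sketch (no ClearML) contrasting a normal exit with a hard exit:

```python
import subprocess
import sys
import textwrap

# A normal exit runs atexit handlers (where libraries typically flush pending
# uploads); os._exit() skips them entirely -- roughly what forcefully marking
# your own process as terminated does.
normal = textwrap.dedent("""
    import atexit
    atexit.register(lambda: print("flushed pending uploads"))
""")
hard = textwrap.dedent("""
    import atexit, os
    atexit.register(lambda: print("flushed pending uploads"))
    os._exit(0)  # hard exit: handlers never run
""")

out_normal = subprocess.run([sys.executable, "-c", normal],
                            capture_output=True, text=True).stdout
out_hard = subprocess.run([sys.executable, "-c", hard],
                          capture_output=True, text=True).stdout
print(repr(out_normal))  # 'flushed pending uploads\n'
print(repr(out_hard))    # ''
```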

  
  
Posted one year ago