Is There a Way to Save the Models Completely on the ClearML Server?

Answered

Is there a way to save the models completely on the ClearML server?

It seems that ClearML Server does not store the models or artifacts itself, but they are stored somewhere else (e.g., AWS S3-bucket) or on my local machine and ClearML Server is only storing configuration parameters and previews (e.g., when the artifact is a pandas dataframe). Is that right?

  				
Posted one year ago by VivaciousBadger56


Answers 45

@UnevenDolphin73: How do you figure? In the past, my colleagues and I just shared the .zip file via email / MS Teams and it worked. So I don't think so.

  				
Posted one year ago by VivaciousBadger56

Unbelievable! That worked.

  				
Posted one year ago by VivaciousBadger56

By the way, output_uri is also documented as part of the Task.init() docstring.

  				
Posted one year ago by SuccessfulKoala55

@SuccessfulKoala55: I referenced this conversation in the issue.

  				
Posted one year ago by VivaciousBadger56

But I guess @CostlyOstrich36 wrote that in a different chat, right?

  				
Posted one year ago by VivaciousBadger56

@VivaciousBadger56 It seems like whatever you pickled in the zip file relies on some additional files that are not pickled.

  				
Posted one year ago by UnevenDolphin73

@CostlyOstrich36

My training outputs a model as a zip file. The way I save and load the zip file to make up my model is custom made (no library is directly used), because we invented the entire modelling ourselves. What I did so far:

output_model = OutputModel(task=..., config_dict={...}, name=f"...")
output_model.update_weights(r"C:\io__path\...", is_package=True)

and I am trying to load the model in a different Python process with

mymodel = task.models['output'][0]
mymodel = mymodel.get_local_copy(extract_archive=True, raise_on_error=True)

and I get a . training.pt file in the ClearML cache, which seems to be some kind of archive. Inside it are two files named data.pkl and version, and a folder containing two files named 86922176 and 86934640.

I have tried pickle, zip and joblib and am not sure how to proceed. I am kind of at a loss. I suspect my original zip file might be somewhere inside, but I am not sure.

Sure, we could simply use the generic artifacts SDK, but I would like to use the methods and functions that are actually meant for models.

How should I proceed?
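
The layout described above (data.pkl, version, and a folder of numerically named files) resembles the zip-based format that torch.save writes, though that is an assumption that cannot be confirmed from this thread. Below is a small standard-library sketch for checking what a downloaded file actually contains. The helper name list_archive_members is hypothetical:

```python
import zipfile
from pathlib import Path


def list_archive_members(path):
    """Return the sorted member names if `path` is a zip archive, else None."""
    path = Path(path)
    # is_zipfile inspects the file contents, so the file extension does not matter.
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as archive:
        return sorted(archive.namelist())
```

Calling list_archive_members on the file in the cache would show whether the names from the original zip appear in it, or only the data.pkl / version entries.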

  				
Posted one year ago by VivaciousBadger56

@SuccessfulKoala55 I think I might have made a mistake earlier, but not in the code I posted before. Now, I have the following situation:

In my training Python process on my notebook I train the custom-made model and put it on my hard drive as a zip file. Then I run the code

output_model = OutputModel(task=task, config_dict={...}, name=f"...")
output_model.update_weights(weights_filename=r"C:\path\to\mymodel.zip", is_package=True)

I then delete "C:\path\to\mymodel.zip", because it would not be available on my colleagues' computers.
In a second process, the model-inference process, I run

mymodel = task.models['output'][-1]
mymodel = mymodel.get_local_copy(extract_archive=True, raise_on_error=True)

and get the error

ValueError: Could not retrieve a local copy of model weights 8ad4db1561474c43b0747f7e69d241a6, failed downloading

I do not have an AWS S3 bucket or anything like that. This is why I would like to store my mymodel.zip file directly on the ClearML Hosted Service. The model is around 2 MB.

How should I proceed?

  				
Posted one year ago by VivaciousBadger56

@UnevenDolphin73: If I do, what should I configure, and how?

  				
Posted one year ago by VivaciousBadger56

Hi @VivaciousBadger56, you can configure Task.init(..., output_uri=True) and this will save the models to the ClearML file server
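
A minimal sketch of the training side with this setting, assuming the clearml package is installed and configured for the server in use. The helper names require_existing_file and upload_model_package are hypothetical, as are any project, task, and model names passed in. Task.init(output_uri=True), OutputModel, and update_weights(..., is_package=True) are the calls quoted in this thread. The explicit flush is an addition, made on the assumption that the process may exit soon after registering the model:

```python
from pathlib import Path


def require_existing_file(path):
    """Return `path` as a Path, or raise if the weights file is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"model file not found: {path}")
    return path


def upload_model_package(zip_path, project_name, task_name, model_name):
    """Upload a zipped model via ClearML and return the task ID.

    Hypothetical helper; all names passed in are placeholders.
    """
    # Imported lazily so this module can be loaded without clearml installed.
    from clearml import OutputModel, Task

    zip_path = require_existing_file(zip_path)

    # output_uri=True asks ClearML to upload models to its file server
    # instead of only registering the local path (the default is None).
    task = Task.init(project_name=project_name, task_name=task_name, output_uri=True)

    output_model = OutputModel(task=task, name=model_name)
    output_model.update_weights(weights_filename=str(zip_path), is_package=True)

    # Assumption: wait for pending uploads before the process exits.
    task.flush(wait_for_uploads=True)
    return task.id
```

A call such as upload_model_package(r"C:\path\to\mymodel.zip", "my project", "training", "my model") would then be expected to leave a copy on the file server, so the local zip should no longer be needed on other machines.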

  				
Posted one year ago by CostlyOstrich36

@UnevenDolphin73: Thanks, but it does not mention the File Storage of "ClearML Hosted Server".

  				
Posted one year ago by VivaciousBadger56

@UnevenDolphin73: I see. I did not make the connection that output_uri=True is what I was missing. I thought this was the default. But the default is actually "None", which is different from "True".

  				
Posted one year ago by VivaciousBadger56

I can only say I’ve found ClearML to be very helpful, even given the documentation issue.
I think they’ve been working on upgrading it for a while, hopefully something new comes out soon.
Maybe @AgitatedDove14 has further info 🙂

  				
Posted one year ago by UnevenDolphin73

From the one you sent.

  				
Posted one year ago by UnevenDolphin73

Well, you could start by setting output_uri to True in Task.init.

  				
Posted one year ago by UnevenDolphin73

Either? 🙂

  				
Posted one year ago by UnevenDolphin73

We'll try to add references to that in other places as well 👍

  				
Posted one year ago by SuccessfulKoala55

@UnevenDolphin73: I do not get this impression, because during update_weights I get the message

2023-02-21 13:54:49,185 - clearml.model - INFO - No output storage destination defined, registering local model C:\Users..._Demodaten_FF_2023-02-21_13-53-51.624362.model
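
The log line suggests the model was registered by its local path only, which would explain a failed download from another process once the file is gone. A rough sketch of a loading-side check follows, assuming the clearml package is installed. The names is_remote_uri and fetch_latest_output_model are hypothetical, and the set of schemes treated as remote is an assumption, not an exhaustive list:

```python
from urllib.parse import urlparse

# Assumption: URI schemes treated as "uploaded somewhere others can reach".
REMOTE_SCHEMES = {"http", "https", "s3", "gs", "azure"}


def is_remote_uri(uri):
    """Return True if `uri` uses a remote storage scheme, False for local paths."""
    if not uri:
        return False
    return urlparse(uri).scheme.lower() in REMOTE_SCHEMES


def fetch_latest_output_model(task_id):
    """Download the newest output model of a task and return its local path."""
    # Imported lazily so this module can be loaded without clearml installed.
    from clearml import Task

    task = Task.get_task(task_id=task_id)
    model = task.models["output"][-1]

    # A model registered with a local path cannot be fetched from other machines.
    if not is_remote_uri(model.url):
        raise RuntimeError(f"model was registered locally only: {model.url}")

    return model.get_local_copy(extract_archive=True, raise_on_error=True)
```

With this check, a model that was never uploaded would fail with a message naming the local path, rather than a generic download error.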

  				
Posted one year ago by VivaciousBadger56

Yes, you're correct, I misread the exception.
Maybe it hasn't completed uploading? At least for Datasets one needs to explicitly wait IIRC

  				
Posted one year ago by UnevenDolphin73

Hi all, sorry for not being so responsive today 🙏

  				
Posted one year ago by SuccessfulKoala55

We're certainly working hard on improving the documentation (and I do apologize for the frustrating experience)

  				
Posted one year ago by SuccessfulKoala55

The documentation is messy, I’ve complained about it in the past too 🙈

  				
Posted one year ago by UnevenDolphin73

It is documented, but super deep in the code. If you don't know that output_uri in TASK's (!) init is relevant, you would never know...

  				
Posted one year ago by VivaciousBadger56

Heh, good @VivaciousBadger56 😁
I was just repeating what @CostlyOstrich36 suggested, credits to him

  				
Posted one year ago by UnevenDolphin73

Do you mean "exactly" as in "you finally got it" or in the sense of "yes, that was easy to miss"?

  				
Posted one year ago by VivaciousBadger56

FWIW, we prefer to set it in the agent’s configuration file, then it’s all automatic

  				
Posted one year ago by UnevenDolphin73

@UnevenDolphin73

  				
Posted one year ago by VivaciousBadger56

I have already been trying to contribute (I have three pull requests), but honestly it feels a bit weird that I need to update documentation about something I do not understand while I am actually trying to evaluate whether ClearML is the right tool for our company...

  				
Posted one year ago by VivaciousBadger56

@SuccessfulKoala55 Also, I think that, in this case but also in other cases, the issue is not just the documentation, but also the design of the SDK.

  				
Posted one year ago by VivaciousBadger56

🙂

  				
Posted one year ago by VivaciousBadger56

