Hi Folks, I Have A Question Related To The Storage Of Artifacts, As It Is Not Entirely Clear To Me Where To Configure It. If I Read The Documentation

Answered

Hi folks, I have a question about the storage of artifacts, as it is not entirely clear to me where to configure it.
Reading the documentation at https://clear.ml/docs/latest/docs/integrations/storage, I am not sure where the configuration should go: on the local machine I use to submit tasks to ClearML, on the fileserver, or on all of the web, api, fileserver and agent components?

  				
Posted 3 years ago by SarcasticSquirrel56


Answers 13

thanks, yes it makes sense!

  				
Posted 3 years ago by SarcasticSquirrel56

"when I run it on my laptop..."

Then yes, you need to set default_output_uri in your laptop's clearml.conf (just like you set it on the k8s glue).
Does that make sense?
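To check whether the clearml.conf on a given machine (laptop or glue pod) actually carries the setting, a rough sketch like the one below may help. It is only a text scan, not a real HOCON parser, and `find_default_output_uri` is a hypothetical helper name made up for this example.

```python
import re
from typing import Optional

# Matches `default_output_uri: "value"` or `default_output_uri = value`,
# in either the nested or the dotted (sdk.development.default_output_uri) form.
_PATTERN = re.compile(r'default_output_uri\s*[:=]\s*"?([^"\s}]+)"?')


def find_default_output_uri(conf_text: str) -> Optional[str]:
    """Return the configured default_output_uri, or None if it is unset or empty.

    Hypothetical helper: a rough line-by-line scan, not a HOCON parser.
    """
    for line in conf_text.splitlines():
        stripped = line.strip()
        # Skip HOCON comment lines.
        if stripped.startswith("#") or stripped.startswith("//"):
            continue
        match = _PATTERN.search(stripped)
        if match:
            return match.group(1)
    return None
```

On the laptop it could be fed the contents of ~/clearml.conf, for example read with `pathlib.Path.home().joinpath("clearml.conf").read_text()`. A result of None would be consistent with models being registered under a local file:// path.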

  				
Posted 3 years ago by AgitatedDove14

When I run it on my laptop.
What I am trying to achieve is not having to worry about this setting, and having all the artifacts and models uploaded to the file server automatically.

  				
Posted 3 years ago by SarcasticSquirrel56

"but when I removed output_uri from Task.init, the pickled model has path file:///Users/luca/path/to/pickle.file"

Is that when you run the job on the k8s pod?

  				
Posted 3 years ago by AgitatedDove14

do I need something else in the clearml.conf?

  				
Posted 3 years ago by SarcasticSquirrel56

Maybe I did something wrong...
the clearml.conf in the agentglue pod looks like:

sdk {
    development {
        default_output_uri: "fileserver_url"
    }
}
agent {
    package_manager: {
        extra_index_url: [ "extra_index_url" ]
    }
}

but when I removed output_uri from Task.init, the pickled model has path file:///Users/luca/path/to/pickle.file

  				
Posted 3 years ago by SarcasticSquirrel56

"but DS in order for models to be uploaded, you still have to set: output_uri=True in the"

No, if you set default_output_uri, there is no need to pass output_uri=True to Task.init() 🙂
It basically sets it for you. Does that make sense?
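The fallback described here can be pictured with a small sketch. This is a simplified model written for this thread, not ClearML's actual code: `resolve_output_uri` and its parameter names are hypothetical, and it assumes that an explicit output_uri wins, that output_uri=True means "use the default file server", and that with nothing set the model keeps its local file:// path. The configuration that counts is the one on the machine where the script runs.

```python
from typing import Optional, Union


def resolve_output_uri(
    output_uri: Union[str, bool, None],
    default_output_uri: Optional[str],
    files_server: str,
) -> Optional[str]:
    """Return the upload destination, or None to leave files where they were saved.

    Hypothetical helper: a simplified model of the precedence discussed in
    this thread, not the ClearML implementation.
    """
    if isinstance(output_uri, str) and output_uri:
        # An explicit destination passed to Task.init() wins.
        return output_uri
    if output_uri is True:
        # output_uri=True: upload to the default file server.
        return files_server
    if output_uri is None and default_output_uri:
        # Nothing passed in code: fall back to sdk.development.default_output_uri
        # from the clearml.conf of the machine running the script.
        return default_output_uri
    # No destination anywhere: the model is only registered with its local path.
    return None
```

Under these assumptions, a laptop whose clearml.conf has no default_output_uri resolves to None even if the glue pod's configuration sets it, which would match the file:///Users/... path reported in this thread.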

  				
Posted 3 years ago by AgitatedDove14

So I set sdk.development.default_output_uri: <url to fileserver>
in the K8s Agent Glue pod, but for models to be uploaded, data scientists (DS)
still have to set output_uri=True in the Task.init() call.
Otherwise artifacts are only stored on my laptop, and under "models" I see a URI like file:///Users/luca/etc.etc./
So I am not really sure what this setting does.

  				
Posted 3 years ago by SarcasticSquirrel56

SarcasticSquirrel56, you're right. I think you can use the following setting in ~/clearml.conf: sdk.development.default_output_uri: <S3_BUCKET>. Tell me if that works.

  				
Posted 3 years ago by CostlyOstrich36

CostlyOstrich36 so I don't have to write the clearml.conf?

I would like to set things up so that a data scientist working on a project doesn't have to know about buckets and this sort of thing. Ideally the server and the agents are configured with a default bucket.

  				
Posted 3 years ago by SarcasticSquirrel56

Hi SarcasticSquirrel56 ,

You can use output_uri=<S3_BUCKET> in Task.init() to upload artifacts to S3. Is that what you're looking for?
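As a minimal sketch of that suggestion: the helper names `s3_output_uri` and `start_task` are hypothetical, and the example assumes the clearml package is installed and that S3 credentials are already configured on the machine running the script.

```python
def s3_output_uri(bucket: str, prefix: str = "") -> str:
    """Build an s3:// destination string from a bucket name and optional folder."""
    bucket = bucket.strip("/")
    prefix = prefix.strip("/")
    return f"s3://{bucket}/{prefix}" if prefix else f"s3://{bucket}"


def start_task(project: str, name: str, bucket: str, prefix: str = ""):
    """Start a ClearML task whose output destination is the given bucket."""
    # Imported here so the helper above can be used without clearml installed.
    from clearml import Task

    return Task.init(
        project_name=project,
        task_name=name,
        output_uri=s3_output_uri(bucket, prefix),
    )
```

A call such as `start_task("my-project", "train", "my-bucket", "models")` would pass output_uri="s3://my-bucket/models"; all four values are placeholders.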

  				
Posted 3 years ago by CostlyOstrich36

Pinging this question

  				
Posted 3 years ago by SarcasticSquirrel56

The behaviour I'd like to achieve is that any artefact is automatically saved to an S3 bucket, ideally without the data scientist having to configure much on their side.

Right now we store artefacts in the fileserver, and we have to make sure that we use output_uri=True in the Task.init call to have artefacts uploaded to the ClearML fileserver.

What's the ideal setup to keep the boilerplate in DS code minimal?

  				
Posted 3 years ago by SarcasticSquirrel56
