Answered

Hi folks, I have a question related to the storage of artifacts, as it is not entirely clear to me where to configure it.
Reading the documentation at https://clear.ml/docs/latest/docs/integrations/storage , I am not sure where I should put the configuration: on the local machine that I use to submit tasks to ClearML, or on the fileserver (or on all of the web, API, fileserver and agents)?

  
  
Posted 2 years ago

Answers 13


SarcasticSquirrel56, you're right. I think you can use the following setting in ~/clearml.conf: sdk.development.default_output_uri: <S3_BUCKET>. Tell me if that works.

  
  
Posted 2 years ago

but when I removed output_uri from Task.init, the pickled model has path

file:///Users/luca/path/to/pickle.file

When you run the job on the k8s pod?

  
  
Posted 2 years ago

CostlyOstrich36 so I don't have to write the clearml.conf?

I would like to set things up so that a data scientist working on a project doesn't have to know about buckets and this sort of thing... Ideally the server and the agents are configured with a default bucket...

  
  
Posted 2 years ago

do I need something else in the clearml.conf?

  
  
Posted 2 years ago

Maybe I did something wrong...
the clearml.conf in the agentglue pod looks like:

sdk {
    development {
        default_output_uri: "fileserver_url"
    }
}
agent {
    package_manager: {
        extra_index_url: [ "extra_index_url" ]
    }
}
but when I removed output_uri from Task.init, the pickled model has path file:///Users/luca/path/to/pickle.file

  
  
Posted 2 years ago

The behaviour I'd like to achieve is that any artefact is automatically saved to an S3 bucket, ideally without the Data Scientist having to configure much on their side.

Right now, we are storing artefacts in the fileserver, and we have to make sure that we use output_uri=True in the Task.init call to have artefacts uploaded to ClearML fileserver.

What's the ideal setup to keep the boilerplate for DS code minimal?

  
  
Posted 2 years ago

when I run it on my laptop...

Then yes, you need to set default_output_uri in your laptop's clearml.conf (just like you set it on the k8s glue).
Does that make sense?

  
  
Posted 2 years ago

when I run it on my laptop...
what I am trying to achieve is not having to worry about this setting, and have all the artifacts and models uploaded to the file server automatically

  
  
Posted 2 years ago

thanks, yes it makes sense!

  
  
Posted 2 years ago

Pinging this question

  
  
Posted 2 years ago

Hi SarcasticSquirrel56 ,

You can use output_uri=<S3_BUCKET> in Task.init() to upload artifacts to s3. Is that what you're looking for?
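
A minimal sketch of that call, assuming the `clearml` package is installed and credentials for the bucket are already configured on the machine running the code. The helper name, project name, task name and bucket URI are placeholders for illustration, not values from this thread.

```python
def start_task_with_bucket(bucket_uri: str):
    """Start a ClearML task that uses bucket_uri as its output destination."""
    # Imported lazily so this module still loads where clearml is not installed.
    from clearml import Task

    return Task.init(
        project_name="examples",             # placeholder project name
        task_name="artifact storage demo",   # placeholder task name
        output_uri=bucket_uri,               # e.g. "s3://my-bucket/clearml" (hypothetical)
    )
```

Calling `start_task_with_bucket("s3://my-bucket/clearml")` would then start the task with that destination, at the cost of every script having to know the bucket name.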

  
  
Posted 2 years ago

but DS in order for models to be uploaded, you still have to set: output_uri=True in the

No, if you set default_output_uri, there is no need to pass output_uri=True in Task.init() 🙂
It basically sets it for you. Does that make sense?
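
The precedence described in this thread can be sketched as a small pure-Python model. This is an illustration of the behaviour as discussed here, not ClearML's actual implementation; the function name and its parameters are hypothetical, and the handling of any value other than a string, True or None is an assumption of the sketch.

```python
from typing import Optional, Union


def resolve_upload_destination(
    output_uri: Union[None, bool, str],
    default_output_uri: Optional[str],
    files_server: str,
) -> Optional[str]:
    """Return where models would be uploaded, or None if they stay local."""
    if isinstance(output_uri, str):
        # An explicit URI passed to Task.init() wins over everything else.
        return output_uri
    if output_uri is True:
        # output_uri=True means "upload to the ClearML fileserver".
        return files_server
    if output_uri is None and default_output_uri:
        # Nothing passed in code: fall back to the clearml.conf default.
        return default_output_uri
    # Assumption: anything else leaves the files local (a file:// path).
    return None
```

In this model the default comes from the clearml.conf on the machine where the code runs, which is why a value set only on the agent does not affect a run started from a laptop.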

  
  
Posted 2 years ago

So I set sdk.development.default_output_uri: <url to fileserver>
in the K8s Agent Glue pod, but for models to be uploaded, the DS
still has to set output_uri=True in the Task.init() call.
Otherwise artifacts are only stored on my laptop, and under "models" I see a URI like: file:///Users/luca/etc.etc./
So I am not really sure what this setting does.

  
  
Posted 2 years ago