The behaviour I'd like to achieve is that any artefact is automatically saved to an S3 bucket, ideally without the Data Scientist having to configure much on their side.
Right now, we are storing artefacts on the fileserver, and we have to make sure that we pass output_uri=True in the Task.init call to have artefacts uploaded to the ClearML fileserver.
What's the ideal setup to keep the boilerplate for DS code minimal?
Hi SarcasticSquirrel56 ,
You can use output_uri=<S3_BUCKET> in Task.init() to upload artifacts to S3. Is that what you're looking for?
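A minimal sketch of that suggestion, assuming the `clearml` package is installed and credentials for the bucket are already configured. The helper names, the project and task names, and the bucket URI are hypothetical placeholders; only `Task.init` and its `project_name`, `task_name` and `output_uri` arguments come from ClearML itself.

```python
from typing import Any, Dict, Optional


def task_init_kwargs(
    project: str, name: str, bucket_uri: Optional[str] = None
) -> Dict[str, Any]:
    """Build the keyword arguments for Task.init() (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"project_name": project, "task_name": name}
    if bucket_uri:
        # Explicit destination for models and artifacts, e.g. an S3 URI.
        kwargs["output_uri"] = bucket_uri
    return kwargs


def start_task(project: str, name: str, bucket_uri: Optional[str] = None):
    """Start a ClearML task; needs clearml installed and a reachable server."""
    from clearml import Task  # imported lazily so the helper above works without clearml

    return Task.init(**task_init_kwargs(project, name, bucket_uri))
```

Calling `start_task("demo-project", "train", "s3://my-bucket/models")` would then be the only bucket-aware line in the data scientist's script.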
CostlyOstrich36 so I don't have to write the clearml.conf?
I would like to set things up so that a data scientist working on a project doesn't have to know about buckets and that sort of thing... Ideally the server and the agents are configured with a default bucket...
SarcasticSquirrel56 , you're right. I think you can use the following setting in ~/clearml.conf: sdk.development.default_output_uri: <S3_BUCKET>. Tell me if that works
So I set sdk.development.default_output_uri: <url to fileserver> in the K8s Agent Glue pod, but for models to be uploaded, the DS still has to set output_uri=True in the Task.init() call...
otherwise artifacts are only stored on my laptop, and under "models" I see a URI like: file:///Users/luca/etc.etc./
So I am not really sure what this does
No, if you set the default_output_uri, there is no need to pass output_uri=True in the Task.init() 🙂
It is basically setting it for you, make sense?
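The precedence described in this exchange can be sketched as a small function. This is only an illustration of the behaviour discussed here, not ClearML's actual implementation: `resolve_output_uri` and its parameters are hypothetical names, and treating `output_uri=False` as "keep everything local" is an assumption.

```python
from typing import Optional, Union


def resolve_output_uri(
    explicit: Union[str, bool, None],
    configured_default: Optional[str],
    files_server: str,
) -> Optional[str]:
    """Return the upload destination, or None when outputs stay local."""
    # A URI passed directly to Task.init(output_uri=...) takes priority.
    if isinstance(explicit, str) and explicit:
        return explicit
    # output_uri=True means "upload to the default file server".
    if explicit is True:
        return files_server
    # No argument given: fall back to sdk.development.default_output_uri
    # from the clearml.conf of the machine running the script.
    if explicit is None and configured_default:
        return configured_default
    # Nothing set anywhere: the model is registered with a local file:// path.
    return None
```

The third branch is the important one for this thread: the fallback is read from the configuration of whichever machine executes the code.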
Maybe I did something wrong...
the clearml.conf in the agentglue pod looks like:
sdk {
  development {
    default_output_uri: "fileserver_url"
  }
}
agent {
  package_manager: {
    extra_index_url: [ "extra_index_url" ]
  }
}
but when I removed output_uri from Task.init, the pickled model has path file:///Users/luca/path/to/pickle.file
do I need something else in the clearml.conf?
When you run the job on the k8s pod?
when I run it on my laptop...
what I am trying to achieve is not having to worry about this setting, and have all the artifacts and models uploaded to the file server automatically
Then yes, you need to set the default_output_uri in your laptop's clearml.conf (just like you set it on the k8s glue)
Make sense?