Answered
I'm new to ClearML and trying a few things. When using offline mode, how do I set the offline dir to S3? I would like to send everything from SageMaker to an S3 bucket and later import the results to the server. Is that the right way to go?

  
  
Posted 2 years ago

Answers 25


Makes sense. I currently run aws s3 sync every n iterations, and then I saw that there is an option to load a dir rather than a zip.
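For anyone landing here later, the periodic sync described in this reply could look roughly like the sketch below. The helper names (`build_sync_command`, `maybe_sync`), the folder, and the bucket URI are made up for illustration, and it assumes the AWS CLI is installed and has credentials on the training instance.

```python
import subprocess
from typing import Callable, List, Sequence


def build_sync_command(local_dir: str, s3_uri: str) -> List[str]:
    """Build the `aws s3 sync <local_dir> <s3_uri>` command line."""
    return ["aws", "s3", "sync", local_dir, s3_uri]


def maybe_sync(
    iteration: int,
    every_n: int,
    local_dir: str,
    s3_uri: str,
    run: Callable[[Sequence[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
) -> bool:
    """Sync local_dir to S3 on every `every_n`-th iteration.

    Returns True if a sync was triggered, False otherwise.
    `run` is injectable so the logic can be tried without the AWS CLI.
    """
    if every_n <= 0 or iteration == 0 or iteration % every_n != 0:
        return False
    run(build_sync_command(local_dir, s3_uri))
    return True
```

Inside a training loop this would be called once per iteration, with `local_dir` pointing at the offline session folder (hypothetically something under /opt/ml) and `s3_uri` at a bucket prefix of your choice.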

  
  
Posted 2 years ago

I think for the time being it's not possible to upload automatically to S3. Not sure it's a problem to add support for that but I don't think it's supported ATM (Will double check)

  
  
Posted 2 years ago

It's automatically set in the user's .clearml - I think that /opt/ml is persistent (this is where you are supposed to save checkpoints in sagemaker)

  
  
Posted 2 years ago

BTW, I just talked to the devs. What happens is that your metrics and logs are saved locally, and once a task is closed, they are zipped. If you are afraid the instance might be taken from you: first, we are planning to release a solution for these situations 🙂, and second, your code needs to be aware of the risk and be able to "resume" training from a specific model snapshot or iteration.
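To make the "resume" idea concrete, here is a minimal, framework-agnostic sketch. It is not ClearML functionality: the file naming (`checkpoint_<iteration>.json`), the helper names, and the toy training loop are all assumptions for illustration. A real setup would save model weights with its framework's own checkpoint format.

```python
import json
import re
from pathlib import Path
from typing import Optional, Tuple

CKPT_PATTERN = re.compile(r"^checkpoint_(\d+)\.json$")


def save_checkpoint(ckpt_dir: Path, iteration: int, state: dict) -> Path:
    """Write the state for `iteration` via a temp file, then rename it into place."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    final_path = ckpt_dir / f"checkpoint_{iteration}.json"
    tmp_path = final_path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps(state))
    tmp_path.replace(final_path)  # a killed process never leaves a half-written checkpoint
    return final_path


def load_latest_checkpoint(ckpt_dir: Path) -> Optional[Tuple[int, dict]]:
    """Return (iteration, state) of the newest checkpoint, or None if there is none."""
    if not ckpt_dir.is_dir():
        return None
    best: Optional[Tuple[int, Path]] = None
    for path in ckpt_dir.iterdir():
        match = CKPT_PATTERN.match(path.name)
        if match and (best is None or int(match.group(1)) > best[0]):
            best = (int(match.group(1)), path)
    if best is None:
        return None
    return best[0], json.loads(best[1].read_text())


def train(ckpt_dir: Path, total_iterations: int, save_every: int) -> int:
    """Toy loop that resumes after the newest checkpoint; returns the first iteration it ran."""
    resumed = load_latest_checkpoint(ckpt_dir)
    start = resumed[0] + 1 if resumed else 0
    state = resumed[1] if resumed else {"loss": None}
    for iteration in range(start, total_iterations):
        state["loss"] = 1.0 / (iteration + 1)  # stand-in for a real training step
        if iteration % save_every == 0:
            save_checkpoint(ckpt_dir, iteration, state)
    return start
```

If the checkpoint directory sits on storage that survives the instance (or is synced to S3), a restarted job picks up after the last saved iteration instead of starting from zero.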

  
  
Posted 2 years ago

So I think it's necessary to code defensively and, once training is done, upload to a remote location (S3 in your case). If the disk is persistent, this shouldn't be a problem, as the logs will be saved. Makes sense?

  
  
Posted 2 years ago

thanks for the quick response! and also - your library/product is really cool and impressive

  
  
Posted 2 years ago

Cool and impressive are two adjectives we like to hear 😄

  
  
Posted 2 years ago

🙏 ❤

  
  
Posted 2 years ago

I'll try it

  
  
Posted 2 years ago

Cool!

  
  
Posted 2 years ago

But what happens if the script is terminated, e.g. by a spot termination or ctrl+c? Does this mean I lose track of the training?

  
  
Posted 2 years ago

If the spot instance is taken from you, then yes, it will be (unless there's some drive persistence).

  
  
Posted 2 years ago

Task.get_offline_mode_folder()

  
  
Posted 2 years ago

Ok - I will look into it

  
  
Posted 2 years ago

get_offline_mode_folder

  
  
Posted 2 years ago

You should look at

  
  
Posted 2 years ago

If you want you can just upload them manually to s3 as the last "line" of the script, or write a pipeline step that does that. Just remember you'll have to import it somehow later on
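A rough sketch of the manual upload step, under some assumptions: `plan_uploads` and `upload_folder` are hypothetical helper names, and `client` is anything exposing boto3's `upload_file(Filename, Bucket, Key)` method (for example `boto3.client("s3")`). Bucket and prefix are placeholders.

```python
from pathlib import Path
from typing import List, Protocol, Tuple


class S3Uploader(Protocol):
    """The one method of a boto3 S3 client that this sketch relies on."""

    def upload_file(self, Filename: str, Bucket: str, Key: str) -> None: ...


def plan_uploads(folder: Path, prefix: str) -> List[Tuple[Path, str]]:
    """Map every file under `folder` to an S3 key below `prefix`."""
    prefix = prefix.strip("/")
    plan = []
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            relative = path.relative_to(folder).as_posix()
            plan.append((path, f"{prefix}/{relative}" if prefix else relative))
    return plan


def upload_folder(folder: Path, bucket: str, prefix: str, client: S3Uploader) -> int:
    """Upload the folder file by file; returns the number of files uploaded."""
    plan = plan_uploads(folder, prefix)
    for path, key in plan:
        client.upload_file(Filename=str(path), Bucket=bucket, Key=key)
    return len(plan)
```

Called as the last line of the training script with the offline session folder, this leaves a copy in S3. Later, on a machine that can reach the server, the folder (or zip) can be downloaded and imported with `Task.import_offline_session(...)`.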

  
  
Posted 2 years ago

Maybe I missed it in the documentation, but could I also use something like set_offline_dir() (to make sure it's pointing to /opt/ml or something) and then get_offline_file() and upload it myself?

  
  
Posted 2 years ago

Sure - the problem is that many of our trainings in sagemaker are not exposed to the company's server

  
  
Posted 2 years ago

We can use VPC (which we use, but then the entire bringup of the training would be different)

  
  
Posted 2 years ago

this should explain how to do it. You get the offline session path once you init the task
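In code, getting the offline session path after initializing the task might look like the sketch below. `start_offline_task` is a made-up wrapper name and the project and task names are placeholders. The `clearml` import is kept inside the function only so the snippet loads on machines without the package.

```python
from pathlib import Path
from typing import Optional, Tuple


def start_offline_task(project_name: str, task_name: str) -> Tuple[object, Optional[Path]]:
    """Start a ClearML task in offline mode and return it with its local session folder."""
    from clearml import Task  # lazy import: only needed when the function is called

    Task.set_offline(offline_mode=True)  # switch to offline mode before Task.init
    task = Task.init(project_name=project_name, task_name=task_name)
    folder = task.get_offline_mode_folder()  # expected to be empty if not in offline mode
    return task, Path(folder) if folder else None
```

The returned folder is the one to copy or sync somewhere persistent if the instance might go away.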

  
  
Posted 2 years ago

Can you elaborate on the use-case a bit more? Why not report directly to the server?

  
  
Posted 2 years ago

So all training machines will be exposed to the server?

  
  
Posted 2 years ago

That would simplify things 🙂

  
  
Posted 2 years ago