Does an artifact track changes on a per-file basis? If only some files are updated, does it know to upload only the new files? Also, what is the best way to set up storage for teams to share? (I'd prefer not to use the cloud, as network costs can be significant since we don't use cloud VMs for model training.) 🙏

Posted 4 years ago

Answers 8


EnviousStarfish54

Oh, this is a bit different from what I expected. I thought I could use artifacts for dataset or model version control.

You can definitely use artifacts as a way to version data (in fact, upcoming versions will have this built in).

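Getting an artifact programmatically:
```python
from trains import Task

# Look up the task by its ID, then load the stored artifact object
artifact_obj = Task.get_task(task_id='aabb').artifacts['artifactname'].get()
```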

Models are logged automatically, so there is no need to log them manually.
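
As a rough illustration of versioning data with artifacts, one option is to compute a content fingerprint of the dataset folder and record it alongside the artifact, so runs on identical data can be recognized. This is a sketch under that assumption; `dataset_fingerprint` is a hypothetical helper, not part of trains.

```python
import hashlib
from pathlib import Path


def dataset_fingerprint(root: str) -> str:
    """Hypothetical helper (not a trains API): one SHA-256 digest for a whole folder.

    Each file contributes its relative path and the hash of its contents, in
    sorted path order, so the fingerprint changes whenever a file is added,
    removed, renamed, or edited, and stays the same otherwise.
    """
    base = Path(root)
    files = sorted(
        (p for p in base.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(base).as_posix(),
    )
    digest = hashlib.sha256()
    for path in files:
        rel = path.relative_to(base).as_posix()
        content_hash = hashlib.sha256(path.read_bytes()).hexdigest()
        # One "path<TAB>hash" line per file separates the entries.
        digest.update(f"{rel}\t{content_hash}\n".encode("utf-8"))
    return digest.hexdigest()
```

Usage: `dataset_fingerprint("data/")` returns a 64-character hex string; comparing it with the value stored for a previous run tells you whether the data actually changed.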

Sorry for the late reply. You mentioned there will be a built-in way to version data. May I ask whether there is a release date for it?

Oh, this is a bit different from what I expected. I thought I could use artifacts for dataset or model version control.

StorageManager is what you need if you want to download/upload files from/to any server (it is a utility class that takes care of downloading and uploading, and adds caching). StorageHelper is used internally.
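
To illustrate what "adds caching" means in practice, here is a minimal download-with-cache sketch in plain Python. It is not how StorageManager is implemented: the `CachedDownloader` class and the injected `fetch` callable are hypothetical, and the method name only mirrors `StorageManager.get_local_copy` for readability.

```python
import hashlib
from pathlib import Path, PurePosixPath
from typing import Callable
from urllib.parse import urlparse


class CachedDownloader:
    """Hypothetical download-with-cache sketch; not trains' StorageManager code."""

    def __init__(self, cache_dir: str, fetch: Callable[[str], bytes]) -> None:
        # `fetch` is any callable returning the bytes at a remote URL
        # (an HTTP GET, an S3 client call, ...); injected to stay backend-agnostic.
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch

    def get_local_copy(self, url: str) -> Path:
        # The cache key is a hash of the URL; the file suffix is kept for convenience.
        key = hashlib.sha256(url.encode("utf-8")).hexdigest()
        suffix = PurePosixPath(urlparse(url).path).suffix
        local = self.cache_dir / f"{key}{suffix}"
        if not local.exists():
            # Cache miss: download once, write to a temp file, then rename into place.
            tmp = local.parent / (local.name + ".part")
            tmp.write_bytes(self.fetch(url))
            tmp.replace(local)
        # Cache hit (or freshly filled cache): return the local path.
        return local
```

The sketch assumes remote files never change; a real cache would also need eviction and some way to detect updated remote content.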

We will have a dedicated VM to host the trains-related Docker containers. Do I need to set up a separate file server? (I saw an earlier thread mention MinIO.)

Also, I am unclear on the difference between StorageManager and StorageHelper. Is there an example that integrates them with model training?

I went through the docs, and they don't seem to mention downloading an artifact programmatically.

EnviousStarfish54, regarding the file server: one is built into the trains-server, and it is the default location for storing all artifacts. You can also use external solutions such as S3, Google Cloud Storage, Azure, etc.
Regarding models: any model save/load is logged automatically, as long as you are using one of the supported frameworks (TensorFlow/Keras, PyTorch, scikit-learn).
If you also want your models to be uploaded automatically, just add output_uri:
```python
from trains import Task

# Models saved during this task are uploaded to the trains-server file server
task = Task.init('examples', 'model', output_uri='http://trains-server:8081/')
```

Hi EnviousStarfish54,
Artifacts are stored per experiment. Storage-wise, this means that every experiment that uploads an artifact creates a new file on the central storage (by default, the trains-server), even if the file content is identical to that of a previous execution.
As for the preferred way to share data/artifacts: where is your trains-server hosted? Is it local, or in the cloud? How do you access it from home? Through a VPN?
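
Because artifacts are stored per experiment and not deduplicated per file, one possible workaround for "upload only the changed files" is to keep a manifest of per-file content hashes and compare it with the previous run. A minimal sketch, assuming you persist the previous manifest yourself (where and how is up to you); `build_manifest` and `files_to_upload` are hypothetical helpers, not trains APIs.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def build_manifest(root: str) -> Dict[str, str]:
    """Map each file's relative path under `root` to the SHA-256 of its contents."""
    base = Path(root)
    manifest = {}
    for path in base.rglob("*"):
        if path.is_file():
            rel = path.relative_to(base).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def files_to_upload(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return, sorted, the paths that are new or whose content changed since `previous`."""
    return sorted(rel for rel, digest in current.items() if previous.get(rel) != digest)
```

Deleted files are not reported by `files_to_upload`; `[p for p in previous if p not in current]` would list them if you need to propagate removals.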